Array Processor Element Interconnection Patents (Class 712/11)

Implementing neural networks in fixed point arithmetic computing systems

Patent number: 11868864

Abstract: Methods, systems, and computer storage media for implementing neural networks in fixed point arithmetic computing systems. In one aspect, a method includes the actions of receiving a request to process a neural network using a processing system that performs neural network computations using fixed point arithmetic; for each node of each layer of the neural network, determining a respective scaling value for the node from the respective set of floating point weight values for the node; and converting each floating point weight value of the node into a corresponding fixed point weight value using the respective scaling value for the node to generate a set of fixed point weight values for the node; and providing the sets of fixed point weight values for the nodes to the processing system for use in processing inputs using the neural network.

Type: Grant

Filed: March 26, 2020

Date of Patent: January 9, 2024

Assignee: Google LLC

Inventor: William John Gulland
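The per-node scaling described in the abstract of patent 11868864 can be illustrated with a minimal sketch. It is not the patented method: the abstract does not say how the scaling value is derived, so the sketch assumes the scale maps the largest weight magnitude of a node onto the largest representable integer, and it assumes a signed 8-bit fixed point range. The names `node_scale` and `to_fixed_point` are hypothetical.

```python
from typing import List, Tuple

INT8_MAX = 127  # assumed signed 8-bit fixed point range; not stated in the abstract


def node_scale(weights: List[float], max_int: int = INT8_MAX) -> float:
    """Assumed rule: map the largest weight magnitude of the node onto max_int."""
    largest = max((abs(w) for w in weights), default=0.0)
    return max_int / largest if largest > 0 else 1.0


def to_fixed_point(weights: List[float], max_int: int = INT8_MAX) -> Tuple[List[int], float]:
    """Convert one node's floating point weights to integers, returning the scale used."""
    scale = node_scale(weights, max_int)
    # Round to the nearest integer and clamp to the representable range.
    fixed = [max(-max_int, min(max_int, round(w * scale))) for w in weights]
    return fixed, scale


# One layer with two nodes; each node gets its own scaling value.
layer = [[0.5, -0.25, 0.125], [2.0, -1.0, 0.5]]
quantized = [to_fixed_point(node) for node in layer]
```

Because each node carries its own scale, a node with small weights keeps more resolution than it would under a single layer-wide scale. Dividing a fixed point value by the node's scale recovers an approximation of the original weight.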

Host apparatus, heterogeneous system architecture device, and heterogeneous system based on unified virtual memory

Patent number: 11782835

Abstract: Disclosed herein is a heterogeneous system based on unified virtual memory. The heterogeneous system based on unified virtual memory may include a host for compiling a kernel program, which is source code of a user application, in a binary form and delivering the compiled kernel program to a heterogeneous system architecture device, the heterogeneous system architecture device for processing operation of the kernel program delivered from the host in parallel using two or more different types of processing elements, and unified virtual memory shared between the host and the heterogeneous system architecture device.

Type: Grant

Filed: November 29, 2021

Date of Patent: October 10, 2023

Assignee: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE

Inventors: Joo-Hyun Lee, Young-Su Kwon, Jin-Ho Han

Control wavelet for accelerated deep learning

Patent number: 11727254

Abstract: Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow based computations on wavelets of data. Each processing element has a compute element and a routing element. Each compute element has memory. Each router enables communication via wavelets with nearest neighbors in a 2D mesh. A compute element receives a wavelet. If a control specifier of the wavelet is a first value, then instructions are read from the memory of the compute element in accordance with an index specifier of the wavelet. If the control specifier is a second value, then instructions are read from the memory of the compute element in accordance with a virtual channel specifier of the wavelet. Then the compute element initiates execution of the instructions.

Type: Grant

Filed: August 27, 2020

Date of Patent: August 15, 2023

Assignee: Cerebras Systems Inc.

Inventors: Sean Lie, Gary R. Lauterbach, Michael Edwin James, Michael Morrison, Srikanth Arekapudi

Cross replica reduction on networks having degraded nodes

Patent number: 11715010

Abstract: Methods, systems, and apparatus, including instructions encoded on storage media, for performing reduction of gradient vectors for a network having one or more degraded nodes. A method comprises training a respective replica of a machine learning model on each node of multiple nodes organized in an n-dimensional network topology, combining the respective individual gradient vectors in the nodes to generate a final gradient vector by performing operations comprising: designating each group of nodes along the dimension as either a forwarding group or a critical group, updating, for each receiving node, a respective individual gradient vector with an intermediate gradient vector, performing a reduction on each critical group of nodes along the dimension to generate a respective partial final gradient vector for the critical group, and updating, for each critical group of nodes, an individual gradient vector for a representative node with the respective partial final gradient vector.

Type: Grant

Filed: August 16, 2019

Date of Patent: August 1, 2023

Assignee: Google LLC

Inventors: Bjarke Hammersholt Roune, Sameer Kumar, Norman Paul Jouppi

Neural network processing with chained instructions

Patent number: 11663450

Abstract: Hardware and methods for neural network processing are provided. A method in a hardware node including a pipeline having a matrix vector unit (MVU), a first multifunction unit connected to receive an input from the matrix vector unit, a second multifunction unit connected to receive an output from the first multifunction unit, and a third multifunction unit connected to receive an output from the second multifunction unit is provided. The method includes performing using the MVU a first type of instruction that can only be performed by the MVU to generate a first result. The method further includes performing a second type of instruction that can only be performed by one of the multifunction units and generating a second result and, without storing any of the two results in a global register, passing the second result to the second multifunction unit and the third multifunction unit.

Type: Grant

Filed: June 29, 2017

Date of Patent: May 30, 2023

Assignee: Microsoft Technology Licensing, LLC

Inventors: Jeremy Fowers, Eric S. Chung, Douglas C. Burger

Dedicated hardware system for solving partial differential equations

Patent number: 11656849

Abstract: Embodiments relate to a computing system for solving differential equations. The system is configured to receive problem packages corresponding to problems to be solved, each comprising at least a differential equation and a domain, and to select a solver of a plurality of solvers, based upon availability of each of the plurality of solvers. Each solver comprises a coordinator that partitions the domain of the problem into a plurality of sub-domains, and assigns each of the plurality of sub-domains to a differential equation accelerator (DEA) of a plurality of DEAs. Each DEA comprises at least two memory units, and processes the sub-domain data over a plurality of time-steps by passing the sub-domain data through a selected systolic array from one memory unit, and storing the processed sub-domain data in the other memory unit, and vice versa.

Type: Grant

Filed: August 10, 2020

Date of Patent: May 23, 2023

Assignee: Vorticity Inc.

Inventor: Chirath Neranjena Thouppuarachchi
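The two-memory-unit scheme in the abstract of patent 11656849, where data flows from one memory unit through a systolic array into the other and then back again, resembles ping-pong (double) buffering. The sketch below illustrates that pattern only. The function `diffuse` is a hypothetical stand-in for the systolic array, and its one-dimensional heat-equation stencil and 0.25 coefficient are assumptions, not details from the patent.

```python
from typing import Callable, List


def diffuse(src: List[float], dst: List[float]) -> None:
    """Stand-in for the systolic array: one explicit 1D diffusion step.

    The stencil and the 0.25 coefficient are illustrative assumptions.
    The two boundary cells are copied through unchanged.
    """
    dst[0], dst[-1] = src[0], src[-1]
    for i in range(1, len(src) - 1):
        dst[i] = src[i] + 0.25 * (src[i - 1] - 2.0 * src[i] + src[i + 1])


def run_time_steps(
    initial: List[float],
    steps: int,
    kernel: Callable[[List[float], List[float]], None] = diffuse,
) -> List[float]:
    """Alternate between two buffers so each step reads one and writes the other."""
    buffers = [list(initial), [0.0] * len(initial)]
    for step in range(steps):
        src = buffers[step % 2]        # memory unit read during this time step
        dst = buffers[(step + 1) % 2]  # memory unit written during this time step
        kernel(src, dst)
    return buffers[steps % 2]          # buffer written by the last step


result = run_time_steps([0.0, 0.0, 4.0, 0.0, 0.0], steps=2)
```

Swapping the roles of the two buffers on every step means no sub-domain data is copied between time steps; only the read and write directions change.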

Data exchange pathways between pairs of processing units in columns in a computer

Patent number: 11561926

Abstract: A time deterministic computer is architected so that exchange code compiled for one set of tiles, e.g., a column, can be reused on other sets. The computer comprises: a plurality of processing units each having an input interface with a set of input wires, and an output interface with a set of output wires; a switching fabric connected to each of the processing units by the respective set of output wires and connectable to each of the processing units by the respective input wires via switching circuitry controllable by its associated processing unit; the processing units arranged in columns, each column having a base processing unit proximate the switching fabric and multiple processing units one adjacent the other in respective positions in the direction of the column.

Type: Grant

Filed: January 20, 2022

Date of Patent: January 24, 2023

Assignee: GRAPHCORE LIMITED

Inventors: Stephen Felix, Simon Christian Knowles

Individual data unit and methods and systems for enhancing the security of user data

Patent number: 11550950

Abstract: An individual data unit for enhancing the security of a user data record is provided that includes a processor and a memory configured to store data. The individual data unit is associated with a network and the memory is in communication with the processor. The memory has instructions stored thereon which, when read and executed by the processor, cause the individual data unit to perform basic operations only. The basic operations include communicating securely with computing devices, computer systems, and a central user data server. Moreover, the basic operations include receiving a user data record, storing the user data record, retrieving the user data record, and transmitting the user data record. The individual data unit can be located in a geographic location associated with the user which can be different than the geographic locations of the computer systems and the central user data server.

Type: Grant

Filed: January 23, 2021

Date of Patent: January 10, 2023

Inventor: Richard Jay Langley

Providing efficiencies in processing and communicating internet protocol packets in a network using segment routing

Patent number: 11438445

Abstract: In one embodiment, a Segment Routing network node provides efficiencies in processing and communicating Internet Protocol packets in a network. This Segment Routing node typically advertises (e.g., using Border Gateway Protocol) its Segment Routing processing capabilities, such as Penultimate Segment Pop (PSP) and/or Ultimate Segment Pop (USP) of a Segment Routing Header (including in the context of a packet that has multiple Segment Routing Headers). Subsequently, an Internet Protocol Segment Routing packet having multiple Segment Routing Headers is received. The packet is processed according to a Segment Routing function, with this processing including removing a first one of the Segment Routing Headers and forwarding the resultant Segment Routing packet. The value of the Segments Left field in the first Segment Routing Header identifies to perform PSP when the value is one, to perform USP when the value is zero, or to perform other processing.

Type: Grant

Filed: May 12, 2020

Date of Patent: September 6, 2022

Assignee: Cisco Technology, Inc.

Inventors: Ahmed Refaat Bashandy, Syed Kamran Raza, Jisu Bhattacharya, Clarence Filsfils
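The decision rule stated at the end of the abstract of patent 11438445 (PSP when Segments Left is one, USP when it is zero, other processing otherwise) can be written as a small dispatch function. This is a sketch of that rule alone: it does not parse real packets or remove headers, and the function name and the returned labels are hypothetical.

```python
def select_srh_action(segments_left: int) -> str:
    """Map the Segments Left value of the first Segment Routing Header to an action.

    Mirrors only the rule quoted in the abstract; the labels are illustrative.
    """
    if segments_left < 0:
        raise ValueError("Segments Left cannot be negative")
    if segments_left == 1:
        return "PSP"    # Penultimate Segment Pop
    if segments_left == 0:
        return "USP"    # Ultimate Segment Pop
    return "OTHER"      # any other value falls through to other processing


actions = [select_srh_action(value) for value in (0, 1, 2)]
```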

Systems and methods for implementing chained tile operations

Patent number: 11416260

Abstract: Disclosed embodiments relate to systems and methods for implementing chained tile operations. In one example, a processor includes fetch circuitry to fetch one or more instructions until a plurality of instructions has been fetched, each instruction to specify source and destination tile operands, decode circuitry to decode the fetched instructions, and execution circuitry, responsive to the decoded instructions, to: identify first and second decoded instructions belonging to a chain of instructions, dynamically select and configure a SIMD path comprising first and second processing engines (PE) to execute the first and second decoded instructions, and set aside the specified destination of the first decoded instruction, and instead route a result of the first decoded instruction from the first PE to be used by the second PE to perform the second decoded instruction.

Type: Grant

Filed: April 30, 2020

Date of Patent: August 16, 2022

Assignee: Intel Corporation

Inventors: Christopher J. Hughes, Alexander F. Heinecke, Robert Valentine, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall

Data processing engine array architecture with memory tiles

Patent number: 11336287

Abstract: An integrated circuit can include a data processing engine (DPE) array having a plurality of tiles. The plurality of tiles can include a plurality of DPE tiles, wherein each DPE tile includes a stream switch, a core configured to perform operations, and a memory module. The plurality of tiles can include a plurality of memory tiles, wherein each memory tile includes a stream switch, a direct memory access (DMA) engine, and a random-access memory. The DMA engine of each memory tile may be configured to access the random-access memory within the same memory tile and the random-access memory of at least one other memory tile. Selected ones of the plurality of DPE tiles may be configured to access selected ones of the plurality of memory tiles via the stream switches.

Type: Grant

Filed: March 9, 2021

Date of Patent: May 17, 2022

Assignee: Xilinx, Inc.

Inventors: Javier Cabezas Rodriguez, Juan J. Noguera Serra, David Clarke, Sneha Bhalchandra Date, Tim Tuan, Peter McColgan, Jan Langer, Baris Ozgul

Processor instructions to accelerate FEC encoding and decoding

Patent number: 11327753

Abstract: Various embodiments are described of a system for improved processor instructions for a software-configurable processing element. In particular, various embodiments are described which accelerate functions useful for FEC encoding and decoding. In particular, the processing element may be configured to implement one or more instances of the relevant functions in response to receiving one of the processor instructions. The processing element may later be reconfigured to implement a different function in response to receiving a different one of the processor instructions. Each of the disclosed processor instructions may be implemented repeatedly by the processing element to repeatedly perform one or more instances of the relevant functions with a throughput approaching one or more solutions per clock cycle.

Type: Grant

Filed: June 22, 2020

Date of Patent: May 10, 2022

Assignee: Coherent Logix, Incorporated

Inventors: Keith M. Bindloss, Carl S. Dobbs, Evgeny Mezhibovsky, Zahir Raza, Kevin A. Shelby

Reconfigurable reduced instruction set computer processor architecture with fractured cores

Patent number: 11294851

Abstract: Systems and methods for reconfiguring a reduced instruction set computer processor architecture are disclosed. Exemplary implementations may: provide a primary processing core consisting of a RISC processor; provide a node wrapper associated with each of the plurality of secondary cores, the node wrapper comprising access memory associated with each secondary core, and a load/unload matrix associated with each secondary core; operate the architecture in a manner in which, for at least one core, data is read from and written to the at least cache memory in a control-centric mode; the secondary cores are selectively partitioned to operate in a streaming mode wherein data streams out of the corresponding secondary core into the main memory and other ones of the plurality of secondary cores.

Type: Grant

Filed: May 4, 2018

Date of Patent: April 5, 2022

Assignee: Cornami, Inc.

Inventors: Paul L. Master, Frederick Furtek, Martin Alan Franz, II, Raymond J. Andraka PE

Data processing engine array architecture with memory tiles

Patent number: 11296707

Abstract: An integrated circuit can include a data processing engine (DPE) array having a plurality of tiles. The plurality of tiles can include a plurality of DPE tiles, wherein each DPE tile includes a stream switch, a core configured to perform operations, and a memory module. The plurality of tiles can include a plurality of memory tiles, wherein each memory tile includes a stream switch, a direct memory access (DMA) engine, and a random-access memory. The DMA engine of each memory tile may be configured to access the random-access memory within the same memory tile and the random-access memory of at least one other memory tile. Selected ones of the plurality of DPE tiles may be configured to access selected ones of the plurality of memory tiles via the stream switches.

Type: Grant

Filed: March 9, 2021

Date of Patent: April 5, 2022

Assignee: Xilinx, Inc.

Inventors: Javier Cabezas Rodriguez, Juan J. Noguera Serra, David Clarke, Sneha Bhalchandra Date, Tim Tuan, Peter McColgan, Jan Langer, Baris Ozgul

Polar code-based interleaving method and communication apparatus

Patent number: 11277154

Abstract: A polar code-based interleaving method and apparatus, to resolve a problem existing in the prior art that when a code length is relatively long, an implementation process of reading a sequence obtained after random interleaving is relatively complex, is provided. The method includes: determining an interleaving matrix based on a target code length M of a polar code; and interleaving, based on the interleaving matrix, encoded bits obtained after encoding of the polar code, to generate interleaved bits.

Type: Grant

Filed: December 5, 2019

Date of Patent: March 15, 2022

Assignee: Huawei Technologies Co., Ltd.

Inventors: Guijie Wang, Rong Li, Jun Wang, Gongzheng Zhang, Huazi Zhang

Signal pathways in multi-tile processors

Patent number: 11269805

Abstract: Embodiments herein may present a multi-tile processor including a plurality of processor tiles, and a plurality of interconnects selectively coupling the plurality of processor tiles to each other. A first processor tile may include a memory to store a bulletin board to hold a message, an execution unit, and an encapsulated software module. The encapsulated software module may select a second processor tile coupled with the first processor tile by an interconnect to be a part of a signal pathway. The second processor tile may be selected based on a selection criterion of the signal pathway and the message held in the bulletin board. The encapsulated software module may post and read a message at the bulletin board stored in the memory, or read a message from a bulletin board stored in a memory of the second processor tile. Other embodiments may be described and/or claimed.

Type: Grant

Filed: May 15, 2018

Date of Patent: March 8, 2022

Assignee: Intel Corporation

Inventors: William J. Butera, Simon C. Steely, Jr., Richard J. Dischler

Task execution in a SIMD processing unit with parallel groups of processing lanes

Patent number: 11189004

Abstract: A SIMD processing unit processes a plurality of tasks which each include up to a predetermined maximum number of work items. The work items of a task are arranged for executing a common sequence of instructions on respective data items. The data items are arranged into blocks, with some of the blocks including at least one invalid data item. Work items which relate to invalid data items are invalid work items. The SIMD processing unit comprises a group of processing lanes configured to execute instructions of work items of a particular task over a plurality of processing cycles. A control module assembles work items into the tasks based on the validity of the work items, so that invalid work items of the particular task are temporally aligned across the processing lanes. In this way the number of wasted processing slots due to invalid work items may be reduced.

Type: Grant

Filed: May 6, 2020

Date of Patent: November 30, 2021

Assignee: Imagination Technologies Limited

Inventors: John Howson, Jonathan Redshaw, Yoong Chert Foo

Apparatuses and methods for memory alignment

Patent number: 11048428

Abstract: The present disclosure includes apparatuses and methods related to memory alignment. An example method comprises performing an alignment operation on a first byte-based memory element and a second byte-based memory element such that a padding bit of the first byte-based memory element is logically adjacent to a padding bit of the second byte-based memory element and a data bit of the first byte-based memory element is logically adjacent to a data bit of the second byte-based memory element.

Type: Grant

Filed: September 18, 2019

Date of Patent: June 29, 2021

Assignee: Micron Technology, Inc.

Inventor: John D. Leidel

Policy-driven dynamic consensus protocol selection

Patent number: 11038690

Abstract: A method obtains one or more transactions to be validated by a set of consensus nodes before storage on a digital ledger, and then selects, from a plurality of consensus algorithms, a consensus algorithm to be executed by the set of consensus nodes on the one or more transactions. The consensus algorithm selection is made based on a given policy associated with the one or more transactions. The method then tags the one or more transactions to identify the selected consensus algorithm, and sends the one or more tagged transactions to the set of consensus nodes for execution of the selected consensus algorithm for validation of the one or more transactions before storage on the digital ledger. The selection step is repeated when one or more additional transactions are obtained.

Type: Grant

Filed: October 4, 2018

Date of Patent: June 15, 2021

Assignee: EMC IP Holding Company LLC

Inventor: Stephen J. Todd
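The select-then-tag flow in the abstract of patent 11038690 can be sketched as a lookup from a transaction's policy to a consensus algorithm, followed by tagging. Everything specific here is an assumption made for illustration: the `Transaction` fields, the policy names, the mapping to "pbft" and "raft", and the default. The abstract names no particular policies or algorithms, and sending the tagged transactions to consensus nodes is left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Transaction:
    payload: str
    policy: str                          # policy associated with the transaction
    consensus_tag: Optional[str] = None  # filled in by tag_transactions


# Hypothetical policy table; not taken from the patent.
POLICY_TO_ALGORITHM: Dict[str, str] = {
    "high_assurance": "pbft",
    "low_latency": "raft",
}
DEFAULT_ALGORITHM = "raft"  # assumed fallback for unknown policies


def tag_transactions(
    transactions: List[Transaction],
    policy_map: Dict[str, str] = POLICY_TO_ALGORITHM,
) -> List[Transaction]:
    """Select an algorithm per transaction from its policy and record it as a tag."""
    for tx in transactions:
        tx.consensus_tag = policy_map.get(tx.policy, DEFAULT_ALGORITHM)
    return transactions


batch = tag_transactions([
    Transaction("transfer funds", "high_assurance"),
    Transaction("append audit log", "low_latency"),
    Transaction("misc", "unlisted_policy"),
])
tags = [tx.consensus_tag for tx in batch]
```

Calling `tag_transactions` again on each newly obtained batch corresponds to the repeated selection step mentioned in the abstract.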

Assigning identifiers to processing units in a column to repair a defective processing unit in the column

Patent number: 11036673

Abstract: A method of recording tile identifiers in each of a plurality of tiles of a multitile processor is described. Tiles are arranged in columns, each column having a plurality of processing circuits, each processing circuit comprising one or more tiles, wherein a base processing circuit in each column is connected to a set of processing circuit identifier wires. A base value is generated on each of the set of processing circuit identifier wires for the base processing circuit in each column. At the base processing circuit, the base value on the set of processing circuit identifier wires is read and incremented by one. The incremented value is propagated to a next processing circuit in the column, and at the next processing circuit a unique identifier is recorded by concatenating an identifier of the column and the incremented value.

Type: Grant

Filed: May 22, 2019

Date of Patent: June 15, 2021

Assignee: Graphcore Limited

Inventors: Stephen Felix, Jonathan Mangnall
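The increment-and-propagate scheme in the abstract of patent 11036673 can be modeled in software as a value that travels up a column, growing by one at each processing circuit, with each circuit recording the column identifier concatenated with the value it sees. The sketch assumes that the base circuit records the base value itself, that concatenation means placing the column identifier in the bits above a 4-bit position field, and that no circuit is defective. None of these details come from the abstract, and the function name is hypothetical.

```python
from typing import List

POSITION_BITS = 4  # assumed width of the per-column position field


def assign_column_ids(column_id: int, num_circuits: int, base_value: int = 0) -> List[int]:
    """Return the identifier recorded by each processing circuit, base circuit first."""
    if base_value + num_circuits > (1 << POSITION_BITS):
        raise ValueError("position value does not fit in POSITION_BITS")
    ids = []
    value_on_wires = base_value
    for _ in range(num_circuits):
        # Concatenate: column identifier in the high bits, position in the low bits.
        ids.append((column_id << POSITION_BITS) | value_on_wires)
        # Increment by one and propagate to the next circuit in the column.
        value_on_wires += 1
    return ids


column_three = assign_column_ids(column_id=3, num_circuits=4)
```

With these assumptions, the four circuits of column 3 record 0x30 through 0x33, so identifiers are unique across columns as long as column identifiers are unique.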

Streaming interconnect architecture for data processing engine array

Patent number: 10990552

Abstract: Examples herein describe techniques for communicating between data processing engines in an array of data processing engines. In one embodiment, the array is a 2D array where each of the DPEs includes one or more cores. In addition to the cores, the data processing engines can include a memory module (with memory banks for storing data) and an interconnect which provides connectivity between the engines. To transmit processed data, a data processing engine identifies a destination processing engine in the array. Once identified, the data processing engine can transmit the processed data using a reserved point-to-point communication path in the interconnect that couples the source and destination data processing engines.

Type: Grant

Filed: April 3, 2018

Date of Patent: April 27, 2021

Assignee: XILINX, INC.

Inventors: Goran Hk Bilski, Peter McColgan, Juan J. Noguera Serra, Baris Ozgul, Jan Langer, Richard L. Walke, Ralph D. Wittig, Kornelis A. Vissers, Philip B. James-Roxby, Christopher H. Dick

Synchronization and exchange of data between processors

Patent number: 10963315

Abstract: A system comprising: a first subsystem comprising one or more first processors, and a second subsystem comprising one or more second processors. The second subsystem is configured to process code over a series of steps delineated by barrier synchronizations, and in a current step, to send a descriptor to the first subsystem specifying a value of each of one or more parameters of each of one or more interactions that the second subsystem is programmed to perform with the first subsystem via an inter-processor interconnect in a subsequent step. The first subsystem is configured to execute a portion of code to perform one or more preparatory operations, based on the specified values of at least one of the one or more parameters of each interaction as specified by the descriptor, to prepare for said one or more interactions prior to the barrier synchronization leading into the subsequent phase.

Type: Grant

Filed: February 15, 2019

Date of Patent: March 30, 2021

Assignee: Graphcore Limited

Inventors: David Lacey, Daniel John Pelham Wilkinson, Richard Luke Southwell Osborne, Matthew David Fyles

Processor core design optimized for machine learning applications

Patent number: 10956361

Abstract: A computing system includes a plurality of functional units, each functional unit having one or more inputs and an output. There is a shared memory block coupled to the inputs and outputs of the plurality of functional units. There is a private memory block assigned to each of the plurality of functional units. An inter functional unit data bypass (IFUDB) block is coupled to the plurality of functional units. The IFUDB is configured to route signals between the one or more functional units without use of the shared memory block.

Type: Grant

Filed: November 29, 2018

Date of Patent: March 23, 2021

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Manoj Kumar, Pratap C. Pattnaik, Kattamuri Ekanadham, Jessica Tseng, Jose E. Moreira

Synchronization and exchange of data between processors

Patent number: 10949266

Abstract: A system comprising: a first subsystem comprising one or more first processors, and a second subsystem comprising one or more second processors. The second subsystem is configured to process code over a series of steps delineated by barrier synchronizations, and in a current step, to send a descriptor to the first subsystem specifying a value of each of one or more parameters of each of one or more interactions that the second subsystem is programmed to perform with the first subsystem via an inter-processor interconnect in a subsequent step. The first subsystem is configured to execute a portion of code to perform one or more preparatory operations, based on the specified values of at least one of the one or more parameters of each interaction as specified by the descriptor, to prepare for said one or more interactions prior to the barrier synchronization leading into the subsequent phase.

Type: Grant

Filed: August 13, 2019

Date of Patent: March 16, 2021

Assignee: GRAPHCORE LIMITED

Inventors: David Lacey, Daniel John Pelham Wilkinson, Richard Luke Southwell Osborne, Matthew David Fyles

Individual data unit and methods and systems for enhancing the security of user data

Patent number: 10949566

Abstract: An individual data unit for enhancing the security of a user data record is provided that includes a processor and a memory configured to store data. The individual data unit is associated with a network and the memory is in communication with the processor. The memory has instructions stored thereon which, when read and executed by the processor, cause the individual data unit to perform basic operations only. The basic operations include communicating securely with computing devices, computer systems, and a central user data server. Moreover, the basic operations include receiving a user data record, storing the user data record, retrieving the user data record, and transmitting the user data record. The individual data unit can be located in a geographic location associated with the user which can be different than the geographic locations of the computer systems and the central user data server.

Type: Grant

Filed: November 16, 2019

Date of Patent: March 16, 2021

Inventor: Richard Jay Langley

Intelligent data storage and processing using FPGA devices

Patent number: 10929152

Abstract: A system is disclosed that comprises a field programmable gate array (FPGA), a network interface, and a plurality of hardware templates. The FPGA comprises configurable hardware logic, and the hardware templates define a plurality of different pipelined processing operations. The FPGA can be accessible over a network via the network interface for commanding the FPGA to load a hardware template from among the hardware templates onto the FPGA to thereby configure hardware logic on the FPGA to perform the pipelined processing operation defined by the loaded hardware template, and wherein the FPGA is configured to (1) receive streaming data and (2) process the streaming data through the configured hardware logic to perform the pipelined processing operation defined by the loaded hardware template on the streaming data.

Type: Grant

Filed: July 20, 2020

Date of Patent: February 23, 2021

Assignee: IP Reservoir, LLC

Inventors: Roger D. Chamberlain, Mark Allen Franklin, Ronald S. Indeck, Ron K. Cytron, Sharath R. Cholleti

Code conversion apparatus and method for improving performance in computer operations

Patent number: 10908899

Abstract: A code conversion apparatus includes a memory and a processor coupled to the memory. The memory is configured to store therein a first code including a first data definition of a plurality of arrays, a first operation for the plurality of arrays, and a second data definition of an array indicating a result of the first operation. The processor is configured to convert the first data definition and the second data definition included in the first code into a data definition of an array of structures. The processor is configured to convert the first operation included in the first code into a second operation for the array of structures. The processor is configured to generate a second code including a predetermined instruction to perform the second operation on different pieces of data of the plurality of arrays in parallel with one another.

Type: Grant

Filed: April 5, 2019

Date of Patent: February 2, 2021

Assignee: FUJITSU LIMITED

Inventor: Shigeru Kimura
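The data-layout change in the abstract of patent 10908899, from several separate arrays plus a result array to a single array of structures, can be illustrated as follows. This sketch shows the layout conversion only: the element-wise addition is an assumed example of the "first operation", the names are hypothetical, and the plain Python loop stands in for the parallel instruction that the patent's converted code would use.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Element:
    """One structure holding the values that were spread across separate arrays."""
    a: float
    b: float
    result: float = 0.0


def to_array_of_structures(a: List[float], b: List[float]) -> List[Element]:
    """Convert two input arrays (and an implicit result array) into one array of structures."""
    if len(a) != len(b):
        raise ValueError("input arrays must have the same length")
    return [Element(x, y) for x, y in zip(a, b)]


def add_fields(elements: List[Element]) -> None:
    """Second operation: the same addition, now expressed over the array of structures."""
    for element in elements:
        element.result = element.a + element.b


# Original form: three separate arrays, c[i] = a[i] + b[i].
a = [1.0, 2.0, 3.0]
b = [10.0, 20.0, 30.0]

# Converted form: operands and result for index i sit side by side.
elements = to_array_of_structures(a, b)
add_fields(elements)
results = [element.result for element in elements]
```

Keeping the operands and the result of each index together means one contiguous access can fetch everything the operation needs for that index.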

Declarative transactional communications with a peripheral device via a low-power bus

Patent number: 10909048

Abstract: The disclosed techniques enable a software program to communicate with a peripheral device (e.g., a sensor), via a low-level communication protocol such as the I2C protocol, even though the software program does not include lower-level code configured to implement a sequence of operations defined for the low-level communication protocol. The techniques determine that the software program includes a high-level operation that instructs for communications to be conducted with the peripheral device. The high-level operation is associated with a separately stored configuration file that includes the lower-level code configured to implement the sequence of operations enabling the communications to be conducted with the peripheral device via the low-level communication protocol.

Type: Grant

Filed: September 30, 2019

Date of Patent: February 2, 2021

Assignee: Microsoft Technology Licensing, LLC

Inventors: Alessandro Domenico Scarpantoni, Mei Ling Wilson, Shyamal K. Varma, Ajay P. Barboza

Determining the net emissions of air pollutants

Patent number: 10830743

Abstract: A method for determining a source of pollutant emissions for a selected area includes: mapping a grid onto a representation of the selected area; collecting monitoring data for the selected area; from the monitoring data, assigning air pollution values and weather values to each cell in the grid using an interpolation method to estimate values for gap cells; re-sizing the grid to mitigate the influence of atmospheric turbulence on the air pollution values; and using weather factor separation to minimize the influence of weather from the air pollution values, resulting in air pollution values that reflect the net pollutant emissions for the selected area.

Type: Grant

Filed: May 4, 2017

Date of Patent: November 10, 2020

Assignee: International Business Machines Corporation

Inventors: Long Sheng Bai, Liang Liu, Zhuo Liu, Jun Mei Qu, Wei Zhuang

VLIW interface device and method for controlling the same

Patent number: 10782974

Abstract: A VLIW (Very Long Instruction Word) interface device includes a memory configured to store instructions and data, and a processor configured to process the instructions and the data, wherein the processor includes an instruction fetcher configured to output an instruction fetch request to load the instruction from the memory, a decoder configured to decode the instruction loaded on the instruction fetcher, an arithmetic logic unit (ALU) configured to perform an operation function if the decoded instruction is an operation instruction, a memory interface scheduler configured to schedule the instruction fetch request or a data fetch request that is input from the arithmetic logic unit, and a memory operator configured to perform a memory access operation in accordance with the scheduled instruction fetch request or data fetch request.

Type: Grant

Filed: November 23, 2016

Date of Patent: September 22, 2020

Assignee: SAMSUNG ELECTRONICS CO., LTD.

Inventors: Young-chul Cho, Suk-jin Kim, Chul-soo Park, Dong-kwan Suh
Storage device and storage method

Patent number: 10691542

Abstract: According to an embodiment, a storage device includes a plurality of memory nodes and a control unit. Each of the memory nodes includes a storage unit including a plurality of storage areas having a predetermined size. The memory nodes are connected to each other in two or more different directions. The memory nodes constitute two or more groups each including two or more memory nodes. The control unit is configured to sequentially allocate data writing destinations in the storage units to the storage areas respectively included in the different groups.

Type: Grant

Filed: September 11, 2013

Date of Patent: June 23, 2020

Assignee: Toshiba Memory Corporation

Inventors: Yuki Sasaki, Takahiro Kurita, Atsuhiro Kinoshita
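
The allocation rule in the abstract of patent 10691542 above (successive write destinations land in different groups of memory nodes) can be pictured with a small sketch. This is only an illustration under stated assumptions: the function name `allocate_writes`, the node labels, the round-robin order over groups, and the way areas are picked inside a group are all hypothetical and are not taken from the patent.

```python
from typing import List, Tuple


def allocate_writes(
    groups: List[List[str]], n_writes: int, areas_per_node: int
) -> List[Tuple[int, str, int]]:
    """Place successive writes in successive groups (round-robin).

    Returns one (group_index, node_id, area_index) triple per write.
    Assumption: inside a group, areas are handed out node by node.
    """
    next_slot = [0] * len(groups)  # areas already handed out, per group
    placements = []
    for i in range(n_writes):
        g = i % len(groups)  # consecutive writes go to different groups
        nodes = groups[g]
        slot = next_slot[g]
        if slot >= len(nodes) * areas_per_node:
            raise RuntimeError(f"group {g} has no free storage area")
        placements.append((g, nodes[slot % len(nodes)], slot // len(nodes)))
        next_slot[g] += 1
    return placements
```

With two groups of two nodes each, four consecutive writes alternate between group 0 and group 1, so no two neighbouring writes target the same group.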
Sort operation in memory

Patent number: 10685699

Abstract: Examples of the present disclosure provide apparatuses and methods related to performing a sort operation in a memory. An example apparatus might include a first group of memory cells coupled to a first sense line, a second group of memory cells coupled to a second sense line, and a controller configured to control sensing circuitry to sort a first element stored in the first group of memory cells and a second element stored in the second group of memory cells by performing an operation without transferring data via an input/output (I/O) line.

Type: Grant

Filed: December 3, 2018

Date of Patent: June 16, 2020

Assignee: Micron Technology, Inc.

Inventor: Kyle B. Wheeler
Task execution in a SIMD processing unit with parallel groups of processing lanes

Patent number: 10679319

Abstract: A SIMD processing unit processes a plurality of tasks which each include up to a predetermined maximum number of work items. The work items of a task are arranged for executing a common sequence of instructions on respective data items. The data items are arranged into blocks, with some of the blocks including at least one invalid data item. Work items which relate to invalid data items are invalid work items. The SIMD processing unit comprises a group of processing lanes configured to execute instructions of work items of a particular task over a plurality of processing cycles. A control module assembles work items into the tasks based on the validity of the work items, so that invalid work items of the particular task are temporally aligned across the processing lanes. In this way the number of wasted processing slots due to invalid work items may be reduced.

Type: Grant

Filed: April 17, 2019

Date of Patent: June 9, 2020

Assignee: Imagination Technologies Limited

Inventors: John Howson, Jonathan Redshaw, Yoong Chert Foo
Processing with compact arithmetic processing element

Patent number: 10656912

Abstract: A processor or other device, such as a programmable and/or massively parallel processor or other device, includes processing elements designed to perform arithmetic operations (possibly but not necessarily including, for example, one or more of addition, multiplication, subtraction, and division) on numerical values of low precision but high dynamic range (“LPHDR arithmetic”). Such a processor or other device may, for example, be implemented on a single chip. Whether or not implemented on a single chip, the number of LPHDR arithmetic elements in the processor or other device in certain embodiments of the present invention significantly exceeds (e.g., by at least 20 more than three times) the number of arithmetic elements, if any, in the processor or other device which are designed to perform high dynamic range arithmetic of traditional precision (such as 32 bit or 64 bit floating point arithmetic).

Type: Grant

Filed: November 6, 2019

Date of Patent: May 19, 2020

Assignee: Singular Computing LLC

Inventor: Joseph Bates
Implementing neural networks in fixed point arithmetic computing systems

Patent number: 10650303

Abstract: Methods, systems, and computer storage media for implementing neural networks in fixed point arithmetic computing systems. In one aspect, a method includes the actions of receiving a request to process a neural network using a processing system that performs neural network computations using fixed point arithmetic; for each node of each layer of the neural network, determining a respective scaling value for the node from the respective set of floating point weight values for the node; and converting each floating point weight value of the node into a corresponding fixed point weight value using the respective scaling value for the node to generate a set of fixed point weight values for the node; and providing the sets of fixed point weight values for the nodes to the processing system for use in processing inputs using the neural network.

Type: Grant

Filed: February 14, 2017

Date of Patent: May 12, 2020

Assignee: Google LLC

Inventor: William John Gulland
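
The per-node conversion described in the abstract of patent 10650303 above can be sketched as follows. This is a minimal illustration, not the patented method: the abstract does not say how the scaling value is derived, so the sketch assumes it maps the largest weight magnitude of a node onto the largest signed integer of a chosen bit width. The names `quantize_node`, `quantize_network`, and the `bits` parameter are hypothetical.

```python
from typing import List, Tuple


def quantize_node(weights: List[float], bits: int = 8) -> Tuple[float, List[int]]:
    """Return (scale, fixed_point_weights) for a single node.

    Assumption: scale = largest representable integer / largest |weight|.
    """
    q_max = 2 ** (bits - 1) - 1  # 127 for 8-bit signed values
    max_mag = max((abs(w) for w in weights), default=0.0)
    scale = q_max / max_mag if max_mag > 0 else 1.0
    # Scale, round to the nearest integer, and clamp to the signed range.
    fixed = [max(-q_max, min(q_max, round(w * scale))) for w in weights]
    return scale, fixed


def quantize_network(layers: List[List[List[float]]]) -> List[List[Tuple[float, List[int]]]]:
    """Apply quantize_node to every node of every layer."""
    return [[quantize_node(node) for node in layer] for layer in layers]
```

Dividing a fixed point weight by its node's scale gives back an approximation of the original floating point weight, which is why each node keeps its own scale in this sketch.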
Information processing apparatus, information processing system and method of controlling information processing system

Patent number: 10609188

Abstract: An information processing apparatus includes a receiver to receive data-packets, the data-packets generated by dividing a message into division-data and storing, for each of the division-data, one of the plurality of division-data into one of the plurality of data-packets, wherein each of the data-packets also includes a data value indicating a quantity of the division-data and data indicating whether or not the data-packet includes final division-data corresponding to an end of the message, a memory, and a processor to store the division-data that is contained in a packet of the data-packets that are received, in the memory, and suppress the final division-data from being stored in the memory until the quantity of the data-packets received by the receiver equates to the data value indicating the quantity of the division-data, in a case where the final division-data is received earlier than any one of the other division-data.

Type: Grant

Filed: April 3, 2018

Date of Patent: March 31, 2020

Assignee: FUJITSU LIMITED

Inventors: Shinya Hiramoto, Yuji Kondo, Yuichiro Ajima
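
The receive-side behaviour in the abstract of patent 10609188 above (hold back the final division-data until every packet of the message has arrived) can be sketched roughly as below. The `Packet` and `Receiver` classes and their field names are hypothetical, and the sketch simply appends non-final payloads in arrival order; how the real apparatus places data in memory is not stated in the abstract.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Packet:
    payload: bytes
    total: int      # quantity of division-data in the whole message
    is_final: bool  # True if this packet carries the end of the message


@dataclass
class Receiver:
    memory: List[bytes] = field(default_factory=list)
    received: int = 0
    held_final: Optional[bytes] = None

    def on_packet(self, pkt: Packet) -> None:
        self.received += 1
        if pkt.is_final:
            # Do not store yet: earlier pieces may still be in flight.
            self.held_final = pkt.payload
        else:
            self.memory.append(pkt.payload)
        # Release the final piece only once every packet has been counted.
        if self.held_final is not None and self.received == pkt.total:
            self.memory.append(self.held_final)
            self.held_final = None
```

A consumer that watches memory for the final piece can then treat its appearance as a signal that the whole message is present, even when packets arrive out of order.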
Processors, methods, and systems with a configurable spatial accelerator

Patent number: 10558575

Abstract: Systems, methods, and apparatuses relating to a configurable spatial accelerator are described. In one embodiment, a processor includes a core with a decoder to decode an instruction into a decoded instruction and an execution unit to execute the decoded instruction to perform a first operation; a plurality of processing elements; and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements is to perform a second operation when an incoming operand set arrives at the plurality of processing elements.

Type: Grant

Filed: December 30, 2016

Date of Patent: February 11, 2020

Assignee: Intel Corporation

Inventors: Kermin E. Fleming, Jr., Kent D. Glossop, Simon C. Steely, Jr., Jinjie Tang, Alan G. Gara
Efficient movement of virtual nodes during reconfiguration of computing cluster

Patent number: 10534652

Abstract: Given a current configuration of virtual node groups in a computing cluster and a new configuration indicating one or more changes to the virtual node groups, a cluster manager generates a reconfiguration plan to arrange virtual nodes into the desired virtual node groups of the new configuration while minimizing a number of virtual nodes to be moved between physical nodes in the computing cluster.

Type: Grant

Filed: June 29, 2017

Date of Patent: January 14, 2020

Assignee: Amazon Technologies, Inc.

Inventors: Rajib Dugar, Ashish Manral, Ganesh Narayanan
Method and apparatus for determining status element total with sequentially coupled counting status circuits

Patent number: 10481965

Abstract: Counting status circuits are electrically coupled to corresponding status elements. The status elements selectably store a bit status of a bit line coupled to a memory array. The bit status can indicate one of at least pass and fail. The counting status circuits are electrically coupled to each other in a sequential order. Control logic causes processing of the counting status circuits in the sequential order to determine a total of the memory elements that store the bit status. The total number of memory elements that store the bit status indicates the number of error bits or non-error bits, which can help determine whether there are too many errors to be fixed by error correction codes.

Type: Grant

Filed: December 22, 2016

Date of Patent: November 19, 2019

Assignee: MACRONIX INTERNATIONAL CO., LTD.

Inventors: Yih-Shan Yang, Shou-Nan Hung, Chun-Hsiung Hung, Yao-Jen Kuo, Meng-Fan Chang
Configuring compute nodes in a parallel computer using remote direct memory access (‘RDMA’)

Patent number: 10474625

Abstract: Configuring compute nodes in a parallel computer using remote direct memory access (‘RDMA’), the parallel computer comprising a plurality of compute nodes coupled for data communications via one or more data communications networks, including: initiating, by a source compute node of the parallel computer, an RDMA broadcast operation to broadcast binary configuration information to one or more target compute nodes in the parallel computer; preparing, by each target compute node, the target compute node for receipt of the binary configuration information from the source compute node; transmitting, by each target compute node, a ready message to the source compute node, the ready message indicating that the target compute node is ready to receive the binary configuration information from the source compute node; and performing, by the source compute node, an RDMA broadcast operation to write the binary configuration information into memory of each target compute node.

Type: Grant

Filed: January 17, 2012

Date of Patent: November 12, 2019

Assignee: International Business Machines Corporation

Inventors: Michael E. Aho, John E. Attinella, Thomas M. Gooding, Michael B. Mundy
Configuring compute nodes in a parallel computer using remote direct memory access (‘RDMA’)

Patent number: 10474626

Abstract: Configuring compute nodes in a parallel computer using remote direct memory access (‘RDMA’), the parallel computer comprising a plurality of compute nodes coupled for data communications via one or more data communications networks, including: initiating, by a source compute node of the parallel computer, an RDMA broadcast operation to broadcast binary configuration information to one or more target compute nodes in the parallel computer; preparing, by each target compute node, the target compute node for receipt of the binary configuration information from the source compute node; transmitting, by each target compute node, a ready message to the source compute node, the ready message indicating that the target compute node is ready to receive the binary configuration information from the source compute node; and performing, by the source compute node, an RDMA broadcast operation to write the binary configuration information into memory of each target compute node.

Type: Grant

Filed: December 10, 2012

Date of Patent: November 12, 2019

Assignee: International Business Machines Corporation

Inventors: Michael E. Aho, John E. Attinella, Thomas M. Gooding, Michael B. Mundy
Processors and methods for privileged configuration in a spatial array

Patent number: 10445098

Abstract: Methods and apparatuses relating to privileged configuration in spatial arrays are described. In one embodiment, a processor includes processing elements; an interconnect network between the processing elements; and a configuration controller coupled to a first subset and a second, different subset of the plurality of processing elements, the first subset having an output coupled to an input of the second, different subset, wherein the configuration controller is to configure the interconnect network between the first subset and the second, different subset of the plurality of processing elements to not allow communication on the interconnect network between the first subset and the second, different subset when a privilege bit is set to a first value and to allow communication on the interconnect network between the first subset and the second, different subset of the plurality of processing elements when the privilege bit is set to a second value.

Type: Grant

Filed: September 30, 2017

Date of Patent: October 15, 2019

Assignee: Intel Corporation

Inventors: Kermin E. Fleming, Simon C. Steely, Kent D. Glossop
Interconnection system

Patent number: 10417159

Abstract: An interconnection system, apparatus and method are described for arranging elements in a network, which may be a data memory system, computing system or communications system where the data paths are arranged and operated so as to control the power consumption and data skew properties of the system. A configurable switching element may be used to form the interconnections at nodes, where a control signal and other information is used to manage the power status of other aspects of the configurable switching element. Time delay skew of data being transmitted between nodes of the network may be altered by exchanging the logical and physical line assignments of the data at one or more nodes of the network. A method of laying out an interconnecting motherboard is disclosed which reduces the complexity of the trace routing.

Type: Grant

Filed: April 17, 2006

Date of Patent: September 17, 2019

Assignee: VIOLIN SYSTEMS LLC

Inventor: Jon C. R. Bennett
Method for providing subapplications to an array of ALUs

Patent number: 10409765

Abstract: An array of ALUs and a controlling unit providing the array with sequentially ordered subapplications, wherein an ALU signals completion of execution of a subapplication to the controlling unit, which then provides a next sequential subapplication to the requesting ALU.

Type: Grant

Filed: June 21, 2017

Date of Patent: September 10, 2019

Assignee: PACT XPP SCHWEIZ AG

Inventors: Martin Vorbach, Armin Nuckel
Distributed image stabilization

Patent number: 10341561

Abstract: In a distributed video encoding system, a video is encoded by splitting it into video segments and encoding the segments using multiple encoders. Prior to segmenting the video for distributed video encoding, image stabilization is performed on the video. For each frame in the video, a corresponding transform operation is first computed based on an estimated camera movement. Next, the video is segmented into multiple video segments, each accompanied by its corresponding per-frame transform information. The video segments are then distributed to multiple processing nodes that perform the image stabilization of the corresponding video segment by applying the corresponding transform. The results from all the stabilized video segments are then stitched back together for further video encoding operation.

Type: Grant

Filed: September 11, 2015

Date of Patent: July 2, 2019

Assignee: Facebook, Inc.

Inventors: Amit Puntambekar, Michael Hamilton Coward
Neural network unit with output buffer feedback and masking capability

Patent number: 10282348

Abstract: An output buffer holds N words arranged as N/J mutually exclusive output buffer word groups (OBWG) of J words each. N processing units (PU) are arranged as N/J mutually exclusive PU groups each having an associated OBWG. Each PU has an accumulator, an arithmetic unit, and first and second multiplexed registers each having at least J+1 inputs and an output. A first input receives a memory operand and the other J inputs receive the J words of the associated OBWG. Each accumulator provides its output to a respective output buffer word. Each arithmetic unit performs an operation on the first and second multiplexed register outputs and the accumulator output to generate a result for accumulation into the accumulator. A mask input to the output buffer controls which words, if any, of the N words retain their current value or are updated with their respective accumulator output.

Type: Grant

Filed: April 5, 2016

Date of Patent: May 7, 2019

Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.

Inventors: G. Glenn Henry, Terry Parks, Kyle T. O'Brien
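
The masking behaviour at the end of the abstract of patent 10282348 above (a mask decides which output buffer words keep their value and which take the accumulator output) can be illustrated with a few lines of software. This models only that one sentence, not the hardware; the function name `update_output_buffer` is hypothetical, and the polarity of the mask (True means "update") is an assumption, since the abstract does not state it.

```python
from typing import List, Sequence


def update_output_buffer(
    buffer: Sequence[int], accumulators: Sequence[int], mask: Sequence[bool]
) -> List[int]:
    """Return the new buffer contents after one masked write.

    Assumption: mask[i] is True  -> word i takes accumulator i's output,
                mask[i] is False -> word i retains its current value.
    """
    if not (len(buffer) == len(accumulators) == len(mask)):
        raise ValueError("buffer, accumulators and mask must have equal length")
    return [acc if m else old for old, acc, m in zip(buffer, accumulators, mask)]
```

An all-False mask leaves the buffer untouched, which corresponds to the "if any" case in the abstract.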
Tri-configuration neural network unit

Patent number: 10275393

Abstract: A neural network unit configurable to first/second/third configurations has N narrow and N wide accumulators, multipliers and adders. Each multiplier performs a narrow/wide multiply on first and second narrow/wide inputs to generate a narrow/wide product. A first adder input receives a corresponding narrow/wide accumulator's output and a third input receives a widened corresponding narrow multiplier's narrow product in the third configuration. In the first configuration, each narrow/wide adder performs a narrow/wide addition on the first and second inputs to generate a narrow/wide sum for storage into the corresponding narrow/wide accumulator. In the second configuration, each wide adder performs a wide addition on the first and a second input to generate a wide sum for storage into the corresponding wide accumulator. In the third configuration, each wide adder performs a wide addition on the first, second and third inputs to generate a wide sum for storage into the corresponding wide accumulator.

Type: Grant

Filed: April 5, 2016

Date of Patent: April 30, 2019

Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.

Inventors: G. Glenn Henry, Terry Parks
Batching tuples

Patent number: 10268727

Abstract: A technique of batching tuples can include determining a plurality of key-attributes for a plurality of tuples, creating a batch tuple, and calculating a hash value for the batch tuple.

Type: Grant

Filed: March 29, 2013

Date of Patent: April 23, 2019

Assignee: Hewlett Packard Enterprise Development LP

Inventors: Matthias J. Sax, Maria Guadalupe Castellanos, Meichun Hsu
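
The three steps named in the abstract of patent 10268727 above (determine key-attributes, create a batch tuple, calculate a hash value for it) might look like the sketch below. The abstract gives no detail, so everything concrete here is an assumption: tuples are modelled as dictionaries, tuples sharing the same key-attribute values are grouped into one batch, and the hash is a SHA-256 digest of the key. The name `batch_tuples` is hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Any, Dict, List, Sequence


def batch_tuples(
    tuples: Sequence[Dict[str, Any]], key_fields: Sequence[str]
) -> List[Dict[str, Any]]:
    """Group tuples by their key-attributes and hash each resulting batch."""
    groups: Dict[tuple, List[Dict[str, Any]]] = defaultdict(list)
    for t in tuples:
        # Step 1: determine the key-attributes of this tuple.
        key = tuple(t[f] for f in key_fields)
        groups[key].append(t)
    batches = []
    for key, members in groups.items():
        # Steps 2 and 3: build the batch tuple and hash it on its key,
        # so one hash computation serves every member of the batch.
        digest = hashlib.sha256(repr(key).encode("utf-8")).hexdigest()
        batches.append({"key": key, "members": members, "hash": digest})
    return batches
```

In a stream-processing setting, such a hash could be used to route the whole batch to one downstream worker instead of hashing each tuple separately; that use is an inference, not something the abstract states.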
System-on-chip and method for exchanging data between computation nodes of such a system-on-chip

Patent number: 10229073

Abstract: A system including at least one computation node including a memory, a processor reading/writing data in a work area of the memory and a DMA controller including a receiver receiving data from outside and writing it in a sharing area of the memory or a transmitter reading data in said sharing area and transmitting it outside. A write and read request mechanism is provided in order to cause, upon request of the processor, a data transfer, by the DMA controller, between the sharing area and the work area. The DMA controller includes an additional transmitting/receiving device designed for exchanging data between outside and the work area, without passing through the sharing area.

Type: Grant

Filed: March 2, 2017

Date of Patent: March 12, 2019

Assignee: Commissariat à l'énergie atomique et aux énergies alternatives

Inventors: Thiago Raupp Da Rosa, Romain Lemaire, Fabien Clermidy
Memory expansion apparatus includes CPU-side protocol processor connected through parallel interface to memory-side protocol processor connected through serial link

Patent number: 10216655

Abstract: A memory interface apparatus is provided. The apparatus includes a central processing unit (CPU)-side protocol processor connected to a CPU through a parallel interface and a memory-side protocol processor connected to a memory through a parallel interface, and the CPU-side protocol processor and the memory-side protocol processor are connected through a serial link.

Type: Grant

Filed: June 27, 2016

Date of Patent: February 26, 2019

Assignee: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE

Inventors: Yong Seok Choi, Hyuk Je Kwon
