Array Processor Element Interconnection Patents (Class 712/11)
  • Patent number: 11868864
    Abstract: Methods, systems, and computer storage media for implementing neural networks in fixed point arithmetic computing systems. In one aspect, a method includes the actions of receiving a request to process a neural network using a processing system that performs neural network computations using fixed point arithmetic; for each node of each layer of the neural network, determining a respective scaling value for the node from the respective set of floating point weight values for the node; and converting each floating point weight value of the node into a corresponding fixed point weight value using the respective scaling value for the node to generate a set of fixed point weight values for the node; and providing the sets of fixed point weight values for the nodes to the processing system for use in processing inputs using the neural network.
    Type: Grant
    Filed: March 26, 2020
    Date of Patent: January 9, 2024
    Assignee: Google LLC
    Inventor: William John Gulland
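    Illustrative sketch (not taken from the patent): the abstract says a scaling value is derived per node from that node's floating point weights, but it does not say how. The snippet below assumes one common choice, scaling so that the largest-magnitude weight maps to the top of a signed fixed point range. The function names and the 8-bit default are hypothetical.

```python
from typing import List, Tuple


def node_scaling_value(weights: List[float], bits: int = 8) -> float:
    """Assumed rule: map the largest-magnitude weight to the max signed value."""
    max_fixed = 2 ** (bits - 1) - 1  # e.g. 127 for 8 bits
    max_abs = max((abs(w) for w in weights), default=0.0)
    # A node whose weights are all zero (or missing) needs no scaling.
    return max_fixed / max_abs if max_abs > 0 else 1.0


def to_fixed_point(weights: List[float], bits: int = 8) -> Tuple[List[int], float]:
    """Convert one node's float weights to integers using the node's own scale."""
    scale = node_scaling_value(weights, bits)
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    # Round to the nearest integer, then clamp into the representable range.
    fixed = [min(hi, max(lo, round(w * scale))) for w in weights]
    return fixed, scale


if __name__ == "__main__":
    fixed, scale = to_fixed_point([0.5, -1.0, 0.25])
    print(fixed, scale)
```

    Because each node keeps its own scale, a node with small weights does not lose precision to a node with large ones; the scale must be retained so that outputs can be rescaled later.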
  • Patent number: 11782835
    Abstract: Disclosed herein is a heterogeneous system based on unified virtual memory. The heterogeneous system based on unified virtual memory may include a host for compiling a kernel program, which is source code of a user application, in a binary form and delivering the compiled kernel program to a heterogeneous system architecture device, the heterogeneous system architecture device for processing operation of the kernel program delivered from the host in parallel using two or more different types of processing elements, and unified virtual memory shared between the host and the heterogeneous system architecture device.
    Type: Grant
    Filed: November 29, 2021
    Date of Patent: October 10, 2023
    Assignee: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
    Inventors: Joo-Hyun Lee, Young-Su Kwon, Jin-Ho Han
  • Patent number: 11727254
    Abstract: Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow based computations on wavelets of data. Each processing element has a compute element and a routing element. Each compute element has memory. Each router enables communication via wavelets with nearest neighbors in a 2D mesh. A compute element receives a wavelet. If a control specifier of the wavelet is a first value, then instructions are read from the memory of the compute element in accordance with an index specifier of the wavelet. If the control specifier is a second value, then instructions are read from the memory of the compute element in accordance with a virtual channel specifier of the wavelet. Then the compute element initiates execution of the instructions.
    Type: Grant
    Filed: August 27, 2020
    Date of Patent: August 15, 2023
    Assignee: Cerebras Systems Inc.
    Inventors: Sean Lie, Gary R. Lauterbach, Michael Edwin James, Michael Morrison, Srikanth Arekapudi
  • Patent number: 11715010
    Abstract: Methods, systems, and apparatus, including instructions encoded on storage media, for performing reduction of gradient vectors for a network having one or more degraded nodes. A method comprises training a respective replica of a machine learning model on each node of multiple nodes organized in an n-dimensional network topology, combining the respective individual gradient vectors in the nodes to generate a final gradient vector by performing operations comprising: designating each group of nodes along the dimension as either a forwarding group or a critical group, updating, for each receiving node, a respective individual gradient vector with an intermediate gradient vector, performing a reduction on each critical group of nodes along the dimension to generate a respective partial final gradient vector for the critical group, and updating, for each critical group of nodes, an individual gradient vector for a representative node with the respective partial final gradient vector.
    Type: Grant
    Filed: August 16, 2019
    Date of Patent: August 1, 2023
    Assignee: Google LLC
    Inventors: Bjarke Hammersholt Roune, Sameer Kumar, Norman Paul Jouppi
  • Patent number: 11663450
    Abstract: Hardware and methods for neural network processing are provided. A method in a hardware node including a pipeline having a matrix vector unit (MVU), a first multifunction unit connected to receive an input from the matrix vector unit, a second multifunction unit connected to receive an output from the first multifunction unit, and a third multifunction unit connected to receive an output from the second multifunction unit is provided. The method includes performing using the MVU a first type of instruction that can only be performed by the MVU to generate a first result. The method further includes performing a second type of instruction that can only be performed by one of the multifunction units and generating a second result and, without storing either of the two results in a global register, passing the second result to the second multifunction unit and the third multifunction unit.
    Type: Grant
    Filed: June 29, 2017
    Date of Patent: May 30, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Jeremy Fowers, Eric S. Chung, Douglas C. Burger
  • Patent number: 11656849
    Abstract: Embodiments relate to a computing system for solving differential equations. The system is configured to receive problem packages corresponding to problems to be solved, each comprising at least a differential equation and a domain, and to select a solver of a plurality of solvers, based upon availability of each of the plurality of solvers. Each solver comprises a coordinator that partitions the domain of the problem into a plurality of sub-domains, and assigns each of the plurality of sub-domains to a differential equation accelerator (DEA) of a plurality of DEAs. Each DEA comprises at least two memory units, and processes the sub-domain data over a plurality of time-steps by passing the sub-domain data through a selected systolic array from one memory unit, and storing the processed sub-domain data in the other memory unit, and vice versa.
    Type: Grant
    Filed: August 10, 2020
    Date of Patent: May 23, 2023
    Assignee: Vorticity Inc.
    Inventor: Chirath Neranjena Thouppuarachchi
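    Illustrative sketch (not taken from the patent): the abstract describes sub-domain data moving from one memory unit through a systolic array into the other memory unit, and back again on the next time-step. In software this resembles ping-pong (double) buffering. The `step` callable below merely stands in for the systolic array, and all names are hypothetical.

```python
from typing import Callable, List


def run_time_steps(
    initial: List[float],
    step: Callable[[List[float]], List[float]],
    num_steps: int,
) -> List[float]:
    """Alternate between two buffers: read from one, write to the other."""
    buffers = [list(initial), [0.0] * len(initial)]
    src = 0
    for _ in range(num_steps):
        dst = 1 - src
        # The "array" consumes the source buffer and fills the destination.
        buffers[dst][:] = step(buffers[src])
        src = dst  # roles swap for the next time-step
    return buffers[src]


def average_neighbors(cells: List[float]) -> List[float]:
    """Toy update rule: each cell becomes the mean of itself and its neighbours."""
    n = len(cells)
    return [
        (cells[max(i - 1, 0)] + cells[i] + cells[min(i + 1, n - 1)]) / 3.0
        for i in range(n)
    ]


if __name__ == "__main__":
    print(run_time_steps([0.0, 9.0, 0.0], average_neighbors, 2))
```

    Keeping two buffers means a time-step never overwrites values that the same step still needs to read.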
  • Patent number: 11561926
    Abstract: A time deterministic computer is architected so that exchange code compiled for one set of tiles, e.g., a column, can be reused on other sets. The computer comprises: a plurality of processing units each having an input interface with a set of input wires, and an output interface with a set of output wires; a switching fabric connected to each of the processing units by the respective set of output wires and connectable to each of the processing units by the respective input wires via switching circuitry controllable by its associated processing unit; the processing units arranged in columns, each column having a base processing unit proximate the switching fabric and multiple processing units one adjacent the other in respective positions in the direction of the column.
    Type: Grant
    Filed: January 20, 2022
    Date of Patent: January 24, 2023
    Assignee: GRAPHCORE LIMITED
    Inventors: Stephen Felix, Simon Christian Knowles
  • Patent number: 11550950
    Abstract: An individual data unit for enhancing the security of a user data record is provided that includes a processor and a memory configured to store data. The individual data unit is associated with a network and the memory is in communication with the processor. The memory has instructions stored thereon which, when read and executed by the processor cause the individual data unit to perform basic operations only. The basic operations include communicating securely with computing devices, computer systems, and a central user data server. Moreover, the basic operations include receiving a user data record, storing the user data record, retrieving the user data record, and transmitting the user data record. The individual data unit can be located in a geographic location associated with the user which can be different than the geographic locations of the computer systems and the central user data server.
    Type: Grant
    Filed: January 23, 2021
    Date of Patent: January 10, 2023
    Inventor: Richard Jay Langley
  • Patent number: 11438445
    Abstract: In one embodiment, a Segment Routing network node provides efficiencies in processing and communicating Internet Protocol packets in a network. This Segment Routing node typically advertises (e.g., using Border Gateway Protocol) its Segment Routing processing capabilities, such as Penultimate Segment Pop (PSP) and/or Ultimate Segment Pop (USP) of a Segment Routing Header (including in the context of a packet that has multiple Segment Routing Headers). Subsequently, an Internet Protocol Segment Routing packet having multiple Segment Routing Headers is received. The packet is processed according to a Segment Routing function, with this processing including removing a first one of the Segment Routing Headers and forwarding the resultant Segment Routing packet. The value of the Segments Left field in the first Segment Routing Header identifies to perform PSP when the value is one, to perform USP when the value is zero, or to perform other processing.
    Type: Grant
    Filed: May 12, 2020
    Date of Patent: September 6, 2022
    Assignee: Cisco Technology, Inc.
    Inventors: Ahmed Refaat Bashandy, Syed Kamran Raza, Jisu Bhattacharya, Clarence Filsfils
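    Illustrative sketch (not taken from the patent): the abstract ties the action to the Segments Left value of the first Segment Routing Header, with one meaning PSP, zero meaning USP, and anything else meaning other processing. The snippet models only that dispatch and the removal of the first header; the class and function names are hypothetical, and real packet handling (address rewriting, header parsing) is deliberately left out.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SegmentRoutingHeader:
    segments_left: int


def classify(srh: SegmentRoutingHeader) -> str:
    """Map the Segments Left value to the action named in the abstract."""
    if srh.segments_left == 1:
        return "PSP"  # Penultimate Segment Pop
    if srh.segments_left == 0:
        return "USP"  # Ultimate Segment Pop
    return "OTHER"


def process(
    headers: List[SegmentRoutingHeader],
) -> Tuple[str, List[SegmentRoutingHeader]]:
    """Return the chosen action and the headers left on the forwarded packet."""
    if not headers:
        raise ValueError("packet carries no Segment Routing Header")
    action = classify(headers[0])
    if action in ("PSP", "USP"):
        # Pop only the first header; any further headers stay on the packet.
        return action, headers[1:]
    return action, list(headers)


if __name__ == "__main__":
    outer, inner = SegmentRoutingHeader(1), SegmentRoutingHeader(3)
    print(process([outer, inner]))
```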
  • Patent number: 11416260
    Abstract: Disclosed embodiments relate to systems and methods for implementing chained tile operations. In one example, a processor includes fetch circuitry to fetch one or more instructions until a plurality of instructions has been fetched, each instruction to specify source and destination tile operands, decode circuitry to decode the fetched instructions, and execution circuitry, responsive to the decoded instructions, to: identify first and second decoded instructions belonging to a chain of instructions, dynamically select and configure a SIMD path comprising first and second processing engines (PE) to execute the first and second decoded instructions, and set aside the specified destination of the first decoded instruction, and instead route a result of the first decoded instruction from the first PE to be used by the second PE to perform the second decoded instruction.
    Type: Grant
    Filed: April 30, 2020
    Date of Patent: August 16, 2022
    Assignee: Intel Corporation
    Inventors: Christopher J. Hughes, Alexander F. Heinecke, Robert Valentine, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall
  • Patent number: 11336287
    Abstract: An integrated circuit can include a data processing engine (DPE) array having a plurality of tiles. The plurality of tiles can include a plurality of DPE tiles, wherein each DPE tile includes a stream switch, a core configured to perform operations, and a memory module. The plurality of tiles can include a plurality of memory tiles, wherein each memory tile includes a stream switch, a direct memory access (DMA) engine, and a random-access memory. The DMA engine of each memory tile may be configured to access the random-access memory within the same memory tile and the random-access memory of at least one other memory tile. Selected ones of the plurality of DPE tiles may be configured to access selected ones of the plurality of memory tiles via the stream switches.
    Type: Grant
    Filed: March 9, 2021
    Date of Patent: May 17, 2022
    Assignee: Xilinx, Inc.
    Inventors: Javier Cabezas Rodriguez, Juan J. Noguera Serra, David Clarke, Sneha Bhalchandra Date, Tim Tuan, Peter McColgan, Jan Langer, Baris Ozgul
  • Patent number: 11327753
    Abstract: Various embodiments are described of a system for improved processor instructions for a software-configurable processing element. In particular, various embodiments are described which accelerate functions useful for FEC encoding and decoding. In particular, the processing element may be configured to implement one or more instances of the relevant functions in response to receiving one of the processor instructions. The processing element may later be reconfigured to implement a different function in response to receiving a different one of the processor instructions. Each of the disclosed processor instructions may be implemented repeatedly by the processing element to repeatedly perform one or more instances of the relevant functions with a throughput approaching one or more solutions per clock cycle.
    Type: Grant
    Filed: June 22, 2020
    Date of Patent: May 10, 2022
    Assignee: Coherent Logix, Incorporated
    Inventors: Keith M. Bindloss, Carl S. Dobbs, Evgeny Mezhibovsky, Zahir Raza, Kevin A. Shelby
  • Patent number: 11294851
    Abstract: Systems and methods for reconfiguring a reduced instruction set computer processor architecture are disclosed. Exemplary implementations may: provide a primary processing core consisting of a RISC processor; provide a node wrapper associated with each of the plurality of secondary cores, the node wrapper comprising access memory associated with each secondary core, and a load/unload matrix associated with each secondary core; operate the architecture in a manner in which, for at least one core, data is read from and written to the at least cache memory in a control-centric mode; the secondary cores are selectively partitioned to operate in a streaming mode wherein data streams out of the corresponding secondary core into the main memory and other ones of the plurality of secondary cores.
    Type: Grant
    Filed: May 4, 2018
    Date of Patent: April 5, 2022
    Assignee: Cornami, Inc.
    Inventors: Paul L. Master, Frederick Furtek, Martin Alan Franz, II, Raymond J. Andraka PE
  • Patent number: 11296707
    Abstract: An integrated circuit can include a data processing engine (DPE) array having a plurality of tiles. The plurality of tiles can include a plurality of DPE tiles, wherein each DPE tile includes a stream switch, a core configured to perform operations, and a memory module. The plurality of tiles can include a plurality of memory tiles, wherein each memory tile includes a stream switch, a direct memory access (DMA) engine, and a random-access memory. The DMA engine of each memory tile may be configured to access the random-access memory within the same memory tile and the random-access memory of at least one other memory tile. Selected ones of the plurality of DPE tiles may be configured to access selected ones of the plurality of memory tiles via the stream switches.
    Type: Grant
    Filed: March 9, 2021
    Date of Patent: April 5, 2022
    Assignee: Xilinx, Inc.
    Inventors: Javier Cabezas Rodriguez, Juan J. Noguera Serra, David Clarke, Sneha Bhalchandra Date, Tim Tuan, Peter McColgan, Jan Langer, Baris Ozgul
  • Patent number: 11277154
    Abstract: A polar code-based interleaving method and apparatus, to resolve a problem existing in the prior art that when a code length is relatively long, an implementation process of reading a sequence obtained after random interleaving is relatively complex, is provided. The method includes: determining an interleaving matrix based on a target code length M of a polar code; and interleaving, based on the interleaving matrix, encoded bits obtained after encoding of the polar code, to generate interleaved bits.
    Type: Grant
    Filed: December 5, 2019
    Date of Patent: March 15, 2022
    Assignee: Huawei Technologies Co., Ltd.
    Inventors: Guijie Wang, Rong Li, Jun Wang, Gongzheng Zhang, Huazi Zhang
  • Patent number: 11269805
    Abstract: Embodiments herein may present a multi-tile processor including a plurality of processor tiles, and a plurality of interconnects selectively coupling the plurality of processor tiles to each other. A first processor tile may include a memory to store a bulletin board to hold a message, an execution unit, and an encapsulated software module. The encapsulated software module may select a second processor tile coupled with the first processor tile by an interconnect to be a part of a signal pathway. The second processor tile may be selected based on a selection criterion of the signal pathway and the message held in the bulletin board. The encapsulated software module may post and read a message at the bulletin board stored in the memory, or read a message from a bulletin board stored in a memory of the second processor tile. Other embodiments may be described and/or claimed.
    Type: Grant
    Filed: May 15, 2018
    Date of Patent: March 8, 2022
    Assignee: Intel Corporation
    Inventors: William J. Butera, Simon C. Steely, Jr., Richard J. Dischler
  • Patent number: 11189004
    Abstract: A SIMD processing unit processes a plurality of tasks which each include up to a predetermined maximum number of work items. The work items of a task are arranged for executing a common sequence of instructions on respective data items. The data items are arranged into blocks, with some of the blocks including at least one invalid data item. Work items which relate to invalid data items are invalid work items. The SIMD processing unit comprises a group of processing lanes configured to execute instructions of work items of a particular task over a plurality of processing cycles. A control module assembles work items into the tasks based on the validity of the work items, so that invalid work items of the particular task are temporally aligned across the processing lanes. In this way the number of wasted processing slots due to invalid work items may be reduced.
    Type: Grant
    Filed: May 6, 2020
    Date of Patent: November 30, 2021
    Assignee: Imagination Technologies Limited
    Inventors: John Howson, Jonathan Redshaw, Yoong Chert Foo
  • Patent number: 11048428
    Abstract: The present disclosure includes apparatuses and methods related to memory alignment. An example method comprises performing an alignment operation on a first byte-based memory element and a second byte-based memory element such that a padding bit of the first byte-based memory element is logically adjacent to a padding bit of the second byte-based memory element and a data bit of the first byte-based memory element is logically adjacent to a data bit of the second byte-based memory element.
    Type: Grant
    Filed: September 18, 2019
    Date of Patent: June 29, 2021
    Assignee: Micron Technology, Inc.
    Inventor: John D. Leidel
  • Patent number: 11038690
    Abstract: A method obtains one or more transactions to be validated by a set of consensus nodes before storage on a digital ledger, and then selects, from a plurality of consensus algorithms, a consensus algorithm to be executed by the set of consensus nodes on the one or more transactions. The consensus algorithm selection is made based on a given policy associated with the one or more transactions. The method then tags the one or more transactions to identify the selected consensus algorithm, and sends the one or more tagged transactions to the set of consensus nodes for execution of the selected consensus algorithm for validation of the one or more transactions before storage on the digital ledger. The selection step is repeated when one or more additional transactions are obtained.
    Type: Grant
    Filed: October 4, 2018
    Date of Patent: June 15, 2021
    Assignee: EMC IP Holding Company LLC
    Inventor: Stephen J. Todd
  • Patent number: 11036673
    Abstract: A method of recording tile identifiers in each of a plurality of tiles of a multitile processor is described. Tiles are arranged in columns, each column having a plurality of processing circuits, each processing circuit comprising one or more tiles, wherein a base processing circuit in each column is connected to a set of processing circuit identifier wires. A base value is generated on each of the set of processing circuit identifier wires for the base processing circuit in each column. At the base processing circuit, the base value on the set of processing circuit identifier wires is read and incremented by one. The incremented value is propagated to a next processing circuit in the column, and at the next processing circuit a unique identifier is recorded by concatenating an identifier of the column and the incremented value.
    Type: Grant
    Filed: May 22, 2019
    Date of Patent: June 15, 2021
    Assignee: Graphcore Limited
    Inventors: Stephen Felix, Jonathan Mangnall
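    Illustrative sketch (not taken from the patent): the abstract describes a value that starts at the base of a column, is incremented by one at each processing circuit, and is concatenated with the column identifier to form a unique identifier. The snippet below models that chain in software. The bit width, the assumption that the base circuit also records the value it reads, and the function name are all hypothetical.

```python
from typing import List


def assign_identifiers(
    column_id: int,
    num_circuits: int,
    base_value: int = 0,
    row_bits: int = 8,
) -> List[int]:
    """Walk up one column, recording (column_id || value) at each circuit."""
    ids = []
    value = base_value
    for _ in range(num_circuits):
        if value >= 1 << row_bits:
            raise ValueError("value no longer fits in row_bits")
        # Concatenate: column identifier in the high bits, value in the low bits.
        ids.append((column_id << row_bits) | value)
        value += 1  # incremented value is propagated to the next circuit
    return ids


if __name__ == "__main__":
    print([hex(i) for i in assign_identifiers(column_id=3, num_circuits=4, row_bits=4)])
```

    Since every circuit applies the same read, record, and increment step, identical logic can be used at each position while still yielding distinct identifiers.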
  • Patent number: 10990552
    Abstract: Examples herein describe techniques for communicating between data processing engines in an array of data processing engines. In one embodiment, the array is a 2D array where each of the DPEs includes one or more cores. In addition to the cores, the data processing engines can include a memory module (with memory banks for storing data) and an interconnect which provides connectivity between the engines. To transmit processed data, a data processing engine identifies a destination processing engine in the array. Once identified, the data processing engine can transmit the processed data using a reserved point-to-point communication path in the interconnect that couples the source and destination data processing engines.
    Type: Grant
    Filed: April 3, 2018
    Date of Patent: April 27, 2021
    Assignee: XILINX, INC.
    Inventors: Goran Hk Bilski, Peter McColgan, Juan J. Noguera Serra, Baris Ozgul, Jan Langer, Richard L. Walke, Ralph D. Wittig, Kornelis A. Vissers, Philip B. James-Roxby, Christopher H. Dick
  • Patent number: 10963315
    Abstract: A system comprising: a first subsystem comprising one or more first processors, and a second subsystem comprising one or more second processors. The second subsystem is configured to process code over a series of steps delineated by barrier synchronizations, and in a current step, to send a descriptor to the first subsystem specifying a value of each of one or more parameters of each of one or more interactions that the second subsystem is programmed to perform with the first subsystem via an inter-processor interconnect in a subsequent step. The first subsystem is configured to execute a portion of code to perform one or more preparatory operations, based on the specified values of at least one of the one or more parameters of each interaction as specified by the descriptor, to prepare for said one or more interactions prior to the barrier synchronization leading into the subsequent phase.
    Type: Grant
    Filed: February 15, 2019
    Date of Patent: March 30, 2021
    Assignee: Graphcore Limited
    Inventors: David Lacey, Daniel John Pelham Wilkinson, Richard Luke Southwell Osborne, Matthew David Fyles
  • Patent number: 10956361
    Abstract: A computing system includes a plurality of functional units, each functional unit having one or more inputs and an output. There is a shared memory block coupled to the inputs and outputs of the plurality of functional units. There is a private memory block assigned to each of the plurality of functional units. An inter functional unit data bypass (IFUDB) block is coupled to the plurality of functional units. The IFUDB is configured to route signals between the one or more functional units without use of the shared memory block.
    Type: Grant
    Filed: November 29, 2018
    Date of Patent: March 23, 2021
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Manoj Kumar, Pratap C. Pattnaik, Kattamuri Ekanadham, Jessica Tseng, Jose E. Moreira
  • Patent number: 10949266
    Abstract: A system comprising: a first subsystem comprising one or more first processors, and a second subsystem comprising one or more second processors. The second subsystem is configured to process code over a series of steps delineated by barrier synchronizations, and in a current step, to send a descriptor to the first subsystem specifying a value of each of one or more parameters of each of one or more interactions that the second subsystem is programmed to perform with the first subsystem via an inter-processor interconnect in a subsequent step. The first subsystem is configured to execute a portion of code to perform one or more preparatory operations, based on the specified values of at least one of the one or more parameters of each interaction as specified by the descriptor, to prepare for said one or more interactions prior to the barrier synchronization leading into the subsequent phase.
    Type: Grant
    Filed: August 13, 2019
    Date of Patent: March 16, 2021
    Assignee: GRAPHCORE LIMITED
    Inventors: David Lacey, Daniel John Pelham Wilkinson, Richard Luke Southwell Osborne, Matthew David Fyles
  • Patent number: 10949566
    Abstract: An individual data unit for enhancing the security of a user data record is provided that includes a processor and a memory configured to store data. The individual data unit is associated with a network and the memory is in communication with the processor. The memory has instructions stored thereon which, when read and executed by the processor cause the individual data unit to perform basic operations only. The basic operations include communicating securely with computing devices, computer systems, and a central user data server. Moreover, the basic operations include receiving a user data record, storing the user data record, retrieving the user data record, and transmitting the user data record. The individual data unit can be located in a geographic location associated with the user which can be different than the geographic locations of the computer systems and the central user data server.
    Type: Grant
    Filed: November 16, 2019
    Date of Patent: March 16, 2021
    Inventor: Richard Jay Langley
  • Patent number: 10929152
    Abstract: A system is disclosed that comprises a field programmable gate array (FPGA), a network interface, and a plurality of hardware templates. The FPGA comprises configurable hardware logic, and the hardware templates define a plurality of different pipelined processing operations. The FPGA can be accessible over a network via the network interface for commanding the FPGA to load a hardware template from among the hardware templates onto the FPGA to thereby configure hardware logic on the FPGA to perform the pipelined processing operation defined by the loaded hardware template, and wherein the FPGA is configured to (1) receive streaming data and (2) process the streaming data through the configured hardware logic to perform the pipelined processing operation defined by the loaded hardware template on the streaming data.
    Type: Grant
    Filed: July 20, 2020
    Date of Patent: February 23, 2021
    Assignee: IP Reservoir, LLC
    Inventors: Roger D. Chamberlain, Mark Allen Franklin, Ronald S. Indeck, Ron K. Cytron, Sharath R. Cholleti
  • Patent number: 10908899
    Abstract: A code conversion apparatus includes a memory and a processor coupled to the memory. The memory is configured to store therein a first code including a first data definition of a plurality of arrays, a first operation for the plurality of arrays, and a second data definition of an array indicating a result of the first operation. The processor is configured to convert the first data definition and the second data definition included in the first code into a data definition of an array of structures. The processor is configured to convert the first operation included in the first code into a second operation for the array of structures. The processor is configured to generate a second code including a predetermined instruction to perform the second operation on different pieces of data of the plurality of arrays in parallel with one another.
    Type: Grant
    Filed: April 5, 2019
    Date of Patent: February 2, 2021
    Assignee: FUJITSU LIMITED
    Inventor: Shigeru Kimura
  • Patent number: 10909048
    Abstract: The disclosed techniques enable a software program to communicate with a peripheral device (e.g., a sensor), via a low-level communication protocol such as the I2C protocol, even though the software program does not include lower-level code configured to implement a sequence of operations defined for the low-level communication protocol. The techniques determine that the software program includes a high-level operation that instructs for communications to be conducted with the peripheral device. The high-level operation is associated with a separately stored configuration file that includes the lower-level code configured to implement the sequence of operations enabling the communications to be conducted with the peripheral device via the low-level communication protocol.
    Type: Grant
    Filed: September 30, 2019
    Date of Patent: February 2, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Alessandro Domenico Scarpantoni, Mei Ling Wilson, Shyamal K. Varma, Ajay P. Barboza
  • Patent number: 10830743
    Abstract: A method for determining a source of pollutant emissions for a selected area includes: mapping a grid onto a representation of the selected area; collecting monitoring data for the selected area; from the monitoring data, assigning air pollution values and weather values to each cell in the grid using an interpolation method to estimate values for gap cells; re-sizing the grid to mitigate the influence of atmospheric turbulence on the air pollution values; and using weather factor separation to minimize the influence of weather on the air pollution values, resulting in air pollution values that reflect the net pollutant emissions for the selected area.
    Type: Grant
    Filed: May 4, 2017
    Date of Patent: November 10, 2020
    Assignee: International Business Machines Corporation
    Inventors: Long Sheng Bai, Liang Liu, Zhuo Liu, Jun Mei Qu, Wei Zhuang
  • Patent number: 10782974
    Abstract: A VLIW (Very Long Instruction Word) interface device includes a memory configured to store instructions and data, and a processor configured to process the instructions and the data, wherein the processor includes an instruction fetcher configured to output an instruction fetch request to load the instruction from the memory, a decoder configured to decode the instruction loaded on the instruction fetcher, an arithmetic logic unit (ALU) configured to perform an operation function if the decoded instruction is an operation instruction, a memory interface scheduler configured to schedule the instruction fetch request or a data fetch request that is input from the arithmetic logic unit, and a memory operator configured to perform a memory access operation in accordance with the scheduled instruction fetch request or data fetch request.
    Type: Grant
    Filed: November 23, 2016
    Date of Patent: September 22, 2020
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Young-chul Cho, Suk-jin Kim, Chul-soo Park, Dong-kwan Suh
  • Patent number: 10691542
    Abstract: According to an embodiment, a storage device includes a plurality of memory nodes and a control unit. Each of the memory nodes includes a storage unit including a plurality of storage areas having a predetermined size. The memory nodes are connected to each other in two or more different directions. The memory nodes constitute two or more groups each including two or more memory nodes. The control unit is configured to sequentially allocate data writing destinations in the storage units to the storage areas respectively included in the different groups.
    Type: Grant
    Filed: September 11, 2013
    Date of Patent: June 23, 2020
    Assignee: Toshiba Memory Corporation
    Inventors: Yuki Sasaki, Takahiro Kurita, Atsuhiro Kinoshita
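    Illustration (not from the patent): a minimal sketch of sequentially allocating write destinations so that consecutive writes land in storage areas of different groups. The function name `allocate_writes`, the round-robin order, and the area identifiers are assumptions made for this example only.

```python
def allocate_writes(groups, count):
    """Plan `count` writes, rotating over the groups so that
    consecutive writes go to storage areas in different groups.

    groups: list of lists; each inner list holds the storage-area
            ids of one group (hypothetical identifiers).
    Returns a list of (group_index, area_id) pairs.
    """
    cursors = [0] * len(groups)   # next area to use within each group
    plan = []
    for i in range(count):
        g = i % len(groups)       # rotate over the groups
        areas = groups[g]
        plan.append((g, areas[cursors[g] % len(areas)]))
        cursors[g] += 1
    return plan


plan = allocate_writes([["a0", "a1"], ["b0", "b1"]], 4)
```

    With two groups of two areas each, the plan alternates between the groups before reusing either one.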
  • Patent number: 10685699
    Abstract: Examples of the present disclosure provide apparatuses and methods related to performing a sort operation in a memory. An example apparatus might include a first group of memory cells coupled to a first sense line, a second group of memory cells coupled to a second sense line, and a controller configured to control sensing circuitry to sort a first element stored in the first group of memory cells and a second element stored in the second group of memory cells by performing an operation without transferring data via an input/output (I/O) line.
    Type: Grant
    Filed: December 3, 2018
    Date of Patent: June 16, 2020
    Assignee: Micron Technology, Inc.
    Inventor: Kyle B. Wheeler
  • Patent number: 10679319
    Abstract: A SIMD processing unit processes a plurality of tasks which each include up to a predetermined maximum number of work items. The work items of a task are arranged for executing a common sequence of instructions on respective data items. The data items are arranged into blocks, with some of the blocks including at least one invalid data item. Work items which relate to invalid data items are invalid work items. The SIMD processing unit comprises a group of processing lanes configured to execute instructions of work items of a particular task over a plurality of processing cycles. A control module assembles work items into the tasks based on the validity of the work items, so that invalid work items of the particular task are temporally aligned across the processing lanes. In this way the number of wasted processing slots due to invalid work items may be reduced.
    Type: Grant
    Filed: April 17, 2019
    Date of Patent: June 9, 2020
    Assignee: Imagination Technologies Limited
    Inventors: John Howson, Jonathan Redshaw, Yoong Chert Foo
  • Patent number: 10656912
    Abstract: A processor or other device, such as a programmable and/or massively parallel processor or other device, includes processing elements designed to perform arithmetic operations (possibly but not necessarily including, for example, one or more of addition, multiplication, subtraction, and division) on numerical values of low precision but high dynamic range (“LPHDR arithmetic”). Such a processor or other device may, for example, be implemented on a single chip. Whether or not implemented on a single chip, the number of LPHDR arithmetic elements in the processor or other device in certain embodiments of the present invention significantly exceeds (e.g., by at least 20 more than three times) the number of arithmetic elements, if any, in the processor or other device which are designed to perform high dynamic range arithmetic of traditional precision (such as 32 bit or 64 bit floating point arithmetic).
    Type: Grant
    Filed: November 6, 2019
    Date of Patent: May 19, 2020
    Assignee: Singular Computing LLC
    Inventor: Joseph Bates
  • Patent number: 10650303
    Abstract: Methods, systems, and computer storage media for implementing neural networks in fixed point arithmetic computing systems. In one aspect, a method includes the actions of receiving a request to process a neural network using a processing system that performs neural network computations using fixed point arithmetic; for each node of each layer of the neural network, determining a respective scaling value for the node from the respective set of floating point weight values for the node; and converting each floating point weight value of the node into a corresponding fixed point weight value using the respective scaling value for the node to generate a set of fixed point weight values for the node; and providing the sets of fixed point weight values for the nodes to the processing system for use in processing inputs using the neural network.
    Type: Grant
    Filed: February 14, 2017
    Date of Patent: May 12, 2020
    Assignee: Google LLC
    Inventor: William John Gulland
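    Illustration (not from the patent): a minimal sketch of per-node conversion of floating point weights to fixed point. The abstract does not say how the scaling value is derived; this example assumes it maps the largest absolute weight of the node onto the largest signed integer of a chosen bit width. The names `quantize_node_weights` and `bits` are hypothetical.

```python
def quantize_node_weights(weights, bits=8):
    """Return (scale, fixed_weights) for one node.

    Assumption for this sketch: the scaling value maps the largest
    absolute weight onto the largest signed `bits`-bit integer.
    """
    max_int = 2 ** (bits - 1) - 1
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_int / max_abs if max_abs > 0 else 1.0
    fixed = [int(round(w * scale)) for w in weights]
    return scale, fixed


def quantize_network(layers, bits=8):
    """layers: list of layers, each a list of per-node weight lists."""
    return [[quantize_node_weights(node, bits) for node in layer]
            for layer in layers]


scale, fixed = quantize_node_weights([0.5, -1.0, 0.25])
```

    Each node keeps its own scale, so a node with small weights does not lose resolution because another node in the same layer has large ones.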
  • Patent number: 10609188
    Abstract: An information processing apparatus includes a receiver to receive data-packets, the data-packets generated by dividing a message into division-data and storing, for each of the division-data, one of the plurality of division data into one of the plurality of data packets, wherein each of the data-packets also includes a data value indicating a quantity of the division-data and data indicating whether or not the data-packet includes final division data corresponding to an end of the message, a memory, and a processor to store the division-data that is contained in a packet of the data-packets that are received, in the memory, and suppress the final division-data from being stored in the memory until the quantity of the data-packets received by the receiver equates to the data value indicating the quantity of the division-data, in a case where the final division-data is received earlier than any one of the other division-data.
    Type: Grant
    Filed: April 3, 2018
    Date of Patent: March 31, 2020
    Assignee: FUJITSU LIMITED
    Inventors: Shinya Hiramoto, Yuji Kondo, Yuichiro Ajima
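    Illustration (not from the patent): a minimal sketch of holding back the final division data until every packet of the message has arrived. The class name, the packet fields (`data`, `total`, `is_final`), and the use of a Python list as the "memory" are assumptions for this example; a real receiver would place each piece at its proper offset rather than append in arrival order.

```python
class MessageReceiver:
    """Stores division data as it arrives, but suppresses storing the
    final piece until the received count equals the announced total."""

    def __init__(self):
        self.memory = []        # division data already stored
        self.received = 0       # number of packets seen so far
        self.held_final = None  # final piece waiting to be stored

    def receive(self, packet):
        """Handle one packet; return True once the message is complete."""
        self.received += 1
        if packet["is_final"]:
            self.held_final = packet["data"]    # hold, do not store yet
        else:
            self.memory.append(packet["data"])
        if self.held_final is not None and self.received == packet["total"]:
            self.memory.append(self.held_final)  # all pieces are in
            self.held_final = None
            return True
        return False


rx = MessageReceiver()
first = rx.receive({"data": "C", "total": 3, "is_final": True})
after_first = list(rx.memory)
second = rx.receive({"data": "A", "total": 3, "is_final": False})
third = rx.receive({"data": "B", "total": 3, "is_final": False})
```

    Because the final piece is written last, its presence in memory can serve as a completion marker for the whole message.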
  • Patent number: 10558575
    Abstract: Systems, methods, and apparatuses relating to a configurable spatial accelerator are described. In one embodiment, a processor includes a core with a decoder to decode an instruction into a decoded instruction and an execution unit to execute the decoded instruction to perform a first operation; a plurality of processing elements; and an interconnect network between the plurality of processing elements to receive an input of a dataflow graph comprising a plurality of nodes, wherein the dataflow graph is to be overlaid into the interconnect network and the plurality of processing elements with each node represented as a dataflow operator in the plurality of processing elements, and the plurality of processing elements is to perform a second operation when an incoming operand set arrives at the plurality of processing elements.
    Type: Grant
    Filed: December 30, 2016
    Date of Patent: February 11, 2020
    Assignee: Intel Corporation
    Inventors: Kermin E. Fleming, Jr., Kent D. Glossop, Simon C. Steely, Jr., Jinjie Tang, Alan G. Gara
  • Patent number: 10534652
    Abstract: Given a current configuration of virtual node groups in a computing cluster and a new configuration indicating one or more changes to the virtual node groups, a cluster manager generates a reconfiguration plan to arrange virtual nodes into the desired virtual node groups of the new configuration while minimizing a number of virtual nodes to be moved between physical nodes in the computing cluster.
    Type: Grant
    Filed: June 29, 2017
    Date of Patent: January 14, 2020
    Assignee: Amazon Technologies, Inc.
    Inventors: Rajib Dugar, Ashish Manral, Ganesh Narayanan
  • Patent number: 10481965
    Abstract: Counting status circuits are electrically coupled to corresponding status elements. The status elements selectably store a bit status of a bit line coupled to a memory array. The bit status can indicate one of at least pass and fail. The counting status circuits are electrically coupled to each other in a sequential order. Control logic causes processing of the counting status circuits in the sequential order to determine a total of the memory elements that store the bit status. The total number of memory elements that store the bit status indicate the number of error bits or non-error bits, which can help determine whether there are too many errors to be fixed by error correction codes.
    Type: Grant
    Filed: December 22, 2016
    Date of Patent: November 19, 2019
    Assignee: MACRONIX INTERNATIONAL CO., LTD.
    Inventors: Yih-Shan Yang, Shou-Nan Hung, Chun-Hsiung Hung, Yao-Jen Kuo, Meng-Fan Chang
  • Patent number: 10474625
    Abstract: Configuring compute nodes in a parallel computer using remote direct memory access (‘RDMA’), the parallel computer comprising a plurality of compute nodes coupled for data communications via one or more data communications networks, including: initiating, by a source compute node of the parallel computer, an RDMA broadcast operation to broadcast binary configuration information to one or more target compute nodes in the parallel computer; preparing, by each target compute node, the target compute node for receipt of the binary configuration information from the source compute node; transmitting, by each target compute node, a ready message to the source compute node, the ready message indicating that the target compute node is ready to receive the binary configuration information from the source compute node; and performing, by the source compute node, an RDMA broadcast operation to write the binary configuration information into memory of each target compute node.
    Type: Grant
    Filed: January 17, 2012
    Date of Patent: November 12, 2019
    Assignee: International Business Machines Corporation
    Inventors: Michael E. Aho, John E. Attinella, Thomas M. Gooding, Michael B. Mundy
  • Patent number: 10474626
    Abstract: Configuring compute nodes in a parallel computer using remote direct memory access (‘RDMA’), the parallel computer comprising a plurality of compute nodes coupled for data communications via one or more data communications networks, including: initiating, by a source compute node of the parallel computer, an RDMA broadcast operation to broadcast binary configuration information to one or more target compute nodes in the parallel computer; preparing, by each target compute node, the target compute node for receipt of the binary configuration information from the source compute node; transmitting, by each target compute node, a ready message to the source compute node, the ready message indicating that the target compute node is ready to receive the binary configuration information from the source compute node; and performing, by the source compute node, an RDMA broadcast operation to write the binary configuration information into memory of each target compute node.
    Type: Grant
    Filed: December 10, 2012
    Date of Patent: November 12, 2019
    Assignee: International Business Machines Corporation
    Inventors: Michael E. Aho, John E. Attinella, Thomas M. Gooding, Michael B. Mundy
  • Patent number: 10445098
    Abstract: Methods and apparatuses relating to privileged configuration in spatial arrays are described. In one embodiment, a processor includes processing elements; an interconnect network between the processing elements; and a configuration controller coupled to a first subset and a second, different subset of the plurality of processing elements, the first subset having an output coupled to an input of the second, different subset, wherein the configuration controller is to configure the interconnect network between the first subset and the second, different subset of the plurality of processing elements to not allow communication on the interconnect network between the first subset and the second, different subset when a privilege bit is set to a first value and to allow communication on the interconnect network between the first subset and the second, different subset of the plurality of processing elements when the privilege bit is set to a second value.
    Type: Grant
    Filed: September 30, 2017
    Date of Patent: October 15, 2019
    Assignee: Intel Corporation
    Inventors: Kermin E. Fleming, Simon C. Steely, Kent D. Glossop
  • Patent number: 10417159
    Abstract: An interconnection system, apparatus and method is described for arranging elements in a network, which may be a data memory system, computing system or communications system where the data paths are arranged and operated so as to control the power consumption and data skew properties of the system. A configurable switching element may be used to form the interconnections at nodes, where a control signal and other information is used to manage the power status of other aspects of the configurable switching element. Time delay skew of data being transmitted between nodes of the network may be altered by exchanging the logical and physical line assignments of the data at one or more nodes of the network. A method of laying out an interconnecting motherboard is disclosed which reduces the complexity of the trace routing.
    Type: Grant
    Filed: April 17, 2006
    Date of Patent: September 17, 2019
    Assignee: VIOLIN SYSTEMS LLC
    Inventor: Jon C. R. Bennett
  • Patent number: 10409765
    Abstract: An array of ALUs and a controlling unit providing the array with sequentially ordered subapplications, wherein an ALU signals completion of execution of a subapplication to the controlling unit, which then provides a next sequential subapplication to the requesting ALU.
    Type: Grant
    Filed: June 21, 2017
    Date of Patent: September 10, 2019
    Assignee: PACT XPP SCHWEIZ AG
    Inventors: Martin Vorbach, Armin Nuckel
  • Patent number: 10341561
    Abstract: In a distributed video encoding system, a video is encoded by splitting into video segments and encoding the segments using multiple encoders. Prior to segmenting the video for distributed video encoding, image stabilization is performed on the video. For each frame in the video, a corresponding transform operation is first computed based on an estimated camera movement. Next, the video is segmented into multiple video segments and the corresponding per-frame transform information for the multiple video segments. The video segments are then distributed to multiple processing nodes that perform the image stabilization of the corresponding video segment by applying the corresponding transform. The results from all the stabilized video segments are then stitched back together for further video encoding operation.
    Type: Grant
    Filed: September 11, 2015
    Date of Patent: July 2, 2019
    Assignee: Facebook, Inc.
    Inventors: Amit Puntambekar, Michael Hamilton Coward
  • Patent number: 10282348
    Abstract: An output buffer holds N words arranged as N/J mutually exclusive output buffer word groups (OBWG) of J words each. N processing units (PU) are arranged as N/J mutually exclusive PU groups each having an associated OBWG. Each PU has an accumulator, an arithmetic unit, and first and second multiplexed registers each having at least J+1 inputs and an output. A first input receives a memory operand and the other J inputs receive the J words of the associated OBWG. Each accumulator provides its output to a respective output buffer word. Each arithmetic unit performs an operation on the first and second multiplexed register outputs and the accumulator output to generate a result for accumulation into the accumulator. A mask input to the output buffer controls which words, if any, of the N words retain their current value or are updated with their respective accumulator output.
    Type: Grant
    Filed: April 5, 2016
    Date of Patent: May 7, 2019
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.
    Inventors: G. Glenn Henry, Terry Parks, Kyle T. O'Brien
  • Patent number: 10275393
    Abstract: A neural network unit configurable to first/second/third configurations has N narrow and N wide accumulators, multipliers and adders. Each multiplier performs a narrow/wide multiply on first and second narrow/wide inputs to generate a narrow/wide product. A first adder input receives a corresponding narrow/wide accumulator's output and third input receives a widened corresponding narrow multiplier's narrow product in the third configuration. In the first configuration, each narrow/wide adder performs a narrow/wide addition on the first and second inputs to generate a narrow/wide sum for storage into the corresponding narrow/wide accumulator. In the second configuration, each wide adder performs a wide addition on the first and a second input to generate a wide sum for storage into the corresponding wide accumulator. In the third configuration, each wide adder performs a wide addition on the first, second and third inputs to generate a wide sum for storage into the corresponding wide accumulator.
    Type: Grant
    Filed: April 5, 2016
    Date of Patent: April 30, 2019
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.
    Inventors: G. Glenn Henry, Terry Parks
  • Patent number: 10268727
    Abstract: A technique of batching tuples can include determining a plurality of key-attributes for a plurality of tuples, creating a batch tuple, and calculating a hash value for the batch tuple.
    Type: Grant
    Filed: March 29, 2013
    Date of Patent: April 23, 2019
    Assignee: Hewlett Packard Enterprise Development LP
    Inventors: Matthias J. Sax, Maria Guadalupe Castellanos, Meichun Hsu
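    Illustration (not from the patent): a minimal sketch of batching tuples by their key attributes and computing one hash per batch tuple instead of one per input tuple. The function name, the dictionary layout of a batch, the CRC32 hash, and the use of the hash to pick a partition are all assumptions for this example.

```python
import zlib
from collections import defaultdict


def batch_by_key(tuples, key_fields, num_partitions):
    """Group tuples that share the same key attributes into one batch
    tuple and hash each batch once (hypothetical layout)."""
    groups = defaultdict(list)
    for t in tuples:
        key = tuple(t[f] for f in key_fields)   # the key attributes
        groups[key].append(t)

    batches = []
    for key, members in groups.items():
        h = zlib.crc32(repr(key).encode("utf-8"))  # one hash per batch
        batches.append({"key": key,
                        "tuples": members,
                        "partition": h % num_partitions})
    return batches


batches = batch_by_key(
    [{"user": "a", "v": 1}, {"user": "b", "v": 2}, {"user": "a", "v": 3}],
    ["user"],
    4,
)
```

    All tuples in a batch share the same key, so they are routed together and the hash is computed once for the group.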
  • Patent number: 10229073
    Abstract: A system including at least one computation node including a memory, a processor reading/writing data in a work area of the memory and a DMA controller including a receiver receiving data from outside and writing it in a sharing area of the memory or a transmitter reading data in said sharing area and transmitting it outside. A write and read request mechanism is provided in order to cause, upon request of the processor, a data transfer, by the DMA controller, between the sharing area and the work area. The DMA controller includes an additional transmitting/receiving device designed for exchanging data between outside and the work area, without passing through the sharing area.
    Type: Grant
    Filed: March 2, 2017
    Date of Patent: March 12, 2019
    Assignee: Commissariat à l'énergie atomique et aux énergies alternatives
    Inventors: Thiago Raupp Da Rosa, Romain Lemaire, Fabien Clermidy
  • Patent number: 10216655
    Abstract: A memory interface apparatus is provided. The apparatus includes a central processing unit (CPU)-side protocol processor connected to a CPU through a parallel interface and a memory-side protocol processor connected to a memory through a parallel interface, and the CPU-side protocol processor and the memory-side protocol processor are connected through a serial link.
    Type: Grant
    Filed: June 27, 2016
    Date of Patent: February 26, 2019
    Assignee: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
    Inventors: Yong Seok Choi, Hyuk Je Kwon