Patents by Inventor Huycu Ngo

Huycu Ngo has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11823063
    Abstract: Individual distributed processing nodes packetize distributed data for each weight of a neural network to be learned, in the order of the weight numbers, transmit the distributed data to an aggregation processing node, acquire the aggregation data transmitted from that node in order, and update the weights of the neural network. The aggregation processing node acquires the transmitted distributed data, packetizes, for each weight, the aggregation data into which the distributed data of all the distributed processing nodes has been aggregated, and transmits the aggregation data to the individual nodes. The individual nodes monitor the unreceived data amount, which is the difference between the amount of transmitted distributed data and the amount of acquired aggregation data, and when the unreceived data amount becomes equal to or larger than a threshold Ma, stop transmission of the distributed data until the unreceived data amount becomes equal to or smaller than a threshold Mb (Mb < Ma). A minimal sketch of this flow-control scheme follows this entry.
    Type: Grant
    Filed: May 21, 2019
    Date of Patent: November 21, 2023
    Assignee: Nippon Telegraph and Telephone Corporation
    Inventors: Tsuyoshi Ito, Kenji Kawai, Junichi Kato, Huycu Ngo, Yuki Arikawa, Takeshi Sakamoto
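The Ma/Mb back-pressure rule above is simple enough to sketch. Below is a minimal, hypothetical Python illustration of a sender that stops when the unreceived data amount reaches Ma and resumes once it falls back to Mb; the class and method names are assumptions for the example, not the patented implementation.

```python
# Hypothetical sketch of the Ma/Mb flow control described above.
# Packet granularity and naming are illustrative assumptions.

class FlowControlledSender:
    """Pauses gradient transmission when the unreceived backlog grows."""

    def __init__(self, ma: int, mb: int):
        assert mb < ma, "resume threshold Mb must be below stop threshold Ma"
        self.ma = ma          # stop sending at this backlog
        self.mb = mb          # resume sending at this backlog
        self.sent = 0         # distributed-data packets transmitted
        self.received = 0     # aggregation-data packets acquired
        self.paused = False

    def backlog(self) -> int:
        # "Unreceived data amount": transmitted distributed data minus
        # acquired aggregation data.
        return self.sent - self.received

    def may_send(self) -> bool:
        if self.paused and self.backlog() <= self.mb:
            self.paused = False   # backlog drained to Mb: resume
        elif not self.paused and self.backlog() >= self.ma:
            self.paused = True    # backlog reached Ma: stop
        return not self.paused

    def on_send(self):
        self.sent += 1

    def on_receive(self):
        self.received += 1
```

The hysteresis gap between Ma and Mb keeps the sender from oscillating between stopped and running on every packet.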
  • Publication number: 20220398431
    Abstract: Provided is a distributed deep learning system including a plurality of distributed processing nodes, in which each of the distributed processing nodes includes a header reading unit configured to read the layer information in the headers of a first data frame that has arrived at its own node and of a second data frame that arrives next. The two pieces of layer information are compared, calculation processing is executed for the data frame whose data belongs to the layer closer to the input layer, and calculation processing for the data frame whose data belongs to the layer closer to the output layer is skipped. A sketch of this comparison rule follows this entry.
    Type: Application
    Filed: November 13, 2019
    Publication date: December 15, 2022
    Inventors: Kenji Tanaka, Yuki Arikawa, Kenji Kawai, Junichi Kato, Tsuyoshi Ito, Huycu Ngo, Takeshi Sakamoto
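As a rough illustration of the header comparison described above, the sketch below picks, of two arrived frames, the one whose data belongs to the layer closer to the input layer. The frame layout (a leading layer-index field) is an assumption made for the example.

```python
# Illustrative sketch of the header-comparison rule above.

from dataclasses import dataclass

@dataclass
class DataFrame:
    layer: int      # layer index: 0 = input layer, larger = closer to output
    payload: bytes

def select_frame(first: DataFrame, second: DataFrame) -> DataFrame:
    """Process the frame closer to the input layer; skip the other,
    as the abstract prescribes."""
    return first if first.layer <= second.layer else second
```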
  • Publication number: 20220327405
    Abstract: The inference processing apparatus includes an inference calculator that performs calculation of a neural network based on input data x_t of each consecutive time step and weights W of a trained neural network to infer features of the input data x_t. It also includes a memory that stores the input data x_t and the weights W, a temporary memory that stores the output h_{t-1} of the inference result of the immediately previous time step, and a switching controller that controls switching between a first operation mode TM1, in which the inference calculator performs calculation of the neural network based on the input data x_t, the weights W, and the output h_{t-1} at each time step, and a second operation mode TM2, in which the inference calculator performs calculation of the neural network based on the input data x_t and the weights W at each time step. A sketch of the mode switch follows this entry.
    Type: Application
    Filed: June 5, 2019
    Publication date: October 13, 2022
    Inventors: Huycu Ngo, Yuki Arikawa, Takeshi Sakamoto
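A minimal sketch of the two operation modes, assuming a simple recurrent cell: TM1 feeds the previous output h_{t-1} back into the calculation, while TM2 omits the recurrent term. All function and variable names are illustrative assumptions.

```python
# Minimal sketch of the two operation modes described above.

import numpy as np

def step(x_t, W_x, W_h, h_prev, recurrent: bool):
    """One time step of inference.

    recurrent=True  -> first operation mode TM1: uses x_t, W, and h_{t-1}
    recurrent=False -> second operation mode TM2: uses x_t and W only
    """
    z = W_x @ x_t
    if recurrent:
        z += W_h @ h_prev   # TM1 only: recurrent contribution
    return np.tanh(z)

# Usage: a switching controller might select TM2 when the recurrent
# state carries no information, e.g. at the start of a sequence.
h = np.zeros(4)
W_x, W_h = np.ones((4, 3)) * 0.1, np.eye(4) * 0.5
for t, x in enumerate(np.ones((5, 3))):
    h = step(x, W_x, W_h, h, recurrent=(t > 0))
```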
  • Publication number: 20220318572
    Abstract: An inference processing apparatus infers a feature of input data X using a trained neural network. It includes a storage unit that stores the input data X and a weight W of the trained neural network, a setting unit that sets the bit accuracy of the inference calculation and the number of units of the trained neural network based on an input inference accuracy, and an inference calculation unit that performs the inference calculation of the trained neural network, taking the input data X and the weight W as inputs, based on the bit accuracy and the number of units set by the setting unit, to infer the feature of the input data X. A sketch of such a setting rule follows this entry.
    Type: Application
    Filed: June 5, 2019
    Publication date: October 6, 2022
    Inventors: Huycu Ngo, Yuki Arikawa, Takeshi Sakamoto
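One plausible reading of the setting unit is a lookup from the requested inference accuracy to a (bit accuracy, number of units) pair. The thresholds and values below are invented for illustration and are not taken from the patent.

```python
# Hypothetical sketch of the "setting unit": choose bit accuracy and
# number of units from a requested inference accuracy. Table values
# are invented for illustration.

SETTINGS = [
    # (minimum required accuracy, bit accuracy, number of units)
    (0.95, 16, 1024),
    (0.90, 8, 512),
    (0.00, 4, 256),
]

def configure(required_accuracy: float) -> tuple[int, int]:
    """Return (bit_accuracy, num_units) meeting the requested accuracy."""
    for threshold, bits, units in SETTINGS:
        if required_accuracy >= threshold:
            return bits, units
    return SETTINGS[-1][1:]  # fall back to the cheapest setting

assert configure(0.92) == (8, 512)   # mid accuracy -> mid-cost config
```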
  • Publication number: 20220321641
    Abstract: A distributed deep learning system according to an embodiment includes M distributed processing nodes that perform deep learning of a neural network in a distributed manner, and N aggregation processing nodes that are connected to each of the M distributed processing nodes via a first communication line and a second communication line, and that aggregate, via the first communication line, the distributed processing results obtained at the M distributed processing nodes. Accordingly, even when a plurality of users share the distributed deep learning system at the same time, efficient and stable distributed deep learning processing can be realized.
    Type: Application
    Filed: July 16, 2019
    Publication date: October 6, 2022
    Inventors: Tsuyoshi Ito, Kenji Kawai, Junichi Kato, Huycu Ngo, Yuki Arikawa, Takeshi Sakamoto, Kenji Tanaka
  • Publication number: 20220261620
    Abstract: A distributed processing node transmits distributed data for M groups, as intermediate consolidated data, from its M communication units to a subsequent distributed processing node. A distributed processing node generates, for each group, updated intermediate consolidated data from the received intermediate consolidated data and its own distributed data, and transmits the updated intermediate consolidated data from the M communication units to a subsequent distributed processing node. The distributed processing node transmits the received intermediate consolidated data to a subsequent distributed processing node as consolidated data, and that node relays the received consolidated data onward. Each of the distributed processing nodes updates the weights of a neural network based on the consolidated data. A sketch of the M-group partition follows this entry.
    Type: Application
    Filed: June 3, 2019
    Publication date: August 18, 2022
    Inventors: Kenji Kawai, Junichi Kato, Huycu Ngo, Yuki Arikawa, Tsuyoshi Ito, Takeshi Sakamoto
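The group-parallel aggregation can be pictured as splitting each node's gradient into M groups, one per communication unit, and consolidating each group independently. The sketch below simulates that partition in NumPy; transport, framing, and node numbering are omitted, and the names are assumptions.

```python
# Illustrative sketch of aggregating M weight groups over M
# communication units in parallel.

import numpy as np

def split_into_groups(grad: np.ndarray, m: int) -> list[np.ndarray]:
    """Partition the flat gradient into M groups, one per communication unit."""
    return np.array_split(grad, m)

def aggregate_groups(per_node_grads: list[np.ndarray], m: int) -> np.ndarray:
    """Consolidate each group across nodes as if each used its own link."""
    groups = [split_into_groups(g, m) for g in per_node_grads]
    consolidated = [sum(node_groups[i] for node_groups in groups)
                    for i in range(m)]
    return np.concatenate(consolidated)
```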
  • Publication number: 20220245452
    Abstract: A computing interconnect apparatus includes a reception unit configured to receive a packet transmitted from each of the learning nodes and acquire the gradient values stored in the packet, an adder configured to calculate the sum of the acquired gradients in parallel, separately for each of a number of processing units that is determined by the bit precision of the gradients and a desired processing speed, and a transmission unit configured to packetize the gradient sums obtained by the adder, separately for each of the processing units, and transmit the calculation results to each of the learning nodes. A sketch of sizing the parallel processing units follows this entry.
    Type: Application
    Filed: May 31, 2019
    Publication date: August 4, 2022
    Inventors: Yuki Arikawa, Kenji Kawai, Junichi Kato, Huycu Ngo, Tsuyoshi Ito, Kenji Tanaka, Takeshi Sakamoto
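As a back-of-the-envelope illustration of why the number of processing units depends on bit precision and desired speed: if a fixed-width bus delivers one word per cycle, lower-precision gradients pack more values into each word, so more parallel adders are needed to sum them at line rate. The formula below is an assumption made for the example, not the patent's derivation.

```python
# Hypothetical sizing rule for the parallel adder array.

import math

def num_processing_units(bus_bits_per_cycle: int,
                         gradient_bits: int,
                         speedup: float = 1.0) -> int:
    """Adders needed to sum gradients at the desired processing speed.

    Lower bit precision packs more gradient values into each received
    word, so more parallel adders are needed to keep up.
    """
    values_per_cycle = bus_bits_per_cycle / gradient_bits
    return math.ceil(values_per_cycle * speedup)

# e.g. a 256-bit bus carrying 8-bit gradients needs 32 parallel adders
assert num_processing_units(256, 8) == 32
```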
  • Patent number: 11240296
    Abstract: A first distributed processing node transmits distributed data to a second distributed processing node as intermediate consolidated data. A third distributed processing node generates updated intermediate consolidated data from the received intermediate consolidated data and its own distributed data, and transmits the intermediate consolidated data to a fourth distributed processing node. The first distributed processing node transmits the received intermediate consolidated data to a fifth distributed processing node as consolidated data. The third distributed processing node transmits the received consolidated data to a sixth distributed processing node. A minimal ring relay sketch follows this entry.
    Type: Grant
    Filed: October 7, 2019
    Date of Patent: February 1, 2022
    Assignee: Nippon Telegraph and Telephone Corporation
    Inventors: Kenji Kawai, Junichi Kato, Huycu Ngo, Yuki Arikawa, Tsuyoshi Ito, Takeshi Sakamoto
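The relay described above is the two-phase pattern of a ring All-reduce: a consolidation pass accumulates every node's distributed data, and a distribution pass circulates the consolidated result back. The simulation below collapses the transport into plain function calls; node roles follow ring order, and everything else is an illustrative assumption.

```python
# Minimal ring All-reduce simulation in the spirit of the relay above.

import numpy as np

def ring_allreduce(node_data: list[np.ndarray]) -> list[np.ndarray]:
    n = len(node_data)
    # Consolidation pass: the first node sends its data; each successor
    # adds its own distributed data and forwards the running sum
    # ("updated intermediate consolidated data").
    running = node_data[0].copy()
    for i in range(1, n):
        running += node_data[i]
    # Distribution pass: the consolidated data is relayed around the
    # ring so every node receives the same sum.
    return [running.copy() for _ in range(n)]

# Each node would then update its neural-network weights from the sum.
out = ring_allreduce([np.full(4, i, dtype=float) for i in range(3)])
assert all(np.array_equal(o, np.full(4, 3.0)) for o in out)
```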
  • Publication number: 20220004842
    Abstract: A first distributed processing node transmits distributed data as intermediate consolidated data from a first communication port to a second distributed processing node. A third distributed processing node generates updated intermediate consolidated data from the received intermediate consolidated data and distributed data, and transmits the data from the first communication port to a fourth distributed processing node. The first distributed processing node transmits intermediate consolidated data received via a second communication port as consolidated data to a fifth distributed processing node from the second communication port. The third distributed processing node transmits consolidated data received via the first communication port to a sixth distributed processing node from the second communication port. Each of the distributed processing nodes updates a weight of a neural network based on the consolidated data.
    Type: Application
    Filed: October 7, 2019
    Publication date: January 6, 2022
    Inventors: Kenji Kawai, Junichi Kato, Huycu Ngo, Yuki Arikawa, Tsuyoshi Ito, Takeshi Sakamoto
  • Publication number: 20210406655
    Abstract: An inference processing apparatus includes an input data storage unit that stores pieces of input data, a learned-weight storage unit that stores weight data of a neural network, a batch processing control unit that sets a batch size in accordance with information on the pieces of input data, a memory control unit that reads out, from the input data storage unit, the pieces of input data corresponding to the set batch size, and an inference operation unit that batch-processes the operations of the neural network using, as inputs, the pieces of input data corresponding to the batch size and the weight data, and infers a feature of the pieces of input data. A sketch of the batch-size control follows this entry.
    Type: Application
    Filed: December 25, 2019
    Publication date: December 30, 2021
    Applicant: Nippon Telegraph and Telephone Corporation
    Inventors: Huycu Ngo, Yuki Arikawa, Takeshi Sakamoto, Yasue Kishino
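A minimal sketch of the batch processing control: choose a batch size from how much input is queued (the "information on the pieces of input data"), read that many inputs from storage, and run them through the network in one pass. The sizing rule and the single-layer stand-in network are assumptions for the example.

```python
# Illustrative sketch of batch-size control for inference.

import numpy as np

def choose_batch_size(num_pending: int, max_batch: int = 32) -> int:
    """Batch as much queued input as the accelerator allows."""
    return max(1, min(num_pending, max_batch))

def infer_batched(pending: list[np.ndarray], weights: np.ndarray):
    results = []
    while pending:
        b = choose_batch_size(len(pending))
        batch = np.stack(pending[:b])        # read b inputs from storage
        del pending[:b]
        results.extend(batch @ weights)      # single-layer stand-in network
    return results
```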
  • Publication number: 20210377339
    Abstract: A first distributed processing node transmits distributed data to a second distributed processing node as intermediate consolidated data. A third distributed processing node generates updated intermediate consolidated data from the received intermediate consolidated data and its own distributed data, and transmits the intermediate consolidated data to a fourth distributed processing node. The first distributed processing node transmits the received intermediate consolidated data to a fifth distributed processing node as consolidated data. The third distributed processing node transmits the received consolidated data to a sixth distributed processing node.
    Type: Application
    Filed: October 7, 2019
    Publication date: December 2, 2021
    Inventors: Kenji Kawai, Junichi Kato, Huycu Ngo, Yuki Arikawa, Tsuyoshi Ito, Takeshi Sakamoto
  • Publication number: 20210357723
    Abstract: A distributed processing system includes a plurality of lower-order aggregation networks and a higher-order aggregation network. Each lower-order aggregation network includes a plurality of distributed processing nodes disposed in a ring form. The distributed processing nodes generate distributed data for each weight of the neural network of their own node. The lower-order aggregation networks aggregate, per network, the distributed data generated by their distributed processing nodes. The higher-order aggregation network generates aggregated data in which the aggregation results of the lower-order aggregation networks are further aggregated, and distributes it to the lower-order aggregation networks. Each lower-order aggregation network distributes the aggregated data to the distributed processing nodes belonging to it. The distributed processing nodes update the weights of the neural network based on the distributed aggregated data. A sketch of this two-level aggregation follows this entry.
    Type: Application
    Filed: October 23, 2019
    Publication date: November 18, 2021
    Inventors: Kenji Kawai, Junichi Kato, Huycu Ngo, Yuki Arikawa, Kenji Tanaka, Takeshi Sakamoto, Tsuyoshi Ito
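The two-level scheme above reduces to: sum within each lower-order ring, sum the ring totals in the higher-order network, then fan the grand total back out. A NumPy sketch with transport details omitted and illustrative names:

```python
# Sketch of two-level (hierarchical) aggregation.

import numpy as np

def hierarchical_allreduce(rings: list[list[np.ndarray]]) -> list[list[np.ndarray]]:
    # Lower-order aggregation: one partial sum per ring.
    partials = [sum(node_data) for node_data in rings]
    # Higher-order aggregation across rings.
    total = sum(partials)
    # Distribute the aggregated data back to every node of every ring.
    return [[total.copy() for _ in node_data] for node_data in rings]
```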
  • Publication number: 20210357760
    Abstract: A distributed deep learning system includes a plurality of computers connected to each other over a communication network, each of which iteratively performs forward propagation calculation and backpropagation calculation based on learning data and sends the calculation result of the backpropagation calculation to the communication network, and an Allreduce processing apparatus, connected to the computers over the communication network, that processes the calculation results received from the plurality of computers and returns them to the transmission sources. Each computer includes a forward propagation calculator, a backpropagation calculator, a transfer processor that stores the calculation result of the backpropagation calculation in a transfer buffer each time the backpropagation calculator produces the calculation result for a layer, and a communicator that sequentially transmits the calculation results of the backpropagation calculation. A sketch of this per-layer overlap follows this entry.
    Type: Application
    Filed: October 25, 2019
    Publication date: November 18, 2021
    Inventors: Kenji Tanaka, Yuki Arikawa, Kenji Kawai, Junichi Kato, Tsuyoshi Ito, Huycu Ngo, Takeshi Sakamoto
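Conceptually, the transfer buffer lets communication overlap computation: each layer's backpropagation result is enqueued as soon as it is produced, and a communicator drains the queue concurrently. A minimal threaded sketch, with invented names standing in for the real gradient computation and transport:

```python
# Conceptual sketch of overlapping backpropagation with communication.

import queue
import threading

transfer_buffer: queue.Queue = queue.Queue()

def backpropagation(layers: list[str]) -> None:
    # Gradients are produced from the output layer back toward the input.
    for layer in reversed(layers):
        grad = f"grad({layer})"          # stand-in for the real computation
        transfer_buffer.put(grad)        # transfer processor: store per layer
    transfer_buffer.put(None)            # signal end of the iteration

def communicator() -> None:
    # Sequentially transmit results while backprop is still running.
    while (grad := transfer_buffer.get()) is not None:
        print("sending", grad, "to the Allreduce apparatus")

t = threading.Thread(target=communicator)
t.start()
backpropagation(["conv1", "conv2", "fc"])
t.join()
```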
  • Publication number: 20210216855
    Abstract: Provided is a distributed deep learning system that achieves speedup by processing learning in parallel at a large number of learning nodes connected over a communication network, and that performs faster cooperative processing among those learning nodes. The distributed deep learning system includes a plurality of computing interconnect devices 1 connected with each other through a ring communication network 3 over which communication is possible in one direction, and a plurality of learning nodes 2 connected to the respective computing interconnect devices 1 in a one-to-one relation; each computing interconnect device 1 executes communication packet transmission and reception processing between the learning nodes 2 and All-reduce processing simultaneously in parallel.
    Type: Application
    Filed: May 27, 2019
    Publication date: July 15, 2021
    Inventors: Junichi Kato, Kenji Kawai, Huycu Ngo, Yuki Arikawa, Tsuyoshi Ito, Takeshi Sakamoto
  • Publication number: 20210216866
    Abstract: Individual distributed processing nodes packetize distributed data for each weight of a neural network to be learned, in the order of the weight numbers, transmit the distributed data to an aggregation processing node, acquire the aggregation data transmitted from that node in order, and update the weights of the neural network. The aggregation processing node acquires the transmitted distributed data, packetizes, for each weight, the aggregation data into which the distributed data of all the distributed processing nodes has been aggregated, and transmits the aggregation data to the individual nodes. The individual nodes monitor the unreceived data amount, which is the difference between the amount of transmitted distributed data and the amount of acquired aggregation data, and when the unreceived data amount becomes equal to or larger than a threshold Ma, stop transmission of the distributed data until the unreceived data amount becomes equal to or smaller than a threshold Mb (Mb < Ma).
    Type: Application
    Filed: May 21, 2019
    Publication date: July 15, 2021
    Inventors: Tsuyoshi Ito, Kenji Kawai, Junichi Kato, Huycu Ngo, Yuki Arikawa, Takeshi Sakamoto
  • Publication number: 20210209443
    Abstract: A first distributed processing node sets, as intermediate aggregated data, the distributed data generated by its own node and transmits this data to the distributed processing node having the next number designated in advance. Each intermediate distributed processing node, excluding the first and last distributed processing nodes, calculates, for each of the weights corresponding thereto, the sum of the received intermediate aggregated data and the distributed data generated by its own node, generates updated intermediate aggregated data, and transmits this data to the distributed processing node having the next number designated in advance. The last distributed processing node calculates, for each of the weights corresponding thereto, the sum of the received intermediate aggregated data and the distributed data generated by its own node, generates aggregated data, and transmits this data to the first and intermediate distributed processing nodes.
    Type: Application
    Filed: May 5, 2019
    Publication date: July 8, 2021
    Inventors: Kenji Kawai, Junichi Kato, Huycu Ngo, Yuki Arikawa, Tsuyoshi Ito, Takeshi Sakamoto
  • Publication number: 20210117783
    Abstract: Each of distributed processing nodes [n] (n = 1, …, N) packetizes pieces of distributed data [m, n] for every one of M weights w[m] (m = 1, …, M) of a neural network to be learned, in the order of the numbers m, transmits the packets to a consolidation processing node, and receives packets transmitted from the consolidation processing node to acquire consolidated data R[m] in the order of the numbers m and update the weights w[m] of the neural network on the basis of the consolidated data R[m].
    Type: Application
    Filed: February 6, 2019
    Publication date: April 22, 2021
    Applicant: Nippon Telegraph and Telephone Corporation
    Inventors: Kenji Kawai, Junichi Kato, Huycu Ngo, Yuki Arikawa, Tsuyoshi Ito, Takeshi Sakamoto
  • Publication number: 20210056416
    Abstract: Each of the learning nodes calculates the gradient of a loss function from the output result obtained when learning data is input to a neural network to be learned, generates a packet for a plurality of gradient components, and transmits the packet to the computing interconnect device. The computing interconnect device acquires the values of the plurality of gradient components stored in the packets transmitted from the learning nodes, performs, in parallel for each of a plurality of configuration values of each gradient, a calculation process that takes as input the gradient values corresponding to the same configuration parameter of the neural network, generates a packet for the calculation results, and transmits the packet to each of the learning nodes. Each of the learning nodes updates the configuration parameters of the neural network based on the values stored in the packet.
    Type: Application
    Filed: February 25, 2019
    Publication date: February 25, 2021
    Inventors: Junichi Kato, Kenji Kawai, Huycu Ngo, Yuki Arikawa, Tsuyoshi Ito, Takeshi Sakamoto
  • Publication number: 20210034978
    Abstract: Each of the learning nodes calculates gradients of a loss function from an output result obtained by inputting learning data to a learning target neural network, converts the calculation result into a packet, and transmits the packet to a computing interconnect device. The computing interconnect device receives the packets transmitted from the learning nodes, acquires the gradient values stored in the packets, calculates the sum of the gradients, converts the calculation result into a packet, and transmits the packet to each of the learning nodes. Each of the learning nodes receives the packet transmitted from the computing interconnect device and updates a constituent parameter of the neural network based on the value stored in the packet. A sketch of this packet-level gradient summation follows this entry.
    Type: Application
    Filed: February 6, 2019
    Publication date: February 4, 2021
    Inventors: Junichi Kato, Kenji Kawai, Huycu Ngo, Yuki Arikawa, Tsuyoshi Ito, Takeshi Sakamoto
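The receive/sum/transmit loop of the computing interconnect device can be sketched as a single All-reduce step: gather one gradient packet per learning node, sum component-wise, and return the sum to every node. The packet representation and transport below are assumptions for the example.

```python
# Minimal sketch of the computing-interconnect loop described above.

import numpy as np

def allreduce_step(packets: dict[str, np.ndarray]) -> dict[str, np.ndarray]:
    """packets maps learning-node id -> gradient values from its packet."""
    total = sum(packets.values())            # component-wise gradient sum
    return {node: total.copy() for node in packets}

# Each learning node then updates its constituent parameters from the
# sum, e.g. w -= lr * total / num_nodes for averaged SGD (illustrative).
replies = allreduce_step({
    "node1": np.array([0.1, -0.2]),
    "node2": np.array([0.3, 0.4]),
})
assert np.allclose(replies["node1"], [0.4, 0.2])
```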