Patents by Inventor Hartmut Penner

Hartmut Penner has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Parallel computational architecture with reconfigurable core-level and vector-level parallelism

Patent number: 11847553

Abstract: Neural network processing hardware using parallel computational architectures with reconfigurable core-level and vector-level parallelism is provided. In various embodiments, a neural network model memory is adapted to store a neural network model comprising a plurality of layers. Each layer has at least one dimension and comprises a plurality of synaptic weights. A plurality of neural cores is provided. Each neural core includes a computation unit and an activation memory. The computation unit is adapted to apply a plurality of synaptic weights to a plurality of input activations to produce a plurality of output activations. The computation unit has a plurality of vector units. The activation memory is adapted to store the input activations and the output activations. The system is adapted to partition the plurality of cores into a plurality of partitions based on dimensions of the layer and the vector units.

Type: Grant

Filed: June 14, 2018

Date of Patent: December 19, 2023

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Andrew S. Cassidy, Myron D. Flickner, Pallab Datta, Hartmut Penner, Rathinakumar Appuswamy, Jun Sawada, John V. Arthur, Dharmendra S. Modha, Steven K. Esser, Brian Taba, Jennifer Klamo
Instruction distribution in an array of neural network cores

Patent number: 11663461

Abstract: Instruction distribution in an array of neural network cores is provided. In various embodiments, a neural inference chip is initialized with core microcode. The chip comprises a plurality of neural cores. The core microcode is executable by the neural cores to execute a tensor operation of a neural network. The core microcode is distributed to the plurality of neural cores via an on-chip network. The core microcode is executed synchronously by the plurality of neural cores to compute a neural network layer.

Type: Grant

Filed: July 5, 2018

Date of Patent: May 30, 2023

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Hartmut Penner, Dharmendra S. Modha, John V. Arthur, Andrew S. Cassidy, Rathinakumar Appuswamy, Pallab Datta, Steven K. Esser, Myron D. Flickner, Jennifer Klamo, Jun Sawada, Brian Taba
Distributed, event-based computation using neuromorphic cores

Patent number: 11645501

Abstract: Systems for distributed, event-based computation are provided. In various embodiments, the systems include a plurality of neurosynaptic processors and a network interconnecting the plurality of neurosynaptic processors. Each neurosynaptic processor includes a clock uncoupled from the clock of each other neurosynaptic processor. Each neurosynaptic processor is adapted to receive an input stream, the input stream comprising a plurality of inputs and a clock value associated with each of the plurality of inputs. Each neurosynaptic processor is adapted to compute, for each clock value, an output based on the inputs associated with that clock value. Each neurosynaptic processor is adapted to send to another of the plurality of neurosynaptic processors, via the network, the output and an associated clock value.

Type: Grant

Filed: February 28, 2018

Date of Patent: May 9, 2023

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Arnon Amir, David Berg, Pallab Datta, Jeffrey A. Kusnitz, Hartmut Penner
RUNTIME RECONFIGURABLE NEURAL NETWORK PROCESSOR CORE

Publication number: 20230062217

Abstract: Hardware neural network processors, are provided. A neural core includes a weight memory, an activation memory, a vector-matrix multiplier, and a vector processor. The vector-matrix multiplier is adapted to receive a weight matrix from the weight memory, receive an activation vector from the activation memory, and compute a vector-matrix multiplication of the weight matrix and the activation vector. The vector processor is adapted to receive one or more input vector from one or more vector source and perform one or more vector functions on the one or more input vector to yield an output vector. In some embodiments a programmable controller is adapted to configure and operate the neural core.

Type: Application

Filed: October 13, 2022

Publication date: March 2, 2023

Inventors: Andrew S. Cassidy, Rathinakumar Appuswamy, John V. Arthur, Pallab Datta, Steven K. Esser, Myron D. Flickner, Jennifer Klamo, Dharmendra S. Modha, Hartmut Penner, Jun Sawada, Brian Taba
Runtime reconfigurable neural network processor core

Patent number: 11501140

Abstract: Hardware neural network processors, are provided. A neural core includes a weight memory, an activation memory, a vector-matrix multiplier, and a vector processor. The vector-matrix multiplier is adapted to receive a weight matrix from the weight memory, receive an activation vector from the activation memory, and compute a vector-matrix multiplication of the weight matrix and the activation vector. The vector processor is adapted to receive one or more input vector from one or more vector source and perform one or more vector functions on the one or more input vector to yield an output vector. In some embodiments a programmable controller is adapted to configure and operate the neural core.

Type: Grant

Filed: June 19, 2018

Date of Patent: November 15, 2022

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Andrew S. Cassidy, Rathinakumar Appuswamy, John V. Arthur, Pallab Datta, Steven K. Esser, Myron D. Flickner, Jennifer Klamo, Dharmendra S. Modha, Hartmut Penner, Jun Sawada, Brian Taba
SYMBOLIC VALIDATION OF NEUROMORPHIC HARDWARE

Publication number: 20220129436

Abstract: Systems are provided that can produce symbolic and numeric representations of the neural network outputs, such that these outputs can be used to validate correctness of the implementation of the neural network. In various embodiments, a description of an artificial neural network containing no data-dependent branching is read. Based on the description of the artificial neural network, a symbolic representation is constructed of an output of the artificial neural network, the symbolic representation comprising at least one variable. The symbolic representation is compared to a ground truth symbolic representation, thereby validating the neural network system.

Type: Application

Filed: October 22, 2020

Publication date: April 28, 2022

Inventors: Alexander Andreopoulos, Dharmendra S. Modha, Andrew Stephen Cassidy, Brian Seisho Taba, Carmelo Di Nolfo, Hartmut Penner, John Vernon Arthur, Jun Sawada, Myron D. Flickner, Pallab Datta, Rathinakumar Appuswamy
Compound instruction set architecture for a neural inference chip

Patent number: 11263011

Abstract: A device for controlling neural inference processor cores is provided, including a compound instruction set architecture. The device comprises an instruction memory, which comprises a plurality of instructions for controlling a neural inference processor core. Each of the plurality of instructions comprises a control operation. The device further comprises a program counter. The device further comprises at least one loop counter register. The device is adapted to execute the plurality of instructions. Executing the plurality of instructions comprises: reading an instruction from the instruction memory based on a value of the program counter; updating the at least one loop counter register according to the control operation of the instruction; and updating the program counter according to the control operation of the instruction and a value of the at least one loop counter register.

Type: Grant

Filed: November 28, 2018

Date of Patent: March 1, 2022

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Andrew S. Cassidy, Rathinakumar Appuswamy, John V. Arthur, Pallab Datta, Michael V. Debole, Steven K. Esser, Myron D. Flickner, Dharmendra S. Modha, Hartmut Penner, Jun Sawada, Brian Taba
Data distribution in an array of neural network cores

Patent number: 11238347

Abstract: Parallel processing among arrays of physical neural cores is provided. An array of neural cores is adapted to compute, in parallel, an output activation tensor of a neural network layer. A network is operatively connected to each of the neural cores. The output activation tensor is distributed across the neural cores. An input activation tensor is distributed across the neural cores. A weight tensor is distributed across the neural cores. Each neural core's computation comprises multiplying elements of a portion of the input activation tensor at that core with elements of a portion of the weight tensor at that core, and storing the summed products in a partial sum corresponding to an element of the output activation tensor. Each element of the output activation tensor is computed by accumulating all of the partial sums corresponding to that element via the network.

Type: Grant

Filed: September 28, 2018

Date of Patent: February 1, 2022

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Brian Taba, Andrew S. Cassidy, Myron D. Flickner, Pallab Datta, Hartmut Penner, Rathinakumar Appuswamy, Jun Sawada, John V. Arthur, Dharmendra S. Modha, Steven K. Esser, Jennifer Klamo
Massively parallel neural inference computing elements

Patent number: 11010662

Abstract: Massively parallel neural inference computing elements are provided. A plurality of multipliers is arranged in a plurality of equal-sized groups. Each of the plurality of multipliers is adapted to, in parallel, apply a weight to an input activation to generate an output. A plurality of adders is operatively coupled to one of the groups of multipliers. Each of the plurality of adders is adapted to, in parallel, add the outputs of the multipliers within its associated group to generate a partial sum. A plurality of function blocks is operatively coupled to one of the plurality of adders. Each of the plurality of function blocks is adapted to, in parallel, apply a function to the partial sum of its associated adder to generate an output value.

Type: Grant

Filed: March 4, 2020

Date of Patent: May 18, 2021

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Rathinakumar Appuswamy, John V. Arthur, Andrew S. Cassidy, Pallab Datta, Steven K. Esser, Myron D. Flickner, Jennifer Klamo, Dharmendra S. Modha, Hartmut Penner, Jun Sawada, Brian Taba
Providing remote, reliant and high performance PCI express device in cloud computing environments

Patent number: 10936535

Abstract: A system architecture, a method, and a computer program product are disclosed for attaching remote physical devices. In one embodiment, the system architecture comprises a compute server and a device server. The compute server includes a system memory, and one or more remote device drivers; and the device server includes a system memory and one or more physical devices, and each of the physical devices includes an associated device memory. The compute server and the device server are connected through an existing network fabric that provides remote direct memory access (RDMA) services. A system mapping function logically connects one or more of the physical devices on the device server to the compute server, including mapping between the system memories and the device memories and keeping the system memories and the device memories in synchronization using the RDMA.

Type: Grant

Filed: March 20, 2019

Date of Patent: March 2, 2021

Assignee: International Business Machines Corporation

Inventors: Hubertus Franke, Davide Pasetto, Hartmut Penner
Selective multicast delivery on a bus-based interconnect

Patent number: 10834024

Abstract: According to one embodiment, a computer program product for performing selective multicast delivery includes a computer readable storage medium having program instructions embodied therewith, wherein the computer readable storage medium is not a transitory signal per se, and where the program instructions are executable by a selector of an intelligent processing unit (IPU) to cause the selector to perform a method comprising identifying, by the selector, an address header appended to an instance of data, comparing, by the selector, address data in the address header to identifier data stored at the selector, and conditionally delivering, by the selector, the instance of data, based on the comparing.

Type: Grant

Filed: September 28, 2018

Date of Patent: November 10, 2020

Assignee: International Business Machines Corporation

Inventors: Simon J. Hollis, Hartmut Penner, Andrew S. Cassidy, Jun Sawada, Pallab Datta
Extending existing storage devices in virtualized environments

Patent number: 10768862

Abstract: A method, system and computer program product for providing a guest with access to a virtual storage on a physical storage using a peripheral component interface hub. In one embodiment, the method comprises the guest sending to the peripheral component interface hub a request to access the physical storage, the request including physical addresses of the physical storage, and the peripheral component interface hub sending specified information about the request to a hypervisor. This method further comprises the hypervisor determining whether to grant or to reject the request; and when the hypervisor grants the request, the hypervisor sending a configuration command to the peripheral component interface hub. This command includes a mapping of addresses from the physical storage to addresses from the virtual storage. In an embodiment, the peripheral component interface hub uses this mapping to replace the addresses in the request with translated virtual addresses.

Type: Grant

Filed: September 9, 2019

Date of Patent: September 8, 2020

Assignee: International Business Machines Corporation

Inventors: Hubertus Franke, Davide Pasetto, Hartmut Penner
MASSIVELY PARALLEL NEURAL INFERENCE COMPUTING ELEMENTS

Publication number: 20200202205

Abstract: Massively parallel neural inference computing elements are provided. A plurality of multipliers is arranged in a plurality of equal-sized groups. Each of the plurality of multipliers is adapted to, in parallel, apply a weight to an input activation to generate an output. A plurality of adders is operatively coupled to one of the groups of multipliers. Each of the plurality of adders is adapted to, in parallel, add the outputs of the multipliers within its associated group to generate a partial sum. A plurality of function blocks is operatively coupled to one of the plurality of adders. Each of the plurality of function blocks is adapted to, in parallel, apply a function to the partial sum of its associated adder to generate an output value.

Type: Application

Filed: March 4, 2020

Publication date: June 25, 2020

Inventors: Rathinakumar Appuswamy, John V. Arthur, Andrew S. Cassidy, Pallab Datta, Steven K. Esser, Myron D. Flickner, Jennifer Klamo, Dharmendra S. Modha, Hartmut Penner, Jun Sawada, Brian Taba
COMPOUND INSTRUCTION SET ARCHITECTURE FOR A NEURAL INFERENCE CHIP

Publication number: 20200167158

Abstract: A device for controlling neural inference processor cores is provided, including a compound instruction set architecture. The device comprises an instruction memory, which comprises a plurality of instructions for controlling a neural inference processor core. Each of the plurality of instructions comprises a control operation. The device further comprises a program counter. The device further comprises at least one loop counter register. The device is adapted to execute the plurality of instructions. Executing the plurality of instructions comprises: reading an instruction from the instruction memory based on a value of the program counter; updating the at least one loop counter register according to the control operation of the instruction; and updating the program counter according to the control operation of the instruction and a value of the at least one loop counter register.

Type: Application

Filed: November 28, 2018

Publication date: May 28, 2020

Inventors: Andrew S. Cassidy, Rathinakumar Appuswamy, John V. Arthur, Pallab Datta, Michael V. Debole, Steven K. Esser, Myron D. Flickner, Dharmendra S. Modha, Hartmut Penner, Jun Sawada, Brian Taba
Extending existing storage devices in virtualized environments

Patent number: 10628087

Abstract: A method, system and computer program product for providing a guest with access to a virtual storage on a physical storage using a peripheral component interface hub. In one embodiment, the method comprises the guest sending to the peripheral component interface hub a request to access the physical storage, the request including physical addresses of the physical storage, and the peripheral component interface hub sending specified information about the request to a hypervisor. This method further comprises the hypervisor determining whether to grant or to reject the request; and when the hypervisor grants the request, the hypervisor sending a configuration command to the peripheral component interface hub. This command includes a mapping of addresses from the physical storage to addresses from the virtual storage. In an embodiment, the peripheral component interface hub uses this mapping to replace the addresses in the request with translated virtual addresses.

Type: Grant

Filed: March 1, 2019

Date of Patent: April 21, 2020

Assignee: International Business Machines Corporation

Inventors: Hubertus Franke, Davide Pasetto, Hartmut Penner
MULTI-AGENT INSTRUCTION EXECUTION ENGINE FOR NEURAL INFERENCE PROCESSING

Publication number: 20200117465

Abstract: Multi-agent instruction execution engines for neural inference processing are provided. In various embodiments, a neural core is provided. The neural core includes an instruction memory. The instruction memory comprises a plurality of instruction streams, each instruction stream associated with one of a plurality of agents. The instruction memory further comprises a plurality of shared functional units. The neural core is adapted to concurrently execute the plurality of instruction streams on the plurality of associated agents. The execution includes maintaining a separate program counter for each of the plurality of agents, determining a plurality of operations from the instructions of each instruction stream, and directing the operations to the shared functional units. The instructions of each instruction stream are statically scheduled prior to runtime to ensure their execution is conflict free.

Type: Application

Filed: October 16, 2018

Publication date: April 16, 2020

Inventors: Andrew S. Cassidy, Simon J. Hollis, Hartmut Penner, Jun Sawada, Pallab Datta, John V. Arthur
NETWORKS FOR DISTRIBUTING PARAMETERS AND DATA TO NEURAL NETWORK COMPUTE CORES

Publication number: 20200117988

Abstract: Networks for distributing parameters and data to neural network compute cores. In various embodiments, a neural inference chip comprises a plurality of neural cores and at least one network interconnecting the plurality of neural cores. Each of the plurality of neural cores is adapted to apply a plurality of synaptic weights to a plurality of input activations to produce a plurality of output activations. The at least one network is adapted to simultaneously deliver synaptic weights and/or input activations to the plurality of neural cores.

Type: Application

Filed: October 11, 2018

Publication date: April 16, 2020

Inventors: John V. Arthur, Brian Taba, Rathinakumar Appuswamy, Andrew S. Cassidy, Pallab Datta, Steven K. Esser, Myron D. Flickner, Jennifer Klamo, Dharmendra S. Modha, Hartmut Penner, Jun Sawada
DATA REPRESENTATION FOR DYNAMIC PRECISION IN NEURAL NETWORK CORES

Publication number: 20200117981

Abstract: Systems for neural network computation are provided. A neural network processor comprises a plurality of neural cores. The neural network processor has one or more processor precisions per activation. The processor is configured to accept data having a processor feature dimension. A transformation circuit is coupled to the neural network processor, and is adapted to: receive an input data tensor having an input precision per channel at one or more features; transform the input data tensor from the input precision to the processor precision; divide the input data into a plurality of blocks, each block conforming to one of the processor feature dimensions; provide each of the plurality of blocks to one of the plurality of neural cores. The neural network processor is adapted to compute, by the plurality of neural cores, output of one or more neural network layers.

Type: Application

Filed: October 11, 2018

Publication date: April 16, 2020

Inventors: John V. Arthur, Andrew S. Cassidy, Myron D. Flickner, Pallab Datta, Hartmut Penner, Rathinakumar Appuswamy, Jun Sawada, Dharmendra S. Modha, Steven K. Esser, Brian Taba, Jennifer Klamo
Massively parallel neural inference computing elements

Patent number: 10621489

Abstract: Massively parallel neural inference computing elements are provided. A plurality of multipliers is arranged in a plurality of equal-sized groups. Each of the plurality of multipliers is adapted to, in parallel, apply a weight to an input activation to generate an output. A plurality of adders is operatively coupled to one of the groups of multipliers. Each of the plurality of adders is adapted to, in parallel, add the outputs of the multipliers within its associated group to generate a partial sum. A plurality of function blocks is operatively coupled to one of the plurality of adders. Each of the plurality of function blocks is adapted to, in parallel, apply a function to the partial sum of its associated adder to generate an output value.

Type: Grant

Filed: March 30, 2018

Date of Patent: April 14, 2020

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Rathinakumar Appuswamy, John V. Arthur, Andrew S. Cassidy, Pallab Datta, Steven K. Esser, Myron D. Flickner, Jennifer Klamo, Dharmendra S. Modha, Hartmut Penner, Jun Sawada, Brian Taba
DATA DISTRIBUTION IN AN ARRAY OF NEURAL NETWORK CORES

Publication number: 20200104718

Abstract: Parallel processing among arrays of physical neural cores is provided. An array of neural cores is adapted to compute, in parallel, an output activation tensor of a neural network layer. A network is operatively connected to each of the neural cores. The output activation tensor is distributed across the neural cores. An input activation tensor is distributed across the neural cores. A weight tensor is distributed across the neural cores. Each neural core's computation comprises multiplying elements of a portion of the input activation tensor at that core with elements of a portion of the weight tensor at that core, and storing the summed products in a partial sum corresponding to an element of the output activation tensor. Each element of the output activation tensor is computed by accumulating all of the partial sums corresponding to that element via the network.

Type: Application

Filed: September 28, 2018

Publication date: April 2, 2020

Inventors: Brian Taba, Andrew S. Cassidy, Myron D. Flickner, Pallab Datta, Hartmut Penner, Rathinakumar Appuswamy, Jun Sawada, John V. Arthur, Dharmendra S. Modha, Steven K. Esser, Jennifer Klamo

1 2 3 next