Patents by Inventor Sean Lie

Sean Lie has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Microthreading for accelerated deep learning

Patent number: 11475282

Abstract: Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of compute elements and routers performs flow-based computations on wavelets of data. Some instructions are performed in iterations, such as one iteration per element of a fabric vector or FIFO. When sources for an iteration of an instruction are unavailable, and/or there is insufficient space to store results of the iteration, indicators associated with operands of the instruction are checked to determine whether other work can be performed. In some scenarios, other work cannot be performed and processing stalls. Alternatively, information about the instruction is saved, the other work is performed, and sometime after the sources become available and/or sufficient space to store the results becomes available, the iteration is performed using the saved information.

Type: Grant

Filed: April 17, 2018

Date of Patent: October 18, 2022

Assignee: Cerebras Systems Inc.

Inventors: Sean Lie, Michael Morrison, Michael Edwin James, Gary R. Lauterbach, Srikanth Arekapudi
Floating-point unit stochastic rounding for accelerated deep learning

Patent number: 11449574

Abstract: Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements comprising a portion of a neural network accelerator performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Each compute element has a respective floating-point unit enabled to perform stochastic rounding, thus in some circumstances enabling reducing systematic bias in long dependency chains of floating-point computations. The long dependency chains of floating-point computations are performed, e.g., to train a neural network or to perform inference with respect to a trained neural network.

Type: Grant

Filed: April 13, 2018

Date of Patent: September 20, 2022

Assignee: Cerebras Systems Inc.

Inventors: Sean Lie, Michael Edwin James, Michael Morrison, Gary R. Lauterbach, Srikanth Arekapudi
TASK ACTIVATING FOR ACCELERATED DEEP LEARNING

Publication number: 20220284275

Abstract: Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a compute element and a routing element. Each router enables communication via wavelets with at least nearest neighbors in a 2D mesh. Routing is controlled by virtual channel specifiers in each wavelet and routing configuration information in each router. Execution of an activate instruction or completion of a fabric vector operation activates one of the virtual channels. A virtual channel is selected from a pool comprising previously activated virtual channels and virtual channels associated with previously received wavelets. A task corresponding to the selected virtual channel is activated by executing instructions corresponding to the selected virtual channel.

Type: Application

Filed: October 19, 2021

Publication date: September 8, 2022

Inventors: Sean LIE, Michael MORRISON, Srikanth AREKAPUDI, Michael Edwin JAMES, Gary R. LAUTERBACH
NUMERICAL REPRESENTATION FOR NEURAL NETWORKS

Publication number: 20220172030

Abstract: Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements comprising a portion of a neural network accelerator performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Each compute element has a respective floating-point unit enabled to optionally and/or selectively perform floating-point operations in accordance with a programmable exponent bias and/or various floating-point computation variations. In some circumstances, the programmable exponent bias and/or the floating-point computation variations enable neural network processing with improved accuracy, decreased training time, decreased inference latency, and/or increased energy efficiency.

Type: Application

Filed: July 6, 2021

Publication date: June 2, 2022

Inventors: Michael Edwin JAMES, Sean LIE, Michael MORRISON, Srikanth AREKAPUDI, Gary R. LAUTERBACH
TASK SYNCHRONIZATION FOR ACCELERATED DEEP LEARNING

Publication number: 20220172031

Abstract: Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a compute element and a routing element. Each compute element has memory. Each router enables communication via wavelets with at least nearest neighbors in a 2D mesh. Routing is controlled by respective virtual channel specifiers in each wavelet and routing configuration information in each router. A compute element conditionally selects for task initiation a previously received wavelet specifying a particular one of the virtual channels. The conditional selecting excludes the previously received wavelet for selection until at least block/unblock state maintained for the particular virtual channel is in an unblock state. The compute element executes block/unblock instructions to modify the block/unblock state.

Type: Application

Filed: July 6, 2021

Publication date: June 2, 2022

Inventors: Sean LIE, Michael MORRISON, Srikanth AREKAPUDI, Michael Edwin JAMES, Gary R. LAUTERBACH
Processor element redundancy for accelerated deep learning

Patent number: 11328208

Abstract: Techniques in advanced deep learning provide improvements in one or more of cost, accuracy, performance, and energy efficiency. The deep learning accelerator is implemented at least in part via wafer-scale integration. The wafer comprises a plurality of processor elements, each augmented with redundancy-enabling couplings. The redundancy-enabling couplings enable using redundant ones of the processor elements to replace defective ones of the processor elements. Defect information gathered at wafer test and/or in-situ, such as in a datacenter, is used to determine configuration information for the redundancy-enabling couplings.

Type: Grant

Filed: August 27, 2019

Date of Patent: May 10, 2022

Assignee: Cerebras Systems Inc.

Inventors: Sean Lie, Michael Edwin James, Michael Morrison, Srikanth Arekapudi, Gary R. Lauterbach
Scaled compute fabric for accelerated deep learning

Patent number: 11328207

Abstract: Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, energy efficiency, and cost. In a first embodiment, a scaled array of processing elements is implementable with varying dimensions of the processing elements to enable varying price/performance systems. In a second embodiment, an array of clusters communicates via high-speed serial channels. The array and the channels are implemented on a Printed Circuit Board (PCB). Each cluster comprises respective processing and memory elements. Each cluster is implemented via a plurality of 3D-stacked dice, 2.5D-stacked dice, or both in a Ball Grid Array (BGA). A processing portion of the cluster is implemented via one or more Processing Element (PE) dice of the stacked dice. A memory portion of the cluster is implemented via one or more High Bandwidth Memory (HBM) dice of the stacked dice.

Type: Grant

Filed: August 11, 2019

Date of Patent: May 10, 2022

Assignee: Cerebras Systems Inc.

Inventors: Gary R. Lauterbach, Sean Lie, Michael Morrison, Michael Edwin James, Srikanth Arekapudi
ISA enhancements for accelerated deep learning

Patent number: 11321087

Abstract: Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements comprising a portion of a neural network accelerator performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Each compute element is enabled to execute instructions in accordance with an ISA. The ISA is enhanced in accordance with improvements with respect to deep learning acceleration.

Type: Grant

Filed: August 27, 2019

Date of Patent: May 3, 2022

Assignee: Cerebras Systems Inc.

Inventors: Michael Morrison, Michael Edwin James, Sean Lie, Srikanth Arekapudi, Gary R. Lauterbach
Fabric vectors for deep learning acceleration

Patent number: 11232347

Abstract: Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Instructions executed by the compute element include operand specifiers, some specifying a data structure register storing a data structure descriptor describing an operand as a fabric vector or a memory vector. The data structure descriptor further describes various attributes of the fabric vector: length, microthreading eligibility, number of data elements to receive, transmit, and/or process in parallel, virtual channel and task identification information, whether to terminate upon receiving a control wavelet, and whether to mark an outgoing wavelet a control wavelet.

Type: Grant

Filed: April 17, 2018

Date of Patent: January 25, 2022

Assignee: Cerebras Systems Inc.

Inventors: Sean Lie, Michael Morrison, Michael Edwin James, Srikanth Arekapudi, Gary R. Lauterbach
Data structure descriptors for deep learning acceleration

Patent number: 11232348

Abstract: Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Instructions executed by the compute element include operand specifiers, some specifying a data structure register storing a data structure descriptor describing an operand as a fabric vector or a memory vector. The data structure descriptor further describes the memory vector as one of a one-dimensional vector, a four-dimensional vector, or a circular buffer vector. Optionally, the data structure descriptor specifies an extended data structure register storing an extended data structure descriptor. The extended data structure descriptor specifies parameters relating to a four-dimensional vector or a circular buffer vector.

Type: Grant

Filed: July 15, 2020

Date of Patent: January 25, 2022

Assignee: Cerebras Systems Inc.

Inventors: Sean Lie, Michael Morrison, Srikanth Arekapudi, Gary R. Lauterbach, Michael Edwin James
Task activating for accelerated deep learning

Patent number: 11157806

Abstract: Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a compute element and a routing element. Each router enables communication via wavelets with at least nearest neighbors in a 2D mesh. Routing is controlled by virtual channel specifiers in each wavelet and routing configuration information in each router. Execution of an activate instruction or completion of a fabric vector operation activates one of the virtual channels. A virtual channel is selected from a pool comprising previously activated virtual channels and virtual channels associated with previously received wavelets. A task corresponding to the selected virtual channel is activated by executing instructions corresponding to the selected virtual channel.

Type: Grant

Filed: April 17, 2018

Date of Patent: October 26, 2021

Assignee: Cerebras Systems Inc.

Inventors: Sean Lie, Michael Morrison, Srikanth Arekapudi, Michael Edwin James, Gary R. Lauterbach
ISA ENHANCEMENTS FOR ACCELERATED DEEP LEARNING

Publication number: 20210255860

Abstract: Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements comprising a portion of a neural network accelerator performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Each compute element is enabled to execute instructions in accordance with an ISA. The ISA is enhanced in accordance with improvements with respect to deep learning acceleration.

Type: Application

Filed: August 27, 2019

Publication date: August 19, 2021

Inventors: Michael MORRISON, Michael Edwin JAMES, Sean LIE, Srikanth AREKAPUDI,, Gary R. LAUTERBACH
PROCESSOR ELEMENT REDUNDANCY FOR ACCELERATED DEEP LEARNING

Publication number: 20210256362

Abstract: Techniques in advanced deep learning provide improvements in one or more of cost, accuracy, performance, and energy efficiency. The deep learning accelerator is implemented at least in part via wafer-scale integration. The wafer comprises a plurality of processor elements, each augmented with redundancy-enabling couplings. The redundancy-enabling couplings enable using redundant ones of the processor elements to replace defective ones of the processor elements. Defect information gathered at wafer test and/or in-situ, such as in a datacenter, is used to determine configuration information for the redundancy-enabling couplings.

Type: Application

Filed: August 27, 2019

Publication date: August 19, 2021

Inventors: Sean LIE, Michael Edwin JAMES,, Michael MORRISON, Srikanth AREKAPUDI, Gary R. LAUTERBACH
SCALED COMPUTE FABRIC FOR ACCELERATED DEEP LEARNING

Publication number: 20210248453

Abstract: Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, energy efficiency, and cost. In a first embodiment, a scaled array of processing elements is implementable with varying dimensions of the processing elements to enable varying price/performance systems. In a second embodiment, an array of clusters communicates via high-speed serial channels. The array and the channels are implemented on a Printed Circuit Board (PCB). Each cluster comprises respective processing and memory elements. Each cluster is implemented via a plurality of 3D-stacked dice, 2.5D-stacked dice, or both in a Ball Grid Array (BGA). A processing portion of the cluster is implemented via one or more Processing Element (PE) dice of the stacked dice. A memory portion of the cluster is implemented via one or more High Bandwidth Memory (HBM) dice of the stacked dice.

Type: Application

Filed: August 11, 2019

Publication date: August 12, 2021

Inventors: Gary R. LAUTERBACH, Sean LIE, Michael MORRISON, Michael Edwin JAMES, Srikanth AREKAPUDI
CONTROL WAVELET FOR ACCELERATED DEEP LEARNING

Publication number: 20210224639

Abstract: Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow based computations on wavelets of data. Each processing element has a compute element and a routing element. Each compute element has memory. Each router enables communication via wavelets with nearest neighbors in a 2D mesh. A compute element receives a wavelet. If a control specifier of the wavelet is a first value, then instructions are read from the memory of the compute element in accordance with an index specifier of the wavelet. If the control specifier is a second value, then instructions are read from the memory of the compute element in accordance with a virtual channel specifier of the wavelet. Then the compute element initiates execution of the instructions.

Type: Application

Filed: August 27, 2020

Publication date: July 22, 2021

Inventors: Sean LIE, Gary R. LAUTERBACH, Michael Edwin JAMES, Michael MORRISON, Srikanth AREKAPUDI
Numerical representation for neural networks

Patent number: 11062202

Abstract: Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements comprising a portion of a neural network accelerator performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Each compute element has a respective floating-point unit enabled to optionally and/or selectively perform floating-point operations in accordance with a programmable exponent bias and/or various floating-point computation variations. In some circumstances, the programmable exponent bias and/or the floating-point computation variations enable neural network processing with improved accuracy, decreased training time, decreased inference latency, and/or increased energy efficiency.

Type: Grant

Filed: July 17, 2019

Date of Patent: July 13, 2021

Assignee: Cerebras Systems Inc.

Inventors: Michael Edwin James, Sean Lie, Michael Morrison, Srikanth Arekapudi, Gary R. Lauterbach
Task synchronization for accelerated deep learning

Patent number: 11062200

Abstract: Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a compute element and a routing element. Each compute element has memory. Each router enables communication via wavelets with at least nearest neighbors in a 2D mesh. Routing is controlled by respective virtual channel specifiers in each wavelet and routing configuration information in each router. A compute element conditionally selects for task initiation a previously received wavelet specifying a particular one of the virtual channels. The conditional selecting excludes the previously received wavelet for selection until at least block/unblock state maintained for the particular virtual channel is in an unblock state. The compute element executes block/unblock instructions to modify the block/unblock state.

Type: Grant

Filed: April 16, 2018

Date of Patent: July 13, 2021

Assignee: Cerebras Systems Inc.

Inventors: Sean Lie, Michael Morrison, Srikanth Arekapudi, Michael Edwin James, Gary R. Lauterbach
DATA STRUCTURE DESCRIPTORS FOR DEEP LEARNING ACCELERATION

Publication number: 20210166109

Abstract: Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Instructions executed by the compute element include operand specifiers, some specifying a data structure register storing a data structure descriptor describing an operand as a fabric vector or a memory vector. The data structure descriptor further describes the memory vector as one of a one-dimensional vector, a four-dimensional vector, or a circular buffer vector. Optionally, the data structure descriptor specifies an extended data structure register storing an extended data structure descriptor. The extended data structure descriptor specifies parameters relating to a four-dimensional vector or a circular buffer vector.

Type: Application

Filed: July 15, 2020

Publication date: June 3, 2021

Inventors: Sean LIE, Michael MORRISON, Srikanth AREKAPUDI, Gary R. LAUTERBACH, Michael Edwin JAMES
NUMERICAL REPRESENTATION FOR NEURAL NETWORKS

Publication number: 20210142155

Abstract: Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements comprising a portion of a neural network accelerator performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Each compute element has a respective floating-point unit enabled to optionally and/or selectively perform floating-point operations in accordance with a programmable exponent bias and/or various floating-point computation variations. In some circumstances, the programmable exponent bias and/or the floating-point computation variations enable neural network processing with improved accuracy, decreased training time, decreased inference latency, and/or increased energy efficiency.

Type: Application

Filed: July 17, 2019

Publication date: May 13, 2021

Inventors: Michael Edwin JAMES, Sean LIE, Michael MORRISON, Srikanth AREKAPUDI, Gary R. LAUTERBACH
ACCELERATED DEEP LEARNING

Publication number: 20210142167

Abstract: Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency, such as accuracy of learning, accuracy of prediction, speed of learning, performance of learning, and energy efficiency of learning. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Each compute element has processing resources and memory resources. Each router enables communication via wavelets with at least nearest neighbors in a 2D mesh. Stochastic gradient descent, mini-batch gradient descent, and continuous propagation gradient descent are techniques usable to train weights of a neural network modeled by the processing elements. Reverse checkpoint is usable to reduce memory usage during the training.

Type: Application

Filed: June 24, 2020

Publication date: May 13, 2021

Inventors: Sean LIE, Michael MORRISON, Michael Edwin JAMES, Gary R. LAUTERBACH, Srikanth AREKAPUDI

prev 1 2 3 4 5 next