Patents by Inventor Sean Lie

Sean Lie has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Task synchronization for accelerated deep learning

Patent number: 12314218

Abstract: Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a compute element and a routing element. Each compute element has memory. Each router enables communication via wavelets with at least nearest neighbors in a 2D mesh. Routing is controlled by respective virtual channel specifiers in each wavelet and routing configuration information in each router. A compute element conditionally selects for task initiation a previously received wavelet specifying a particular one of the virtual channels. The conditional selecting excludes the previously received wavelet for selection until at least block/unblock state maintained for the particular virtual channel is in an unblock state. The compute element executes block/unblock instructions to modify the block/unblock state.

Type: Grant

Filed: July 6, 2021

Date of Patent: May 27, 2025

Assignee: Cerebras Systems Inc.

Inventors: Sean Lie, Michael Morrison, Srikanth Arekapudi, Michael Edwin James, Gary R. Lauterbach
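
The block/unblock gating described in the abstract above can be illustrated with a minimal sketch. This is not the patented implementation: the class name `ComputeElement`, its methods, and the arrival-order selection policy are all hypothetical choices made for illustration. The abstract only states that a received wavelet is excluded from selection while its virtual channel is in the block state.

```python
class ComputeElement:
    """Toy model of per-virtual-channel block/unblock state (all names hypothetical)."""

    def __init__(self, num_channels):
        # One block flag per virtual channel; every channel starts unblocked.
        self.blocked = [False] * num_channels
        # Previously received wavelets, kept in arrival order as (channel, payload).
        self.pending = []

    def receive(self, channel, payload):
        self.pending.append((channel, payload))

    def block(self, channel):
        # Stands in for executing a "block" instruction.
        self.blocked[channel] = True

    def unblock(self, channel):
        # Stands in for executing an "unblock" instruction.
        self.blocked[channel] = False

    def select_for_task(self):
        """Return the oldest pending wavelet whose channel is unblocked, else None.

        Choosing the oldest eligible wavelet is an assumption; the abstract
        does not specify a selection order.
        """
        for i, (channel, _payload) in enumerate(self.pending):
            if not self.blocked[channel]:
                return self.pending.pop(i)
        return None


ce = ComputeElement(num_channels=4)
ce.receive(1, "a")
ce.receive(2, "b")
ce.block(1)
first = ce.select_for_task()   # the wavelet on channel 1 is skipped while blocked
ce.unblock(1)
second = ce.select_for_task()  # now the channel 1 wavelet becomes eligible
```

In this sketch a blocked wavelet is not discarded; it stays queued until its channel is unblocked.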

ADVANCED WAVELET FILTERING FOR ACCELERATED DEEP LEARNING

Publication number: 20250131237

Abstract: Techniques in wavelet filtering for advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements comprising a portion of a neural network accelerator performs flow-based computations on wavelets of data. Each processing element comprises a compute element to execute programmed instructions using the data and a router to route the wavelets in accordance with virtual channel specifiers. Each processing element is enabled to perform local filtering of wavelets received at the processing element, selectively, conditionally, and/or optionally discarding zero or more of the received wavelets, thereby preventing further processing of the discarded wavelets. The wavelet filtering is performed by one or more configurable wavelet filters operable in various modes, such as counter, sparse, and range modes.

Type: Application

Filed: December 24, 2024

Publication date: April 24, 2025

Inventors: Michael Morrison, Michael Edwin James, Sean Lie, Srikanth Arekapudi, Gary R. Lauterbach

PLACEMENT OF COMPUTE AND MEMORY FOR ACCELERATED DEEP LEARNING

Publication number: 20250110808

Abstract: Techniques in placement of compute and memory for accelerated deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements comprising a portion of a neural network accelerator performs flow-based computations on wavelets of data. Each processing element comprises a compute element to execute programmed instructions using the data and a router to route the wavelets. The routing is in accordance with virtual channel specifiers of the wavelets and controlled by routing configuration information of the router. A software stack determines placement of compute resources and memory resources based on a description of a neural network. The determined placement is used to configure the routers including usage of the respective colors. The determined placement is used to configure the compute elements including the respective programmed instructions each is configured to execute.

Type: Application

Filed: December 12, 2024

Publication date: April 3, 2025

Inventors: Vladimir Kibardin, Michael Edwin James, Michael Morrison, Sean Lie, Gary R. Lauterbach, Stanislav Funiak

DYNAMIC ROUTING FOR ACCELERATED DEEP LEARNING

Publication number: 20250080477

Abstract: Techniques in dynamic routing for advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements comprising a portion of a neural network accelerator performs flow-based computations on wavelets of data. Each processing element comprises a compute element enabled to execute programmed instructions using the data and a router enabled to route the wavelets via static routing, dynamic routing, or both. The routing is in accordance with a respective virtual channel specifier of each of the wavelets and controlled by routing configuration information of the router. The static techniques enable statically specifiable neuron connections. The dynamic techniques enable information from the wavelets to alter the routing configuration information during neural network processing.

Type: Application

Filed: November 12, 2024

Publication date: March 6, 2025

Inventors: Michael Morrison, Michael Edwin James, Sean Lie, Srikanth Arekapudi, Gary R. Lauterbach, Vijay Anand Reddy Korthikanti

Advanced wavelet filtering for accelerated deep learning

Patent number: 12217147

Abstract: Techniques in wavelet filtering for advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements comprising a portion of a neural network accelerator performs flow-based computations on wavelets of data. Each processing element comprises a compute element to execute programmed instructions using the data and a router to route the wavelets in accordance with virtual channel specifiers. Each processing element is enabled to perform local filtering of wavelets received at the processing element, selectively, conditionally, and/or optionally discarding zero or more of the received wavelets, thereby preventing further processing of the discarded wavelets. The wavelet filtering is performed by one or more configurable wavelet filters operable in various modes, such as counter, sparse, and range modes.

Type: Grant

Filed: October 15, 2020

Date of Patent: February 4, 2025

Assignee: Cerebras Systems Inc.

Inventors: Michael Morrison, Michael Edwin James, Sean Lie, Srikanth Arekapudi, Gary R. Lauterbach

Placement of compute and memory for accelerated deep learning

Patent number: 12204954

Abstract: Techniques in placement of compute and memory for accelerated deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements comprising a portion of a neural network accelerator performs flow-based computations on wavelets of data. Each processing element comprises a compute element to execute programmed instructions using the data and a router to route the wavelets. The routing is in accordance with virtual channel specifiers of the wavelets and controlled by routing configuration information of the router. A software stack determines placement of compute resources and memory resources based on a description of a neural network. The determined placement is used to configure the routers including usage of the respective colors. The determined placement is used to configure the compute elements including the respective programmed instructions each is configured to execute.

Type: Grant

Filed: October 29, 2020

Date of Patent: January 21, 2025

Assignee: Cerebras Systems Inc.

Inventors: Vladimir Kibardin, Michael Edwin James, Michael Morrison, Sean Lie, Gary R. Lauterbach, Stanislav Funiak

Dynamic routing for accelerated deep learning

Patent number: 12177133

Abstract: Techniques in dynamic routing for advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements comprising a portion of a neural network accelerator performs flow-based computations on wavelets of data. Each processing element comprises a compute element enabled to execute programmed instructions using the data and a router enabled to route the wavelets via static routing, dynamic routing, or both. The routing is in accordance with a respective virtual channel specifier of each of the wavelets and controlled by routing configuration information of the router. The static techniques enable statically specifiable neuron connections. The dynamic techniques enable information from the wavelets to alter the routing configuration information during neural network processing.

Type: Grant

Filed: October 14, 2020

Date of Patent: December 24, 2024

Assignee: Cerebras Systems Inc.

Inventors: Michael Morrison, Michael Edwin James, Sean Lie, Srikanth Arekapudi, Gary R. Lauterbach, Vijay Anand Reddy Korthikanti

Basic wavelet filtering for accelerated deep learning

Patent number: 12169771

Abstract: Techniques in wavelet filtering for advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements comprising a portion of a neural network accelerator performs flow-based computations on wavelets of data. Each processing element comprises a compute element to execute programmed instructions using the data and a router to route the wavelets in accordance with virtual channel specifiers. Each processing element is enabled to perform local filtering of wavelets received at the processing element, selectively, conditionally, and/or optionally discarding zero or more of the received wavelets, thereby preventing further processing of the discarded wavelets. The wavelet filtering is performed by one or more configurable wavelet filters operable in various modes, such as counter, sparse, and range modes.

Type: Grant

Filed: October 15, 2020

Date of Patent: December 17, 2024

Assignee: Cerebras Systems Inc.

Inventors: Michael Morrison, Michael Edwin James, Sean Lie, Srikanth Arekapudi, Gary R. Lauterbach

Accelerated deep learning

Patent number: 11934945

Abstract: Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency, such as accuracy of learning, accuracy of prediction, speed of learning, performance of learning, and energy efficiency of learning. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Each compute element has processing resources and memory resources. Each router enables communication via wavelets with at least nearest neighbors in a 2D mesh. Stochastic gradient descent, mini-batch gradient descent, and continuous propagation gradient descent are techniques usable to train weights of a neural network modeled by the processing elements. Reverse checkpoint is usable to reduce memory usage during the training.

Type: Grant

Filed: February 23, 2018

Date of Patent: March 19, 2024

Assignee: Cerebras Systems Inc.

Inventors: Sean Lie, Michael Morrison, Michael Edwin James, Gary R. Lauterbach, Srikanth Arekapudi

Task activating for accelerated deep learning

Patent number: 11853867

Abstract: Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a compute element and a routing element. Each router enables communication via wavelets with at least nearest neighbors in a 2D mesh. Routing is controlled by virtual channel specifiers in each wavelet and routing configuration information in each router. Execution of an activate instruction or completion of a fabric vector operation activates one of the virtual channels. A virtual channel is selected from a pool comprising previously activated virtual channels and virtual channels associated with previously received wavelets. A task corresponding to the selected virtual channel is activated by executing instructions corresponding to the selected virtual channel.

Type: Grant

Filed: October 19, 2021

Date of Patent: December 26, 2023

Assignee: Cerebras Systems Inc.

Inventors: Sean Lie, Michael Morrison, Srikanth Arekapudi, Michael Edwin James, Gary R. Lauterbach

Control wavelet for accelerated deep learning

Patent number: 11727254

Abstract: Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a compute element and a routing element. Each compute element has memory. Each router enables communication via wavelets with nearest neighbors in a 2D mesh. A compute element receives a wavelet. If a control specifier of the wavelet is a first value, then instructions are read from the memory of the compute element in accordance with an index specifier of the wavelet. If the control specifier is a second value, then instructions are read from the memory of the compute element in accordance with a virtual channel specifier of the wavelet. Then the compute element initiates execution of the instructions.

Type: Grant

Filed: August 27, 2020

Date of Patent: August 15, 2023

Assignee: Cerebras Systems Inc.

Inventors: Sean Lie, Gary R. Lauterbach, Michael Edwin James, Michael Morrison, Srikanth Arekapudi
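
The two-way dispatch in the control wavelet abstract above can be sketched as follows. This is a hedged illustration, not the patented design: the `Wavelet` fields, the encoding of the "first value" as `1`, and the `index_base` / `channel_table` addressing scheme are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Wavelet:
    control: int   # control specifier; 1 is assumed to be the "first value"
    index: int     # index specifier
    channel: int   # virtual channel specifier


def instruction_address(wavelet, index_base, channel_table):
    """Pick where in compute-element memory to start reading instructions.

    Hypothetical addressing: a control wavelet offsets from `index_base`
    by its index specifier; any other wavelet looks up a per-channel
    entry point in `channel_table`.
    """
    if wavelet.control == 1:
        return index_base + wavelet.index
    return channel_table[wavelet.channel]


# Usage: one entry point table, two wavelets on the same virtual channel.
table = {5: 40}
addr_control = instruction_address(Wavelet(control=1, index=3, channel=5), 100, table)
addr_data = instruction_address(Wavelet(control=0, index=3, channel=5), 100, table)
```

Under these assumptions the same virtual channel can start different instruction sequences depending only on the control specifier.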

Data structure descriptors for deep learning acceleration

Patent number: 11727257

Abstract: Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Instructions executed by the compute element include operand specifiers, some specifying a data structure register storing a data structure descriptor describing an operand as a fabric vector or a memory vector. The data structure descriptor further describes the memory vector as one of a one-dimensional vector, a four-dimensional vector, or a circular buffer vector. Optionally, the data structure descriptor specifies an extended data structure register storing an extended data structure descriptor. The extended data structure descriptor specifies parameters relating to a four-dimensional vector or a circular buffer vector.

Type: Grant

Filed: January 24, 2022

Date of Patent: August 15, 2023

Assignee: Cerebras Systems Inc.

Inventors: Sean Lie, Michael Morrison, Srikanth Arekapudi, Gary R. Lauterbach, Michael Edwin James
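
The descriptor taxonomy in the abstract above (fabric vector versus memory vector, with three memory layouts and an optional extended descriptor) maps naturally onto a small tagged record. The sketch below is only an illustration of that taxonomy: the type names, field names, and validation rules are hypothetical, and no register encoding from the patent is implied.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class OperandKind(Enum):
    FABRIC_VECTOR = "fabric"
    MEMORY_VECTOR = "memory"


class MemoryLayout(Enum):
    ONE_D = "1d"
    FOUR_D = "4d"
    CIRCULAR_BUFFER = "circular"


@dataclass(frozen=True)
class DataStructureDescriptor:
    kind: OperandKind
    layout: Optional[MemoryLayout] = None  # meaningful only for memory vectors
    extended_register: Optional[int] = None  # optional extended-descriptor register

    def __post_init__(self):
        # Assumed consistency rules, chosen for this sketch only.
        if self.kind is OperandKind.MEMORY_VECTOR and self.layout is None:
            raise ValueError("a memory vector needs a layout")
        if self.kind is OperandKind.FABRIC_VECTOR and self.layout is not None:
            raise ValueError("a fabric vector has no memory layout")


# Usage: a circular-buffer operand that points at extended register 2.
ring = DataStructureDescriptor(
    OperandKind.MEMORY_VECTOR, MemoryLayout.CIRCULAR_BUFFER, extended_register=2
)
stream = DataStructureDescriptor(OperandKind.FABRIC_VECTOR)
```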

OPTIMIZED PLACEMENT FOR EFFICIENCY FOR ACCELERATED DEEP LEARNING

Publication number: 20230125522

Abstract: Techniques in optimized placement for efficiency for accelerated deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements comprising a portion of a neural network accelerator performs flow-based computations on wavelets of data. Each processing element comprises a compute element to execute programmed instructions using the data and a router to route the wavelets. The routing is in accordance with virtual channel specifiers of the wavelets and controlled by routing configuration information of the router. A software stack determines optimized placement based on a description of a neural network. The determined placement is used to configure the routers including usage of the respective colors. The determined placement is used to configure the compute elements including the respective programmed instructions each is configured to execute.

Type: Application

Filed: October 30, 2020

Publication date: April 27, 2023

Inventors: Vladimir Kibardin, Michael Edwin James, Michael Morrison, Sean Lie, Gary R. Lauterbach, Stanislav Funiak

PLACEMENT OF COMPUTE AND MEMORY FOR ACCELERATED DEEP LEARNING

Publication number: 20230071424

Abstract: Techniques in placement of compute and memory for accelerated deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements comprising a portion of a neural network accelerator performs flow-based computations on wavelets of data. Each processing element comprises a compute element to execute programmed instructions using the data and a router to route the wavelets. The routing is in accordance with virtual channel specifiers of the wavelets and controlled by routing configuration information of the router. A software stack determines placement of compute resources and memory resources based on a description of a neural network. The determined placement is used to configure the routers including usage of the respective colors. The determined placement is used to configure the compute elements including the respective programmed instructions each is configured to execute.

Type: Application

Filed: October 29, 2020

Publication date: March 9, 2023

Inventors: Vladimir Kibardin, Michael Edwin James, Michael Morrison, Sean Lie, Gary R. Lauterbach, Stanislav Funiak

DYNAMIC ROUTING FOR ACCELERATED DEEP LEARNING

Publication number: 20230069536

Abstract: Techniques in dynamic routing for advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements comprising a portion of a neural network accelerator performs flow-based computations on wavelets of data. Each processing element comprises a compute element enabled to execute programmed instructions using the data and a router enabled to route the wavelets via static routing, dynamic routing, or both. The routing is in accordance with a respective virtual channel specifier of each of the wavelets and controlled by routing configuration information of the router. The static techniques enable statically specifiable neuron connections. The dynamic techniques enable information from the wavelets to alter the routing configuration information during neural network processing.

Type: Application

Filed: October 14, 2020

Publication date: March 2, 2023

Inventors: Michael Morrison, Michael Edwin James, Sean Lie, Srikanth Arekapudi, Gary R. Lauterbach, Vijay Anand Reddy Korthikanti

Accelerated deep learning

Patent number: 11580394

Abstract: Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency, such as accuracy of learning, accuracy of prediction, speed of learning, performance of learning, and energy efficiency of learning. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Each compute element has processing resources and memory resources. Each router enables communication via wavelets with at least nearest neighbors in a 2D mesh. Stochastic gradient descent, mini-batch gradient descent, and continuous propagation gradient descent are techniques usable to train weights of a neural network modeled by the processing elements. Reverse checkpoint is usable to reduce memory usage during the training.

Type: Grant

Filed: June 24, 2020

Date of Patent: February 14, 2023

Assignee: Cerebras Systems Inc.

Inventors: Sean Lie, Michael Morrison, Michael Edwin James, Gary R. Lauterbach, Srikanth Arekapudi

DATA STRUCTURE DESCRIPTORS FOR DEEP LEARNING ACCELERATION

Publication number: 20220398443

Abstract: Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Instructions executed by the compute element include operand specifiers, some specifying a data structure register storing a data structure descriptor describing an operand as a fabric vector or a memory vector. The data structure descriptor further describes the memory vector as one of a one-dimensional vector, a four-dimensional vector, or a circular buffer vector. Optionally, the data structure descriptor specifies an extended data structure register storing an extended data structure descriptor. The extended data structure descriptor specifies parameters relating to a four-dimensional vector or a circular buffer vector.

Type: Application

Filed: January 24, 2022

Publication date: December 15, 2022

Inventors: Sean Lie, Michael Morrison, Srikanth Arekapudi, Gary R. Lauterbach, Michael Edwin James

DISTRIBUTED PLACEMENT OF LINEAR OPERATORS FOR ACCELERATED DEEP LEARNING

Publication number: 20220374288

Abstract: Techniques in distributed placement of linear operators for accelerated deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements comprising a portion of a neural network accelerator performs flow-based computations on wavelets of data. Each processing element comprises a compute element to execute programmed instructions using the data and a router to route the wavelets. The routing is in accordance with virtual channel specifiers of the wavelets and controlled by routing configuration information of the router. A software stack determines distributed placement of linear operators based on a description of a neural network. The determined placement is used to configure the routers including usage of the respective colors. The determined placement is used to configure the compute elements including the respective programmed instructions each is configured to execute.

Type: Application

Filed: October 30, 2020

Publication date: November 24, 2022

Inventors: Vladimir Kibardin, Michael Edwin James, Michael Morrison, Sean Lie, Gary R. Lauterbach, Stanislav Funiak

Neuron smearing for accelerated deep learning

Patent number: 11488004

Abstract: Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Each compute element has memory. At least a first single neuron is implemented using resources of a plurality of the array of processing elements. At least a portion of a second neuron is implemented using resources of one or more of the plurality of processing elements. In some usage scenarios, the foregoing neuron implementation enables greater performance by enabling a single neuron to use the computational resources of multiple processing elements and/or computational load balancing across the processing elements while maintaining locality of incoming activations for the processing elements.

Type: Grant

Filed: April 15, 2018

Date of Patent: November 1, 2022

Assignee: Cerebras Systems Inc.

Inventors: Sean Lie, Michael Morrison, Srikanth Arekapudi, Michael Edwin James, Gary R. Lauterbach

ADVANCED WAVELET FILTERING FOR ACCELERATED DEEP LEARNING

Publication number: 20220343136

Abstract: Techniques in wavelet filtering for advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements comprising a portion of a neural network accelerator performs flow-based computations on wavelets of data. Each processing element comprises a compute element to execute programmed instructions using the data and a router to route the wavelets in accordance with virtual channel specifiers. Each processing element is enabled to perform local filtering of wavelets received at the processing element, selectively, conditionally, and/or optionally discarding zero or more of the received wavelets, thereby preventing further processing of the discarded wavelets. The wavelet filtering is performed by one or more configurable wavelet filters operable in various modes, such as counter, sparse, and range modes.

Type: Application

Filed: October 15, 2020

Publication date: October 27, 2022

Inventors: Michael Morrison, Michael Edwin James, Sean Lie, Srikanth Arekapudi, Gary R. Lauterbach