Patents by Inventor Gary R. Lauterbach

Gary R. Lauterbach has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10726329
    Abstract: Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Instructions executed by the compute element include operand specifiers, some specifying a data structure register storing a data structure descriptor describing an operand as a fabric vector or a memory vector. The data structure descriptor further describes the memory vector as one of a one-dimensional vector, a four-dimensional vector, or a circular buffer vector. Optionally, the data structure descriptor specifies an extended data structure register storing an extended data structure descriptor. The extended data structure descriptor specifies parameters relating to a four-dimensional vector or a circular buffer vector.
    Type: Grant
    Filed: April 17, 2018
    Date of Patent: July 28, 2020
    Assignee: Cerebras Systems Inc.
    Inventors: Sean Lie, Michael Morrison, Srikanth Arekapudi, Gary R. Lauterbach, Michael Edwin James
  • Patent number: 10699189
    Abstract: Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency, such as accuracy of learning, accuracy of prediction, speed of learning, performance of learning, and energy efficiency of learning. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Each compute element has processing resources and memory resources. Each router enables communication via wavelets with at least nearest neighbors in a 2D mesh. Stochastic gradient descent, mini-batch gradient descent, and continuous propagation gradient descent are techniques usable to train weights of a neural network modeled by the processing elements. Reverse checkpoint is usable to reduce memory usage during the training.
    Type: Grant
    Filed: February 23, 2018
    Date of Patent: June 30, 2020
    Assignee: Cerebras Systems Inc.
    Inventors: Sean Lie, Michael Morrison, Michael Edwin James, Gary R. Lauterbach, Srikanth Arekapudi
  • Patent number: 10657438
    Abstract: Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow-based computations on wavelets of data. Each processing element comprises a respective compute element and a respective routing element. Each compute element comprises virtual input queues. Each router enables communication via wavelets with at least nearest neighbors in a 2D mesh. Routing is controlled by respective virtual channel specifiers in each wavelet and routing configuration information in each router. Each router comprises data queues. The virtual input queues of the compute element and the data queues of the router are managed in accordance with the virtual channels. Backpressure information, per each of the virtual channels, is generated, communicated, and used to prevent overrun of the virtual input queues and the data queues.
    Type: Grant
    Filed: April 17, 2018
    Date of Patent: May 19, 2020
    Assignee: Cerebras Systems Inc.
    Inventors: Sean Lie, Gary R. Lauterbach, Michael Edwin James, Michael Morrison, Srikanth Arekapudi
  • Publication number: 20200133741
    Abstract: Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a compute element and a routing element. Each compute element has memory. Each router enables communication via wavelets with at least nearest neighbors in a 2D mesh. Routing is controlled by respective virtual channel specifiers in each wavelet and routing configuration information in each router. A compute element conditionally selects for task initiation a previously received wavelet specifying a particular one of the virtual channels. The conditional selecting excludes the previously received wavelet for selection until at least block/unblock state maintained for the particular virtual channel is in an unblock state. The compute element executes block/unblock instructions to modify the block/unblock state.
    Type: Application
    Filed: April 16, 2018
    Publication date: April 30, 2020
    Inventors: Sean LIE, Michael MORRISON, Srikanth AREKAPUDI, Michael Edwin JAMES, Gary R. LAUTERBACH
  • Publication number: 20200125934
    Abstract: Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of compute elements and routers performs flow-based computations on wavelets of data. Some instructions are performed in iterations, such as one iteration per element of a fabric vector or FIFO. When sources for an iteration of an instruction are unavailable, and/or there is insufficient space to store results of the iteration, indicators associated with operands of the instruction are checked to determine whether other work can be performed. In some scenarios, other work cannot be performed and processing stalls. Alternatively, information about the instruction is saved, the other work is performed, and sometime after the sources become available and/or sufficient space to store the results becomes available, the iteration is performed using the saved information.
    Type: Application
    Filed: April 17, 2018
    Publication date: April 23, 2020
    Inventors: Sean LIE, Michael MORRISON, Michael Edwin JAMES, Gary R. LAUTERBACH, Srikanth AREKAPUDI
  • Patent number: 10614357
    Abstract: Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow based computations on wavelets of data. Each processing element has a compute element and a routing element. Each compute element has memory. Each router enables communication via wavelets with nearest neighbors in a 2D mesh. Routing is controlled by respective virtual channel specifiers in each wavelet and routing configuration information in each router. A compute element receives a particular wavelet comprising a particular virtual channel specifier and a particular data element. Instructions are read from the memory of the compute element based at least in part on the particular virtual channel specifier. The particular data element is used as an input operand to execute at least one of the instructions.
    Type: Grant
    Filed: April 15, 2018
    Date of Patent: April 7, 2020
    Assignee: Cerebras Systems Inc.
    Inventors: Sean Lie, Gary R. Lauterbach, Michael Edwin James, Michael Morrison, Srikanth Arekapudi
  • Publication number: 20200005142
    Abstract: Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency, such as accuracy of learning, accuracy of prediction, speed of learning, performance of learning, and energy efficiency of learning. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Each compute element has processing resources and memory resources. Each router enables communication via wavelets with at least nearest neighbors in a 2D mesh. Stochastic gradient descent, mini-batch gradient descent, and continuous propagation gradient descent are techniques usable to train weights of a neural network modeled by the processing elements. Reverse checkpoint is usable to reduce memory usage during the training.
    Type: Application
    Filed: February 23, 2018
    Publication date: January 2, 2020
    Inventors: Sean LIE, Michael MORRISON, Michael Edwin JAMES, Gary R. LAUTERBACH, Srikanth AREKAPUDI
  • Patent number: 10515303
    Abstract: Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a compute element with dedicated storage and a routing element. Each router enables communication with nearest neighbors in a 2D mesh. The communication is via wavelets in accordance with a representation comprising an index specifier, a virtual channel specifier, a task specifier, a data element specifier, and an optional control/data specifier. The virtual channel specifier and the task specifier are associated with one or more instructions. The index specifier and the data element are optionally associated with operands of the one or more instructions.
    Type: Grant
    Filed: April 15, 2018
    Date of Patent: December 24, 2019
    Assignee: Cerebras Systems Inc.
    Inventors: Sean Lie, Gary R. Lauterbach, Michael Edwin James, Michael Morrison, Srikanth Arekapudi
  • Publication number: 20190332926
    Abstract: Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow-based computations on wavelets of data. Each processing element comprises a respective compute element and a respective routing element. Each compute element comprises virtual input queues. Each router enables communication via wavelets with at least nearest neighbors in a 2D mesh. Routing is controlled by respective virtual channel specifiers in each wavelet and routing configuration information in each router. Each router comprises data queues. The virtual input queues of the compute element and the data queues of the router are managed in accordance with the virtual channels. Backpressure information, per each of the virtual channels, is generated, communicated, and used to prevent overrun of the virtual input queues and the data queues.
    Type: Application
    Filed: April 17, 2018
    Publication date: October 31, 2019
    Inventors: Sean LIE, Gary R. LAUTERBACH, Michael Edwin JAMES, Michael MORRISON, Srikanth AREKAPUDI
  • Publication number: 20190286987
    Abstract: Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow based computations on wavelets of data. Each processing element has a compute element and a routing element. Each compute element has memory. Each router enables communication via wavelets with nearest neighbors in a 2D mesh. Routing is controlled by respective virtual channel specifiers in each wavelet and routing configuration information in each router. A compute element receives a particular wavelet comprising a particular virtual channel specifier and a particular data element. Instructions are read from the memory of the compute element based at least in part on the particular virtual channel specifier. The particular data element is used as an input operand to execute at least one of the instructions.
    Type: Application
    Filed: April 15, 2018
    Publication date: September 19, 2019
    Inventors: Sean LIE, Gary R. LAUTERBACH, Michael Edwin JAMES, Michael MORRISON, Srikanth AREKAPUDI
  • Publication number: 20190258919
    Abstract: Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a compute element with dedicated storage and a routing element. Each router enables communication with nearest neighbors in a 2D mesh. The communication is via wavelets in accordance with a representation comprising an index specifier, a virtual channel specifier, a task specifier, a data element specifier, and an optional control/data specifier. The virtual channel specifier and the task specifier are associated with one or more instructions. The index specifier and the data element are optionally associated with operands of the one or more instructions.
    Type: Application
    Filed: April 15, 2018
    Publication date: August 22, 2019
    Applicant: Cerebras Systems Inc.
    Inventors: Sean LIE, Gary R. LAUTERBACH, Michael Edwin JAMES, Michael MORRISON, Srikanth AREKAPUDI
  • Publication number: 20190258921
    Abstract: Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow based computations on wavelets of data. Each processing element has a compute element and a routing element. Each compute element has memory. Each router enables communication via wavelets with nearest neighbors in a 2D mesh. A compute element receives a wavelet. If a control specifier of the wavelet is a first value, then instructions are read from the memory of the compute element in accordance with an index specifier of the wavelet. If the control specifier is a second value, then instructions are read from the memory of the compute element in accordance with a virtual channel specifier of the wavelet. Then the compute element initiates execution of the instructions.
    Type: Application
    Filed: April 17, 2018
    Publication date: August 22, 2019
    Inventors: Sean LIE, Gary R. LAUTERBACH, Michael Edwin JAMES, Michael MORRISON, Srikanth AREKAPUDI
  • Publication number: 20190258920
    Abstract: Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Instructions executed by the compute element include operand specifiers, some specifying a data structure register storing a data structure descriptor describing an operand as a fabric vector or a memory vector. The data structure descriptor further describes the memory vector as one of a one-dimensional vector, a four-dimensional vector, or a circular buffer vector. Optionally, the data structure descriptor specifies an extended data structure register storing an extended data structure descriptor. The extended data structure descriptor specifies parameters relating to a four-dimensional vector or a circular buffer vector.
    Type: Application
    Filed: April 17, 2018
    Publication date: August 22, 2019
    Applicant: Cerebras Systems Inc.
    Inventors: Sean LIE, Michael MORRISON, Srikanth AREKAPUDI, Gary R. LAUTERBACH, Michael Edwin JAMES
  • Publication number: 20180314941
    Abstract: Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency, such as accuracy of learning, accuracy of prediction, speed of learning, performance of learning, and energy efficiency of learning. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a respective compute element and a respective routing element. Each compute element has processing resources and memory resources. Each router enables communication via wavelets with at least nearest neighbors in a 2D mesh. Stochastic gradient descent, mini-batch gradient descent, and continuous propagation gradient descent are techniques usable to train weights of a neural network modeled by the processing elements. Reverse checkpoint is usable to reduce memory usage during the training.
    Type: Application
    Filed: February 23, 2018
    Publication date: November 1, 2018
    Inventors: Sean LIE, Michael MORRISON, Michael Edwin JAMES, Gary R. LAUTERBACH, Srikanth AREKAPUDI
  • Patent number: 9331958
    Abstract: A cluster compute server includes nodes coupled in a network topology via a fabric that source routes packets based on location identifiers assigned to the nodes, the location identifiers representing the locations in the network topology. Host interfaces at the nodes may be associated with link layer addresses that do not reflect the location identifier associated with the nodes. The nodes therefore implement locally cached link layer address translations that map link layer addresses to corresponding location identifiers in the network topology. In response to originating a packet directed to one of these host interfaces, the node accesses the local translation cache to obtain a link layer address translation for a destination link layer address of the packet. When a node experiences a cache miss, the node queries a management node to obtain the specified link layer address translation from a master translation table maintained by the management node.
    Type: Grant
    Filed: December 31, 2012
    Date of Patent: May 3, 2016
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Sean Lie, Vikrama Ditya, Gary R. Lauterbach
  • Publication number: 20140185611
    Abstract: A cluster compute server includes nodes coupled in a network topology via a fabric that source routes packets based on location identifiers assigned to the nodes, the location identifiers representing the locations in the network topology. Host interfaces at the nodes may be associated with link layer addresses that do not reflect the location identifier associated with the nodes. The nodes therefore implement locally cached link layer address translations that map link layer addresses to corresponding location identifiers in the network topology. In response to originating a packet directed to one of these host interfaces, the node accesses the local translation cache to obtain a link layer address translation for a destination link layer address of the packet. When a node experiences a cache miss, the node queries a management node to obtain the specified link layer address translation from a master translation table maintained by the management node.
    Type: Application
    Filed: December 31, 2012
    Publication date: July 3, 2014
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Sean Lie, Vikrama Ditya, Gary R. Lauterbach
  • Patent number: 8166644
    Abstract: One embodiment of the present invention provides a system that facilitates capacitive communication between integrated circuit chips. The system includes a substrate having an active face upon which active circuitry and signal pads reside, and a back face opposite the active face. The system additionally includes an integrated circuit chip having an active face upon which active circuitry and signal pads reside, and a back face opposite the active face. Additionally, the integrated circuit chip is pressed against the substrate such that the active face of the integrated circuit chip is parallel to and adjacent to the active face of the substrate, and capacitive signal pads on the active face of the integrated circuit chip overlap signal pads on the active face of the substrate. The arrangement of the substrate and integrated circuit chip facilitates communication between the integrated circuit chip and the substrate through capacitive coupling via the overlapping signal pads.
    Type: Grant
    Filed: July 6, 2009
    Date of Patent: May 1, 2012
    Assignee: Oracle America, Inc.
    Inventors: Robert J. Drost, Gary R. Lauterbach, Danny Cohen
  • Patent number: 7783786
    Abstract: A system comprising a first node and a second node located in a single multiprocessor system, the first node including a first router and a first replicated service executing on a first operating system, the second node including a second router and a second replicated service executing on a second operating system, and a mesh interconnect connecting the first node to the second node using the first router and the second router.
    Type: Grant
    Filed: March 16, 2004
    Date of Patent: August 24, 2010
    Assignee: Oracle America Inc.
    Inventor: Gary R. Lauterbach
  • Publication number: 20090269884
    Abstract: One embodiment of the present invention provides a system that facilitates capacitive communication between integrated circuit chips. The system includes a substrate having an active face upon which active circuitry and signal pads reside, and a back face opposite the active face. The system additionally includes an integrated circuit chip having an active face upon which active circuitry and signal pads reside, and a back face opposite the active face. Additionally, the integrated circuit chip is pressed against the substrate such that the active face of the integrated circuit chip is parallel to and adjacent to the active face of the substrate, and capacitive signal pads on the active face of the integrated circuit chip overlap signal pads on the active face of the substrate. The arrangement of the substrate and integrated circuit chip facilitates communication between the integrated circuit chip and the substrate through capacitive coupling via the overlapping signal pads.
    Type: Application
    Filed: July 6, 2009
    Publication date: October 29, 2009
    Applicant: SUN MICROSYSTEMS, INC.
    Inventors: Robert J. Drost, Gary R. Lauterbach, Danny Cohen
  • Patent number: 7573720
    Abstract: One embodiment of the present invention provides a system that facilitates capacitive communication between integrated circuit chips. The system includes a substrate having an active face upon which active circuitry and signal pads reside, and a back face opposite the active face. The system additionally includes an integrated circuit chip having an active face upon which active circuitry and signal pads reside, and a back face opposite the active face. Additionally, the integrated circuit chip is pressed against the substrate such that the active face of the integrated circuit chip is parallel to and adjacent to the active face of the substrate, and capacitive signal pads on the active face of the integrated circuit chip overlap signal pads on the active face of the substrate. The arrangement of the substrate and integrated circuit chip facilitates communication between the integrated circuit chip and the substrate through capacitive coupling via the overlapping signal pads.
    Type: Grant
    Filed: June 15, 2005
    Date of Patent: August 11, 2009
    Assignee: Sun Microsystems, Inc.
    Inventors: Robert J. Drost, Gary R. Lauterbach, Danny Cohen