Patents by Inventor Blake Alan Hechtman
Blake Alan Hechtman has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20250148357
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for compressing a machine learning model having a plurality of parameters. In one aspect, one of the methods includes obtaining trained values of a set of parameters for at least a portion of a machine learning model, the trained values being represented in a first format; identifying one or more dense ranges for the trained values; determining a least number of bits required to represent each trained value within the one or more dense ranges; identifying a second format having a range that is smaller than a range of the first format; and generating a compressed version of the at least a portion of the machine learning model.
Type: Application
Filed: November 7, 2023
Publication date: May 8, 2025
Inventors: Aditya Binodkumar Agrawal, Blake Alan Hechtman, Matthew Leever Hedlund, David Alexander Majnemer, Marissa Karen Ikonomidis
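To make the quantization idea concrete, here is a minimal NumPy sketch of one way the abstract's steps could fit together; it is an illustration under assumptions, not the claimed method, and every name in it (compress, coverage, the codebook/outlier split) is hypothetical. It treats the central mass of a 1-D weight array as the dense range, derives the least bit width needed to index the distinct values inside it, and stores those values in the narrowest standard integer type (the "second format"), keeping the rare outliers in the original format.

```python
import numpy as np

def compress(weights, coverage=0.999):
    """Hypothetical sketch: re-encode the dense range of `weights` (1-D) in a
    narrower format, keeping outliers in the original format."""
    # Dense range: the interval holding the central `coverage` fraction of values.
    lo, hi = np.quantile(weights, [(1 - coverage) / 2, (1 + coverage) / 2])
    dense = (weights >= lo) & (weights <= hi)

    # Codebook of the distinct dense values; the least number of bits required
    # is the width needed to index that codebook.
    codebook, codes = np.unique(weights[dense], return_inverse=True)
    bits = max(1, int(np.ceil(np.log2(len(codebook)))))

    # The "second format": the narrowest standard integer type with enough range.
    fmt = next(t for t in (np.uint8, np.uint16, np.uint32, np.uint64)
               if np.iinfo(t).bits >= bits)

    return {
        "codes": codes.astype(fmt),          # dense values in the narrow format
        "codebook": codebook,                # maps codes back to the first format
        "outlier_values": weights[~dense],   # rare values kept in the first format
        "outlier_positions": np.flatnonzero(~dense),
    }
```

A real compressor would also have to choose among multiple dense ranges and round values within a range; this sketch sidesteps both by exact codebook lookup.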
-
Publication number: 20240378416
Abstract: Methods, systems, and apparatus, including instructions encoded on storage media, for performing reduction of gradient vectors for distributed training of a neural network. One of the methods includes receiving, at each of a plurality of devices, a respective batch; performing, by each device, a forward pass comprising, for each batch normalization layer: generating, by each of the devices, a respective output of the corresponding other layer for each training example in the batch; determining, by each of the devices, a per-replica mean and a per-replica variance; determining, for each sub-group, a distributed mean and a distributed variance from the per-replica means and the per-replica variances for the devices in the sub-group; and applying, by each device, batch normalization to the respective outputs of the corresponding other layer generated by the device using the distributed mean and the distributed variance for the sub-group to which the device belongs.
Type: Application
Filed: February 16, 2024
Publication date: November 14, 2024
Inventors: Blake Alan Hechtman, Sameer Kumar
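The sub-group statistics in this abstract combine exactly when each device's batch has the same size, because the pooled second moment can be averaged across replicas. A minimal NumPy simulation of that arithmetic follows; the plain Python loop stands in for the cross-replica communication a real system would use (e.g., an all-reduce restricted to the sub-group), and all names are hypothetical.

```python
import numpy as np

def subgroup_batch_norm(per_device_acts, group_size, eps=1e-5):
    """Batch norm whose statistics are shared within sub-groups of devices.

    per_device_acts: one [batch, features] array per device; devices i and j
    share statistics iff i // group_size == j // group_size.
    """
    # Per-replica statistics, computed independently on each device.
    means = [a.mean(axis=0) for a in per_device_acts]
    variances = [a.var(axis=0) for a in per_device_acts]

    out = []
    for g in range(0, len(per_device_acts), group_size):
        m = np.stack(means[g:g + group_size])
        v = np.stack(variances[g:g + group_size])
        # Distributed statistics for the sub-group (equal batch sizes assumed):
        # averaging E[x^2] = var + mean^2 and subtracting the squared distributed
        # mean matches pooling the sub-group's batches into one big batch.
        dist_mean = m.mean(axis=0)
        dist_var = (v + m ** 2).mean(axis=0) - dist_mean ** 2
        for a in per_device_acts[g:g + group_size]:
            out.append((a - dist_mean) / np.sqrt(dist_var + eps))
    return out
```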
-
Publication number: 20240232598
Abstract: Methods and systems, including computer programs encoded on a computer storage medium. In one aspect, a method includes the actions of receiving a request to perform convolutional computations for a neural network on a hardware circuit having a matrix computation unit, the request specifying the convolutional computation to be performed on a feature tensor and a filter, and padding applied to the feature tensor prior to performing the convolutional computation; and generating instructions that, when executed by the hardware circuit, cause the hardware circuit to perform operations comprising: transferring feature tensor data from a main memory of the hardware circuit to a scratchpad memory of the hardware circuit; and repeatedly performing the following operations: identifying a current subset of the feature tensor; and determining whether a memory view into the scratchpad memory for the current subset is consistent with a memory view of the current subset in the main memory.
Type: Application
Filed: September 18, 2023
Publication date: July 11, 2024
Inventors: David Alexander Majnemer, Blake Alan Hechtman, Bjarke Hammersholt Roune
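The consistency check in this abstract can be pictured with a toy software model; the sketch below is not the hardware circuit, only a NumPy analogy with hypothetical names. Main memory holds the whole feature map, the "scratchpad" caches a tile of rows, and a window's scratchpad view is treated as consistent with the main-memory view exactly when the whole window falls inside the cached tile; otherwise fresh data is transferred first.

```python
import numpy as np

def sum_windows(feature, window, tile_rows):
    """Slide a window over `feature`, reading only from a cached row tile."""
    h, w = feature.shape
    tile, tile_start = None, 0
    outputs = []
    for r in range(h - window + 1):
        for c in range(w - window + 1):
            # The scratchpad view of the current subset is consistent with the
            # main-memory view only if the whole window lies inside the tile.
            consistent = (tile is not None
                          and tile_start <= r
                          and r + window <= tile_start + tile_rows)
            if not consistent:
                # Inconsistent: transfer fresh feature tensor data into the tile.
                tile_start = r
                tile = feature[tile_start:tile_start + tile_rows]
            win = tile[r - tile_start:r - tile_start + window, c:c + window]
            outputs.append(win.sum())  # stands in for the matrix-unit compute
    return np.array(outputs).reshape(h - window + 1, w - window + 1)
```

For example, sum_windows(np.arange(36.).reshape(6, 6), window=3, tile_rows=4) performs only two tile transfers for its sixteen windows.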
-
Patent number: 11907825
Abstract: Methods, systems, and apparatus, including instructions encoded on storage media, for performing reduction of gradient vectors for distributed training of a neural network. One of the methods includes receiving, at each of a plurality of devices, a respective batch; performing, by each device, a forward pass comprising, for each batch normalization layer: generating, by each of the devices, a respective output of the corresponding other layer for each training example in the batch; determining, by each of the devices, a per-replica mean and a per-replica variance; determining, for each sub-group, a distributed mean and a distributed variance from the per-replica means and the per-replica variances for the devices in the sub-group; and applying, by each device, batch normalization to the respective outputs of the corresponding other layer generated by the device using the distributed mean and the distributed variance for the sub-group to which the device belongs.
Type: Grant
Filed: October 21, 2019
Date of Patent: February 20, 2024
Assignee: Google LLC
Inventors: Blake Alan Hechtman, Sameer Kumar
-
Publication number: 20230418797
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing a kNN computation using a hardware accelerator. One of the methods includes obtaining a set of one or more query vectors; obtaining a set of database vectors; and performing, on a hardware accelerator and for each query vector in the set, a search for the k most similar database vectors to the query vector, comprising: computing, by circuitry of the hardware accelerator and for each query vector, a respective similarity value between the query vector and each database vector; and for each query vector, identifying, by the hardware accelerator and for each bin, (i) an index of the most similar database vector within the bin and (ii) the respective similarity value for the most similar database vector within the bin.
Type: Application
Filed: June 26, 2023
Publication date: December 28, 2023
Inventors: Felix Ren-Chyan Chern, Blake Alan Hechtman, Andrew Thomas Davis, Ruiqi Guo, Sanjiv Kumar, David Alexander Majnemer
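A small NumPy sketch of the two-stage search in this abstract follows, with hypothetical names throughout: similarities are dot products, stage one keeps each bin's single best index and value (the per-bin identification the abstract assigns to the accelerator), and stage two takes a cheap top-k over the one-candidate-per-bin survivors. The result matches exact kNN only when the true k nearest vectors land in k distinct bins, the usual trade-off of this kind of binning.

```python
import numpy as np

def binned_knn(queries, database, k, num_bins):
    """Two-stage kNN: per-bin argmax, then top-k over bin winners (k <= num_bins)."""
    # Similarity of every query to every database vector (dot product).
    sims = queries @ database.T                                   # [nq, ndb]
    bins = np.array_split(np.arange(database.shape[0]), num_bins)

    # Stage 1: for each bin, (i) the index and (ii) the similarity value of
    # the most similar database vector within the bin.
    best_idx = np.stack([b[np.argmax(sims[:, b], axis=1)] for b in bins], axis=1)
    best_sim = np.stack([np.max(sims[:, b], axis=1) for b in bins], axis=1)

    # Stage 2: top-k over one candidate per bin, far cheaper than over all ndb.
    order = np.argsort(-best_sim, axis=1)[:, :k]
    return (np.take_along_axis(best_idx, order, axis=1),
            np.take_along_axis(best_sim, order, axis=1))
```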
-
Patent number: 11763142
Abstract: Methods and systems, including computer programs encoded on a computer storage medium. In one aspect, a method includes the actions of receiving a request to perform convolutional computations for a neural network on a hardware circuit having a matrix computation unit, the request specifying the convolutional computation to be performed on a feature tensor and a filter, and padding applied to the feature tensor prior to performing the convolutional computation; and generating instructions that, when executed by the hardware circuit, cause the hardware circuit to perform operations comprising: transferring feature tensor data from a main memory of the hardware circuit to a scratchpad memory of the hardware circuit; and repeatedly performing the following operations: identifying a current subset of the feature tensor; and determining whether a memory view into the scratchpad memory for the current subset is consistent with a memory view of the current subset in the main memory.
Type: Grant
Filed: September 2, 2022
Date of Patent: September 19, 2023
Assignee: Google LLC
Inventors: David Alexander Majnemer, Blake Alan Hechtman, Bjarke Hammersholt Roune
-
Publication number: 20230206126
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for transforming patterns of operations on tensors in a computational graph to reduce the memory burden incurred when reshape operations are performed, in particular when deployed to hardware platforms that have vector instructions or vector memory requiring alignment of operands.
Type: Application
Filed: December 23, 2022
Publication date: June 29, 2023
Inventor: Blake Alan Hechtman
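One family of rewrites that fits this description moves reshapes past elementwise operations so that adjacent reshapes collapse into one, saving a materialized, alignment-padded copy on vector hardware. The toy rewriter below is a sketch of that pattern only, not the patented transformation; graphs are nested tuples and all names are made up.

```python
# Nodes: ("reshape", shape, input) or ("add", lhs, rhs); leaves are strings.

def merge_reshapes(node):
    """Collapse reshape patterns: hoist matching reshapes above an elementwise
    add, and fuse back-to-back reshapes into a single one."""
    if isinstance(node, tuple) and node[0] == "add":
        lhs, rhs = merge_reshapes(node[1]), merge_reshapes(node[2])
        # reshape(a, s) + reshape(b, s)  ->  reshape(a + b, s): one reshape
        # (one aligned copy) where there used to be two.
        if (isinstance(lhs, tuple) and lhs[0] == "reshape"
                and isinstance(rhs, tuple) and rhs[0] == "reshape"
                and lhs[1] == rhs[1]):
            return ("reshape", lhs[1], ("add", lhs[2], rhs[2]))
        return ("add", lhs, rhs)
    if isinstance(node, tuple) and node[0] == "reshape":
        inner = merge_reshapes(node[2])
        # reshape(reshape(x, s1), s2)  ->  reshape(x, s2)
        if isinstance(inner, tuple) and inner[0] == "reshape":
            return ("reshape", node[1], inner[2])
        return ("reshape", node[1], inner)
    return node

# merge_reshapes(("add", ("reshape", (2, 6), "x"), ("reshape", (2, 6), "y")))
# -> ("reshape", (2, 6), ("add", "x", "y"))
```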
-
Publication number: 20220414441
Abstract: Methods and systems, including computer programs encoded on a computer storage medium. In one aspect, a method includes the actions of receiving a request to perform convolutional computations for a neural network on a hardware circuit having a matrix computation unit, the request specifying the convolutional computation to be performed on a feature tensor and a filter, and padding applied to the feature tensor prior to performing the convolutional computation; and generating instructions that, when executed by the hardware circuit, cause the hardware circuit to perform operations comprising: transferring feature tensor data from a main memory of the hardware circuit to a scratchpad memory of the hardware circuit; and repeatedly performing the following operations: identifying a current subset of the feature tensor; and determining whether a memory view into the scratchpad memory for the current subset is consistent with a memory view of the current subset in the main memory.
Type: Application
Filed: September 2, 2022
Publication date: December 29, 2022
Inventors: David Alexander Majnemer, Blake Alan Hechtman, Bjarke Hammersholt Roune
-
Patent number: 11537939
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for transforming patterns of operations on tensors in a computational graph to reduce the memory burden incurred when reshape operations are performed, in particular when deployed to hardware platforms that have vector instructions or vector memory requiring alignment of operands.
Type: Grant
Filed: May 3, 2019
Date of Patent: December 27, 2022
Assignee: Google LLC
Inventor: Blake Alan Hechtman
-
Patent number: 11500959
Abstract: Methods, systems, and apparatus, including instructions encoded on storage media, for performing reduction of gradient vectors and similarly structured data that are generated in parallel, for example, on nodes organized in a mesh or torus topology defined by connections in at least two dimensions between the nodes. The methods provide parallel computation and communication between nodes in the topology.
Type: Grant
Filed: August 16, 2019
Date of Patent: November 15, 2022
Assignee: Google LLC
Inventors: David Alexander Majnemer, Blake Alan Hechtman
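The dimension-at-a-time structure the abstract hints at can be simulated in a few lines of NumPy; each np.sum below stands in for a ring reduction running over one mesh dimension's links, so the sketch shows only the data flow, not the patented communication schedule, and its names are hypothetical.

```python
import numpy as np

def torus_all_reduce(grads):
    """All-reduce on a 2-D mesh/torus, one dimension at a time.

    grads: [rows, cols, n], one gradient vector per node.
    """
    # Phase 1: reduce each column, so every node in a column holds its sum.
    after_cols = np.broadcast_to(grads.sum(axis=0, keepdims=True), grads.shape)
    # Phase 2: reduce each row of the column sums; every node now holds the
    # global sum while only ever exchanging data with its mesh neighbors.
    return np.broadcast_to(after_cols.sum(axis=1, keepdims=True), grads.shape)
```

In a bandwidth-optimal version each phase would itself be a reduce-scatter followed by an all-gather, so different chunks of the gradient vector travel over different links at the same time, which is one way to realize the parallel computation and communication the abstract mentions.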
-
Patent number: 11449739
Abstract: Methods and systems, including computer programs encoded on a computer storage medium. In one aspect, a method includes the actions of receiving a request to perform convolutional computations for a neural network on a hardware circuit having a matrix computation unit, the request specifying the convolutional computation to be performed on a feature tensor and a filter, and padding applied to the feature tensor prior to performing the convolutional computation; and generating instructions that, when executed by the hardware circuit, cause the hardware circuit to perform operations comprising: transferring feature tensor data from a main memory of the hardware circuit to a scratchpad memory of the hardware circuit; and repeatedly performing the following operations: identifying a current subset of the feature tensor; and determining whether a memory view into the scratchpad memory for the current subset is consistent with a memory view of the current subset in the main memory.
Type: Grant
Filed: August 22, 2019
Date of Patent: September 20, 2022
Assignee: Google LLC
Inventors: David Alexander Majnemer, Blake Alan Hechtman, Bjarke Hammersholt Roune
-
Publication number: 20210390410
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing images using a computer vision neural network that has one or more local self-attention layers. Each local self-attention layer is configured to apply one or more local self-attention mechanisms to the layer input to the local self-attention layer.
Type: Application
Filed: June 14, 2021
Publication date: December 16, 2021
Inventors: Ashish Teku Vaswani, Prajit Ramachandran, Aravind Srinivas Lakshminarayanan, Blake Alan Hechtman, Niki J. Parmar
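As a concrete reference point, here is a minimal NumPy version of one local self-attention mechanism: each spatial position attends only to a window-by-window neighborhood of the layer input rather than to the whole image. It is a sketch under simplifying assumptions (float input, values equal keys, and the learned query/key/value projections and multiple heads of a real layer are omitted), not the patented layer.

```python
import numpy as np

def local_self_attention(x, window=3):
    """Each position of x ([height, width, channels]) attends to its
    `window` x `window` neighborhood via scaled dot-product attention."""
    h, w, c = x.shape
    pad = window // 2
    padded = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    out = np.empty_like(x)
    for i in range(h):
        for j in range(w):
            q = x[i, j]                                   # query: this position
            keys = padded[i:i + window, j:j + window].reshape(-1, c)
            scores = keys @ q / np.sqrt(c)                # scaled dot products
            weights = np.exp(scores - scores.max())       # stable softmax
            weights /= weights.sum()
            out[i, j] = weights @ keys                    # values == keys here
    return out
```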
-
Publication number: 20210056396
Abstract: Methods and systems, including computer programs encoded on a computer storage medium. In one aspect, a method includes the actions of receiving a request to perform convolutional computations for a neural network on a hardware circuit having a matrix computation unit, the request specifying the convolutional computation to be performed on a feature tensor and a filter, and padding applied to the feature tensor prior to performing the convolutional computation; and generating instructions that, when executed by the hardware circuit, cause the hardware circuit to perform operations comprising: transferring feature tensor data from a main memory of the hardware circuit to a scratchpad memory of the hardware circuit; and repeatedly performing the following operations: identifying a current subset of the feature tensor; and determining whether a memory view into the scratchpad memory for the current subset is consistent with a memory view of the current subset in the main memory.
Type: Application
Filed: August 22, 2019
Publication date: February 25, 2021
Inventors: David Alexander Majnemer, Blake Alan Hechtman, Bjarke Hammersholt Roune
-
Publication number: 20210049231
Abstract: Methods, systems, and apparatus, including instructions encoded on storage media, for performing reduction of gradient vectors and similarly structured data that are generated in parallel, for example, on nodes organized in a mesh or torus topology defined by connections in at least two dimensions between the nodes. The methods provide parallel computation and communication between nodes in the topology.
Type: Application
Filed: August 16, 2019
Publication date: February 18, 2021
Inventors: David Alexander Majnemer, Blake Alan Hechtman
-
Publication number: 20200349465
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for transforming patterns of operations on tensors in a computational graph to reduce the memory burden incurred when reshape operations are performed, in particular when deployed to hardware platforms that have vector instructions or vector memory requiring alignment of operands.
Type: Application
Filed: May 3, 2019
Publication date: November 5, 2020
Inventor: Blake Alan Hechtman
-
Publication number: 20200125949
Abstract: Methods, systems, and apparatus, including instructions encoded on storage media, for performing reduction of gradient vectors for distributed training of a neural network. One of the methods includes receiving, at each of a plurality of devices, a respective batch; performing, by each device, a forward pass comprising, for each batch normalization layer: generating, by each of the devices, a respective output of the corresponding other layer for each training example in the batch; determining, by each of the devices, a per-replica mean and a per-replica variance; determining, for each sub-group, a distributed mean and a distributed variance from the per-replica means and the per-replica variances for the devices in the sub-group; and applying, by each device, batch normalization to the respective outputs of the corresponding other layer generated by the device using the distributed mean and the distributed variance for the sub-group to which the device belongs.
Type: Application
Filed: October 21, 2019
Publication date: April 23, 2020
Inventors: Blake Alan Hechtman, Sameer Kumar