Patents by Inventor David Alexander Majnemer
David Alexander Majnemer has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20230418797Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing a kNN computation using a hardware accelerator. One of the methods includes obtaining a set of one or more query vectors; obtaining a set of database vectors; and performing, on a hardware accelerator and for each query vector in the set, a search for the k most similar database vectors to the query vector, comprising: computing, by circuitry of the hardware accelerator and for each query vector, a respective similarity value between the query vector and each database vector; and for each query vector, identifying, by the hardware accelerator and for each bin, (i) an index of the most similar database vector within the bin and (ii) the respective similarity value for the most similar database vector within the bin.Type: ApplicationFiled: June 26, 2023Publication date: December 28, 2023Inventors: Felix Ren-Chyan Chern, Blake Alan Hechtman, Andrew Thomas Davis, Ruiqi Guo, Sanjiv Kumar, David Alexander Majnemer
-
Patent number: 11763142Abstract: Methods and systems, including computer programs encoded on a computer storage medium. In one aspect, a method includes the actions of receiving a request to perform convolutional computations for a neural network on a hardware circuit having a matrix computation unit, the request specifying the convolutional computation to be performed on a feature tensor and a filter and padding applied to the feature tensor prior to performing the convolutional computation; and generating instructions that when executed by the hardware circuit cause the hardware circuit to perform operations comprising: transferring feature tensor data from a main memory of the hardware circuit to a scratchpad memory of the hardware circuit; and repeatedly performing the following operations: identifying a current subset of the feature tensor; and determining whether a memory view into the scratchpad memory for the current subset is consistent with a memory view of the current subset in the main memory.Type: GrantFiled: September 2, 2022Date of Patent: September 19, 2023Assignee: Google LLCInventors: David Alexander Majnemer, Blake Alan Hechtman, Bjarke Hammersholt Roune
-
Publication number: 20220414441Abstract: Methods and systems, including computer programs encoded on a computer storage medium. In one aspect, a method includes the actions of receiving a request to perform convolutional computations for a neural network on a hardware circuit having a matrix computation unit, the request specifying the convolutional computation to be performed on a feature tensor and a filter and padding applied to the feature tensor prior to performing the convolutional computation; and generating instructions that when executed by the hardware circuit cause the hardware circuit to perform operations comprising: transferring feature tensor data from a main memory of the hardware circuit to a scratchpad memory of the hardware circuit; and repeatedly performing the following operations: identifying a current subset of the feature tensor; and determining whether a memory view into the scratchpad memory for the current subset is consistent with a memory view of the current subset in the main memory.Type: ApplicationFiled: September 2, 2022Publication date: December 29, 2022Inventors: David Alexander Majnemer, Blake Alan Hechtman, Bjarke Hammersholt Roune
-
Patent number: 11500959Abstract: Methods, systems, and apparatus, including instructions encoded on storage media, for performing reduction of gradient vectors and similarly structured data that are generated in parallel, for example, on nodes organized in a mesh or torus topology defined by connections in at least two dimension between the nodes. The methods provide parallel computation and communication between nodes in the topology.Type: GrantFiled: August 16, 2019Date of Patent: November 15, 2022Assignee: Google LLCInventors: David Alexander Majnemer, Blake Alan Hechtman
-
Patent number: 11449739Abstract: Methods and systems, including computer programs encoded on a computer storage medium. In one aspect, a method includes the actions of receiving a request to perform convolutional computations for a neural network on a hardware circuit having a matrix computation unit, the request specifying the convolutional computation to be performed on a feature tensor and a filter and padding applied to the feature tensor prior to performing the convolutional computation; and generating instructions that when executed by the hardware circuit cause the hardware circuit to perform operations comprising: transferring feature tensor data from a main memory of the hardware circuit to a scratchpad memory of the hardware circuit; and repeatedly performing the following operations: identifying a current subset of the feature tensor; and determining whether a memory view into the scratchpad memory for the current subset is consistent with a memory view of the current subset in the main memory.Type: GrantFiled: August 22, 2019Date of Patent: September 20, 2022Assignee: Google LLCInventors: David Alexander Majnemer, Blake Alan Hechtman, Bjarke Hammersholt Roune
-
Publication number: 20220245453Abstract: Methods, systems, and apparatus, including an apparatus for redistributing tensor elements among computing units are described. In one aspect, a method includes distributing tensor elements of an N-dimensional tensor among multiple computing units of a computation system. Each computing unit redistributes the subset of tensor elements previously distributed to the computing unit to computing units. Each computing unit accesses redistribution partitioning data that specifies, for each computing unit, the tensor elements that are to be stored by the computing unit after redistributing the tensor elements. For each tensor element previously distributed to the particular computing unit, the computing unit determines a global linearized index value for the tensor element based on a multi-dimensional index for the tensor element.Type: ApplicationFiled: October 7, 2020Publication date: August 4, 2022Inventors: David Alexander Majnemer, Ravi Narayanaswami, Dong Hyuk Woo, Carrell Daniel Killebrew
-
Patent number: 11221879Abstract: Methods, systems, and apparatus for scheduling first-in-first-out instructions are described. In one aspect, a method includes receiving data representing code of a program to be executed by a processing unit comprising hardware processors. For each of one or more of the hardware processors, an order of independent groups of first-in-first-out (FIFO) instructions for execution by the hardware processor is identified in the data representing the code of the program. For each independent group of FIFO instructions for execution by the hardware processor, a path length metric that represents how long it will take to reach an end of the program from the independent group of FIFO instructions is determined. A new order of the independent groups of FIFO instructions for execution by the hardware processor is generated based at least on the path length metric for each independent group of FIFO instructions for execution by the hardware processor.Type: GrantFiled: July 2, 2020Date of Patent: January 11, 2022Assignee: Google LLCInventors: Yuanzhong Xu, James M. Stichnoth, David Alexander Majnemer
-
Publication number: 20210056396Abstract: Methods and systems, including computer programs encoded on a computer storage medium. In one aspect, a method includes the actions of receiving a request to perform convolutional computations for a neural network on a hardware circuit having a matrix computation unit, the request specifying the convolutional computation to be performed on a feature tensor and a filter and padding applied to the feature tensor prior to performing the convolutional computation; and generating instructions that when executed by the hardware circuit cause the hardware circuit to perform operations comprising: transferring feature tensor data from a main memory of the hardware circuit to a scratchpad memory of the hardware circuit; and repeatedly performing the following operations: identifying a current subset of the feature tensor; and determining whether a memory view into the scratchpad memory for the current subset is consistent with a memory view of the current subset in the main memory.Type: ApplicationFiled: August 22, 2019Publication date: February 25, 2021Inventors: David Alexander Majnemer, Blake Alan Hechtman, Bjarke Hammersholt Roune
-
Publication number: 20210049231Abstract: Methods, systems, and apparatus, including instructions encoded on storage media, for performing reduction of gradient vectors and similarly structured data that are generated in parallel, for example, on nodes organized in a mesh or torus topology defined by connections in at least two dimension between the nodes. The methods provide parallel computation and communication between nodes in the topology.Type: ApplicationFiled: August 16, 2019Publication date: February 18, 2021Inventors: David Alexander Majnemer, Blake Alan Hechtman
-
Publication number: 20200341807Abstract: Methods, systems, and apparatus for scheduling first-in-first-out instructions are described. In one aspect, a method includes receiving data representing code of a program to be executed by a processing unit comprising hardware processors. For each of one or more of the hardware processors, an order of independent groups of first-in-first-out (FIFO) instructions for execution by the hardware processor is identified in the data representing the code of the program. For each independent group of FIFO instructions for execution by the hardware processor, a path length metric that represents how long it will take to reach an end of the program from the independent group of FIFO instructions is determined. A new order of the independent groups of FIFO instructions for execution by the hardware processor is generated based at least on the path length metric for each independent group of FIFO instructions for execution by the hardware processor.Type: ApplicationFiled: July 2, 2020Publication date: October 29, 2020Inventors: Yuanzhong Xu, James M. Stichnoth, David Alexander Majnemer
-
Patent number: 10733016Abstract: Methods, systems, and apparatus for scheduling first-in-first-out instructions are described. In one aspect, a method includes receiving data representing code of a program to be executed by a processing unit comprising hardware processors. For each of one or more of the hardware processors, an order of independent groups of first-in-first-out (FIFO) instructions for execution by the hardware processor is identified in the data representing the code of the program. For each independent group of FIFO instructions for execution by the hardware processor, a path length metric that represents how long it will take to reach an end of the program from the independent group of FIFO instructions is determined. A new order of the independent groups of FIFO instructions for execution by the hardware processor is generated based at least on the path length metric for each independent group of FIFO instructions for execution by the hardware processor.Type: GrantFiled: April 26, 2019Date of Patent: August 4, 2020Assignee: Google LLCInventors: Yuanzhong Xu, James M. Stichnoth, David Alexander Majnemer
-
Patent number: 9575976Abstract: Methods and apparatuses that maintain birth time for a file system to optimize file update operations are described. The file system can include a plurality of snapshots or clones of data stored in one or more extents of blocks allocated in a storage device. Each extent may be associated with a time stamp according to the birth time. A request may be received from an executable using the file system to update data in a particular extent associated with a particular time stamp. In response, the current birth time in the file system and the particular time stamp may be compared to determine if the particular extent is not shared by more than one of the snapshots. If the particular time stamp is equal to the current birth time, the particular extent may be updated directly without performing an expensive operation to check whether a reference count of the particular extent is equal to one.Type: GrantFiled: September 17, 2014Date of Patent: February 21, 2017Assignee: Apple Inc.Inventors: Wenguang Wang, Deric Horn, David Alexander Majnemer, Owen Strain
-
Patent number: 8972690Abstract: Methods and apparatuses that maintain an access history of a file allocated with allocation blocks in storage devices are described. In response to receiving a usage request to allocate additional space for the file, an allocation block size may be adjusted or adapted based on the access history. The storage devices may be allocated with one or more allocation blocks using the adapted allocation block size to provide requested space for the file.Type: GrantFiled: January 5, 2010Date of Patent: March 3, 2015Inventors: Deric Horn, Donald James Brady, David Alexander Majnemer, Eric Brandon Tamura
-
Publication number: 20150006495Abstract: Methods and apparatuses that maintain birth time for a file system to optimize file update operations are described. The file system can include a plurality of snapshots or clones of data stored in one or more extents of blocks allocated in a storage device. Each extent may be associated with a time stamp according to the birth time. A request may be received from an executable using the file system to update data in a particular extent associated with a particular time stamp. In response, the current birth time in the file system and the particular time stamp may be compared to determine if the particular extent is not shared by more than one of the snapshots. If the particular time stamp is equal to the current birth time, the particular extent may be updated directly without performing an expensive operation to check whether a reference count of the particular extent is equal to one.Type: ApplicationFiled: September 17, 2014Publication date: January 1, 2015Inventors: Wenguang WANG, Deric HORN, David Alexander MAJNEMER, Owen STRAIN
-
Patent number: 8849876Abstract: Methods and apparatuses that maintain birth time for a file system to optimize file update operations are described. The file system can include a plurality of snapshots or clones of data stored in one or more extents of blocks allocated in a storage device. Each extent may be associated with a time stamp according to the birth time. A request may be received from an executable using the file system to update data in a particular extent associated with a particular time stamp. In response, the current birth time in the file system and the particular time stamp may be compared to determine if the particular extent is not shared by more than one of the snapshots. If the particular time stamp is equal to the current birth time, the particular extent may be updated directly without performing an expensive operation to check whether a reference count of the particular extent is equal to one.Type: GrantFiled: December 28, 2009Date of Patent: September 30, 2014Inventors: Wenguang Wang, Deric Horn, David Alexander Majnemer, Owen Strain
-
Patent number: 8504792Abstract: Methods and apparatuses that search tree representations of a bitmap for available blocks to allocate in storage devices are described. An allocation request for a file may be received to initiate the search. In one embodiment, the bitmap may include an array of bits corresponding to blocks in the storage devices. Each bit may indicate whether one of the blocks is available. The tree representations may include at least one red-black tree having nodes corresponding to one or more consecutive bits in the bitmap indicating an extent of available blocks. One of the tree representations may be selected according to a file associated with an allocation request to identify an extent of available block matching the allocation request. The tree representations may be synchronized as the bitmap is updated with changes of block allocations in the storage devices.Type: GrantFiled: December 22, 2009Date of Patent: August 6, 2013Assignee: Apple Inc.Inventors: Eric Brandon Tamura, David Alexander Majnemer
-
Publication number: 20110167239Abstract: Methods and apparatuses that maintain an access history of a file allocated with allocation blocks in storage devices are described. In response to receiving a usage request to allocate additional space for the file, an allocation block size may be adjusted or adapted based on the access history. The storage devices may be allocated with one or more allocation blocks using the adapted allocation block size to provide requested space for the file.Type: ApplicationFiled: January 5, 2010Publication date: July 7, 2011Inventors: Deric Horn, Donald James Brady, David Alexander Majnemer, Eric Brandon Tamura
-
Publication number: 20110161381Abstract: Methods and apparatuses that maintain birth time for a file system to optimize file update operations are described. The file system can include a plurality of snapshots or clones of data stored in one or more extents of blocks allocated in a storage device. Each extent may be associated with a time stamp according to the birth time. A request may be received from an executable using the file system to update data in a particular extent associated with a particular time stamp. In response, the current birth time in the file system and the particular time stamp may be compared to determine if the particular extent is not shared by more than one of the snapshots. If the particular time stamp is equal to the current birth time, the particular extent may be updated directly without performing an expensive operation to check whether a reference count of the particular extent is equal to one.Type: ApplicationFiled: December 28, 2009Publication date: June 30, 2011Inventors: Wenguang Wang, Deric Horn, David Alexander Majnemer, Owen Strain
-
Publication number: 20110153976Abstract: Methods and apparatuses that search tree representations of a bitmap for available blocks to allocate in storage devices are described. An allocation request for a file may be received to initiate the search. In one embodiment, the bitmap may include an array of bits corresponding to blocks in the storage devices. Each bit may indicate whether one of the blocks is available. The tree representations may include at least one red-black tree having nodes corresponding to one or more consecutive bits in the bitmap indicating an extent of available blocks. One of the tree representations may be selected according to a file associated with an allocation request to identify an extent of available block matching the allocation request. The tree representations may be synchronized as the bitmap is updated with changes of block allocations in the storage devices.Type: ApplicationFiled: December 22, 2009Publication date: June 23, 2011Inventors: Eric Brandon Tamura, David Alexander Majnemer