Patents by Inventor Md Mostofa Ali Patwary

Md Mostofa Ali Patwary has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11593655
    Abstract: As deep learning application domains grow, a deeper understanding of the relationships between training set size, computational scale, and model accuracy improvements is extremely beneficial. Presented herein is a large-scale empirical study of error and model size growth as training sets grow. Embodiments of a methodology for this measurement are introduced herein, as well as embodiments for predicting other metrics, such as compute-related metrics. It is shown herein that a power law may be used to represent deep model relationships, such as that between error and training data size. It is also shown that model size scales sublinearly with data size. These scaling relationships have significant implications for deep learning research, practice, and systems. They can assist with model debugging, setting accuracy targets, and decisions about data set growth. They can also guide computing system design and underscore the importance of continued computational scaling.
    Type: Grant
    Filed: November 30, 2018
    Date of Patent: February 28, 2023
    Assignee: Baidu USA LLC
    Inventors: Joel Hestness, Gregory Diamos, Hee Woo Jun, Sharan Narang, Newsha Ardalani, Md Mostofa Ali Patwary, Yanqi Zhou
  • Publication number: 20200175374
    Abstract: As deep learning application domains grow, a deeper understanding of the relationships between training set size, computational scale, and model accuracy improvements is extremely beneficial. Presented herein is a large-scale empirical study of error and model size growth as training sets grow. Embodiments of a methodology for this measurement are introduced herein, as well as embodiments for predicting other metrics, such as compute-related metrics. It is shown herein that a power law may be used to represent deep model relationships, such as that between error and training data size. It is also shown that model size scales sublinearly with data size. These scaling relationships have significant implications for deep learning research, practice, and systems. They can assist with model debugging, setting accuracy targets, and decisions about data set growth. They can also guide computing system design and underscore the importance of continued computational scaling.
    Type: Application
    Filed: November 30, 2018
    Publication date: June 4, 2020
    Applicant: Baidu USA LLC
    Inventors: Joel Hestness, Gregory Diamos, Hee Woo Jun, Sharan Narang, Newsha Ardalani, Md Mostofa Ali Patwary, Yanqi Zhou
  • Publication number: 20170185403
    Abstract: A processor includes a front end to receive an instruction, a decoder to decode the instruction, a set operations logic unit (SOLU) to execute the instruction, and a retirement unit to retire the instruction. The SOLU includes logic to store a first set of key-value pairs in a content-associative data structure, to receive a second set of key-value pairs, and to identify key-value pairs in the two sets with matching keys. The SOLU includes logic to add the second set of key-value pairs to the first set to produce an output set, and to apply an operation to the values of key-value pairs with matching keys, generating a single value for the matching key. The SOLU includes logic to produce an output set that includes key-value pairs from the first set with matching keys, and to discard key-value pairs from the first set with unique keys.
    Type: Application
    Filed: December 23, 2015
    Publication date: June 29, 2017
    Inventors: Michael J. Anderson, Sheng R. Li, Jong Soo Park, Md Mostofa Ali Patwary, Nadathur Rajagopalan Satish, Mikhail Smelyanskiy, Narayanan Sundaram
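
The scaling-law entries above (patent 11593655 and publication 20200175374) describe power-law relationships between training set size and model error. The sketch below is a minimal, hypothetical illustration of fitting such a relationship in Python with NumPy; the data points, variable names, and fitted exponent are invented for illustration and are not taken from the patent.

    import numpy as np

    # Hypothetical measurements (invented for illustration): training-set
    # sizes and the validation errors observed at each size.
    train_sizes = np.array([1e4, 3e4, 1e5, 3e5, 1e6, 3e6])
    val_errors  = np.array([0.42, 0.33, 0.26, 0.21, 0.17, 0.14])

    # A pure power law error(N) = a * N**(-b) is linear in log-log space:
    #   log(error) = log(a) - b * log(N)
    slope, intercept = np.polyfit(np.log(train_sizes), np.log(val_errors), deg=1)
    b, a = -slope, np.exp(intercept)
    print(f"fit: error(N) ~ {a:.3f} * N^(-{b:.3f})")

    # Extrapolate the fit, e.g. to judge whether collecting 10x more data
    # is likely to be worthwhile or to set an accuracy target.
    print(f"predicted error at N=3e7: {a * (3e7) ** (-b):.3f}")

The log-log linear fit is used here only because a pure power law becomes a straight line in log-log coordinates; the patent's methodology may fit a different functional form.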
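Publication 20170185403 above describes a set operations logic unit (SOLU) that stores one set of key-value pairs, receives a second set, combines the values of pairs whose keys match, and can also discard pairs with unique keys. The Python functions below are only a hypothetical software model of those semantics; the invention itself is processor hardware, and the function names and example data are invented for illustration.

    from operator import add

    def merge_reduce(first, second, op=add):
        """Union of two key-value sets; values of matching keys are combined
        with `op` into a single value (the merge/reduce behaviour)."""
        out = dict(first)
        for key, value in second.items():
            out[key] = op(out[key], value) if key in out else value
        return out

    def keep_matching(first, second):
        """Keep only pairs of `first` whose keys also appear in `second`;
        pairs with unique keys are discarded (the filter behaviour)."""
        return {key: value for key, value in first.items() if key in second}

    stored   = {"a": 1, "b": 2, "c": 3}   # first set, held in the associative store
    incoming = {"b": 10, "d": 4}          # second set, streamed in

    print(merge_reduce(stored, incoming))   # {'a': 1, 'b': 12, 'c': 3, 'd': 4}
    print(keep_matching(stored, incoming))  # {'b': 2}

A Python dictionary stands in for the content-associative data structure here; in the claimed hardware the matching and reduction would happen in dedicated logic rather than software.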