Patents by Inventor Srinivas Sridharan
Srinivas Sridharan has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20210304025Abstract: A processor analyzes a machine learning workload. Corresponding priority levels are assigned to identified data requests in the machine learning workload based on an associated data dependency delay performance impact. The assigned corresponding priority levels are indicated when providing the data requests to a memory controller. The memory controller sorts the received data requests into a plurality of different priority queues based on the indicated corresponding priority levels. The memory controller initiates the data requests from the different priority queues to memory in an order based on different qualities of service of the different priority queues.Type: ApplicationFiled: March 24, 2020Publication date: September 30, 2021Inventor: Srinivas Sridharan
-
Patent number: 11094029Abstract: One embodiment provides for a method of transmitting data between multiple compute nodes of a distributed compute system, the method comprising creating a global view of communication operations to be performed between the multiple compute nodes of the distributed compute system, the global view created using information specific to a machine learning model associated with the distributed compute system; using the global view to determine a communication cost of the communication operations; and automatically determining a number of network endpoints for use in transmitting the data between the multiple compute nodes of the distributed compute system.Type: GrantFiled: April 10, 2017Date of Patent: August 17, 2021Assignee: INTEL CORPORATIONInventors: Dhiraj D. Kalamkar, Karthikeyan Vaidyanathan, Srinivas Sridharan, Dipankar Das
-
Patent number: 11068780Abstract: Technologies for artificial neural network training include a computing node with a host fabric interface that sends a message that includes one or more artificial neural network training algorithm values to another computing node in response to receipt of a request to send the message. Prior to sending the message, the host fabric interface may receive a request to quantize the message and quantize the message based on a quantization level included in the request to generate a quantized message. The quantization message includes one or more quantized values such that each quantized value has a lower precision than a corresponding artificial neural network training algorithm value. The host fabric interface then transmits the quantized message, which includes metadata indicative of the quantization level, to another computing node in response to quantization of the message for artificial neural network training. Other embodiments are described and claimed.Type: GrantFiled: April 1, 2017Date of Patent: July 20, 2021Assignee: Intel CorporationInventors: Naveen K. Mellempudi, Srinivas Sridharan, Dheevatsa Mudigere, Dipankar Das
-
Patent number: 11023803Abstract: One embodiment provides for a non-transitory machine readable medium storing instructions which, when executed by one or more processors, cause the one or more processors to perform operations comprising providing an interface to define a neural network using machine-learning domain specific terminology, wherein the interface enables selection of a neural network topology and abstracts low-level communication details of distributed training of the neural network.Type: GrantFiled: April 10, 2017Date of Patent: June 1, 2021Assignee: INTEL CORPORATIONInventors: Dhiraj D. Kalamkar, Karthikeyan Vaidyanathan, Srinivas Sridharan, Dipankar Das
-
Patent number: 11012511Abstract: A request for data from a distributed table is received at a network interface controller system. The request for data from the distributed table is identified as a request to be processed by the network interface controller system instead of a processor of a host computer system. The requested data is requested and received from a memory of the computing host computer system via a computer interface of the network interface controller system. The received requested data is caused to be cached in a cache of the network interface controller system.Type: GrantFiled: January 14, 2020Date of Patent: May 18, 2021Assignee: Facebook, Inc.Inventor: Srinivas Sridharan
-
Publication number: 20210110508Abstract: One embodiment provides for a graphics processing unit to perform computations associated with a neural network, the graphics processing unit comprising compute unit including a hardware logic unit having dynamic precision fixed-point logic, the compute unit to receive a set of dynamic fixed-point tensors, compute, via the dynamic precision fixed-point logic, a right-shift value using an absolute maximum value within the set of dynamic fixed-point tensors and a dynamic range of the set of dynamic fixed-point tensors, right-shift data values within the set of dynamic fixed-point tensors based on the right-shift value, increment a shared exponent associated with the set of dynamic fixed-point tensors based on the right-shift value, perform a compute operation on the set of dynamic fixed-point tensors, and generate an output tensor via the compute operation on the set of dynamic fixed-point tensors.Type: ApplicationFiled: October 29, 2020Publication date: April 15, 2021Applicant: Intel CorporationInventors: Naveen MELLEMPUDI, DHEEVATSA MUDIGERE, DIPANKAR DAS, SRINIVAS SRIDHARAN
-
Publication number: 20210109888Abstract: A technique includes performing a collective operation among multiple nodes of a parallel processing computer system using multiple parallel processing stages. The technique includes regulating an ordering of the parallel processing stages so that an initial stage of the plurality of parallel processing stages is associated with a higher node injection bandwidth than a subsequent stage of the plurality of parallel processing stages.Type: ApplicationFiled: September 30, 2017Publication date: April 15, 2021Inventors: Karthikeyan Vaidyanathan, Srinivas Sridharan, Dipankar Das
-
Patent number: 10825127Abstract: One embodiment provides for a graphics processing unit to perform computations associated with a neural network, the graphics processing unit comprising compute unit including a hardware logic unit having dynamic precision fixed-point logic, the compute unit to receive a set of dynamic fixed-point tensors, compute, via the dynamic precision fixed-point logic, a right-shift value using an absolute maximum value within the set of dynamic fixed-point tensors and a dynamic range of the set of dynamic fixed-point tensors, right-shift data values within the set of dynamic fixed-point tensors based on the right-shift value, increment a shared exponent associated with the set of dynamic fixed-point tensors based on the right-shift value, perform a compute operation on the set of dynamic fixed-point tensors, and generate an output tensor via the compute operation on the set of dynamic fixed-point tensors.Type: GrantFiled: April 20, 2020Date of Patent: November 3, 2020Assignee: Intel CorporationInventors: Naveen Mellempudi, Dheevatsa Mudigere, Dipankar Das, Srinivas Sridharan
-
Publication number: 20200265545Abstract: One embodiment provides for a graphics processing unit to perform computations associated with a neural network, the graphics processing unit comprising compute unit including a hardware logic unit having dynamic precision fixed-point logic, the compute unit to receive a set of dynamic fixed-point tensors, compute, via the dynamic precision fixed-point logic, a right-shift value using an absolute maximum value within the set of dynamic fixed-point tensors and a dynamic range of the set of dynamic fixed-point tensors, right-shift data values within the set of dynamic fixed-point tensors based on the right-shift value, increment a shared exponent associated with the set of dynamic fixed-point tensors based on the right-shift value, perform a compute operation on the set of dynamic fixed-point tensors, and generate an output tensor via the compute operation on the set of dynamic fixed-point tensors.Type: ApplicationFiled: April 20, 2020Publication date: August 20, 2020Applicant: Intel CorporationInventors: Naveen MELLEMPUDI, DHEEVATSA MUDIGERE, DIPANKAR DAS, SRINIVAS SRIDHARAN
-
Patent number: 10652353Abstract: Technologies for communication with direct data placement include a number of computing nodes in communication over a network. Each computing node includes a many-core processor having an integrated host fabric interface (HFI) that maintains an association table (AT). In response to receiving a message from a remote device, the HFI determines whether the AT includes an entry associating one or more parameters of the message to a destination processor core. If so, the HFI causes a data transfer agent (DTA) of the destination core to receive the message data. The DTA may place the message data in a private cache of the destination core. Message parameters may include a destination process identifier or other network address and a virtual memory address range. The HFI may automatically update the AT based on communication operations generated by software executed by the processor cores. Other embodiments are described and claimed.Type: GrantFiled: September 24, 2015Date of Patent: May 12, 2020Assignee: Intel CorporationInventors: James Dinan, Venkata Krishnan, Srinivas Sridharan, David A. Webb
-
Patent number: 10643297Abstract: One embodiment provides for a graphics processing unit to perform computations associated with a neural network, the graphics processing unit comprising compute unit including a hardware logic unit having dynamic precision fixed-point logic; a decode unit to decode an instruction for execution by the compute unit, the instruction to cause the compute unit to perform a matrix arithmetic operation on a set of dynamic fixed-point tensors; and a dynamic precision manager to dynamically adjust the precision of a compute operation performed by the compute unit during the matrix arithmetic operation, the dynamic precision manager to adjust the precision of the compute operation to prevent an arithmetic overflow.Type: GrantFiled: January 29, 2018Date of Patent: May 5, 2020Assignee: Intel CorporationInventors: Naveen Mellempudi, Dheevatsa Mudigere, Dipankar Das, Srinivas Sridharan
-
Publication number: 20200125499Abstract: Systems, apparatuses and methods may provide for technology that detects a runtime call to a communication library, wherein the runtime call identifies a memory buffer, determines that a class of service (CLOS) attribute is associated with the memory buffer, and issues a driver instruction to modify the CLOS attribute in response to the runtime call.Type: ApplicationFiled: December 17, 2019Publication date: April 23, 2020Inventors: Aravindh Anantaraman, Srinivas Sridharan, Ajaya Durg, Mohammad R. Haghighat, Mikhail E. Smorkalov, Sudarshan Srinivasan
-
Publication number: 20190205737Abstract: An apparatus to facilitate acceleration of machine learning operations is disclosed. The apparatus comprises at least one processor to perform operations to implement a neural network and accelerator logic to perform communicatively coupled to the processor to perform compute operations for the neural network.Type: ApplicationFiled: December 30, 2017Publication date: July 4, 2019Applicant: Intel CorporationInventors: Amit Bleiweiss, Anavai Ramesh, Asit Mishra, Deborah Marr, Jeffrey Cook, Srinivas Sridharan, Eriko Nurvitadhi, Elmoustapha Ould-Ahmed-Vall, Dheevatsa Mudigere, Mohammad Ashraf Bhuiyan, Md Faijul Amin, Wei Wang, Dhawal Srivastava, Niharika Maheshwari
-
Publication number: 20190205745Abstract: Embodiments described herein provide a system to configure distributed training of a neural network, the system comprising memory to store a library to facilitate data transmission during distributed training of the neural network; a network interface to enable transmission and receipt of configuration data associated with a set of worker nodes, the worker nodes configured to perform distributed training of the neural network; and a processor to execute instructions provided by the library, the instructions to cause the processor to create one or more groups of the worker nodes, the one or more groups of worker nodes to be created based on a communication pattern for messages to be transmitted between the worker nodes during distributed training of the neural network.Type: ApplicationFiled: December 29, 2017Publication date: July 4, 2019Applicant: Intel CorporationInventors: Srinivas Sridharan, Karthikeyan Vaidyanathan, Dipankar Das, Chandrasekaran Sakthivel, Mikhail E. Smorkalov
-
Publication number: 20180322386Abstract: One embodiment provides for a system to configure distributed training of a neural network. The system includes memory to store a library to facilitate transmission of data during distributed training of the neural network; a network interface to transmit and receive gradient data associated with the trainable parameters; a general-purpose processor to execute instructions provided by the library, the instructions to cause the general-purpose processor to configure the network interface to transmit and receive the gradient data associated with the trainable parameters during a workflow of a machine learning framework; and a graphics processor to perform compute operations associated with machine learning framework workflow to generate the gradient data associated with the trainable parameters, wherein, based on the machine learning framework workflow, the library is to interleave the compute operations on the graphics processor with transmission and receipt of gradient data via the network interface.Type: ApplicationFiled: January 12, 2018Publication date: November 8, 2018Applicant: Intel CorporationInventors: SRINIVAS SRIDHARAN, DHEEVATSA MUDIGERE
-
Publication number: 20180322606Abstract: One embodiment provides for a method of transmitting data between multiple compute nodes of a distributed compute system, the method comprising multi-dimensionally partitioning data of a feature map across multiple nodes for distributed training of a convolutional neural network; performing a parallel convolution operation on the multiple partitions to train weight data of the neural network; and exchanging data between nodes to enable computation of halo regions, the halo regions having dependencies on data processed by a different node.Type: ApplicationFiled: January 12, 2018Publication date: November 8, 2018Applicant: Intel CorporationInventors: Dipankar Das, KARTHIKEYAN VAIDYANATHAN, Srinivas Sridharan
-
Publication number: 20180322387Abstract: One embodiment provides for a system to compute and distribute data for distributed training of a neural network, the system including first memory to store a first set of instructions including a machine learning framework; a fabric interface to enable transmission and receipt of data associated with the set of trainable machine learning parameters; a first set of general-purpose processor cores to execute the first set of instructions, the first set of instructions to provide a training workflow for computation of gradients for the trainable machine learning parameters and to communicate with a second set of instructions, the second set of instructions facilitate transmission and receipt of the gradients via the fabric interface; and a graphics processor to perform compute operations associated with the training workflow to generate the gradients for the trainable machine learning parameters.Type: ApplicationFiled: January 12, 2018Publication date: November 8, 2018Applicant: Intel CorporationInventors: Srinivas Sridharan, Karthikeyan Vaidyanathan, Dipankar Das
-
Publication number: 20180322607Abstract: One embodiment provides for a graphics processing unit to perform computations associated with a neural network, the graphics processing unit comprising compute unit including a hardware logic unit having dynamic precision fixed-point logic; a decode unit to decode an instruction for execution by the compute unit, the instruction to cause the compute unit to perform a matrix arithmetic operation on a set of dynamic fixed-point tensors; and a dynamic precision manager to dynamically adjust the precision of a compute operation performed by the compute unit during the matrix arithmetic operation, the dynamic precision manager to adjust the precision of the compute operation to prevent an arithmetic overflow.Type: ApplicationFiled: January 29, 2018Publication date: November 8, 2018Applicant: Intel CorporationInventors: Naveen MELLEMPUDI, DHEEVATSA MUDIGERE, DIPANKAR DAS, SRINIVAS SRIDHARAN
-
Publication number: 20180293492Abstract: One embodiment provides for a non-transitory machine readable medium storing instructions which, when executed by one or more processors, cause the one or more processors to perform operations comprising providing an interface to define a neural network using machine-learning domain specific terminology, wherein the interface enables selection of a neural network topology and abstracts low-level communication details of distributed training of the neural network.Type: ApplicationFiled: April 10, 2017Publication date: October 11, 2018Applicant: Intel CorporationInventors: Dhiraj D. Kalamkar, KARTHIKEYAN VAIDYANATHAN, SRINIVAS SRIDHARAN, DIPANKAR DAS
-
Publication number: 20180293493Abstract: One embodiment provides for a method of transmitting data between multiple compute nodes of a distributed compute system, the method comprising creating a global view of communication operations to be performed between the multiple compute nodes of the distributed compute system, the global view created using information specific to a machine learning model associated with the distributed compute system; using the global view to determine a communication cost of the communication operations; and automatically determining a number of network endpoints for use in transmitting the data between the multiple compute nodes of the distributed compute system.Type: ApplicationFiled: April 10, 2017Publication date: October 11, 2018Applicant: Intel CorporationInventors: Dhiraj D. Kalamkar, KARTHIKEYAN VAIDYANATHAN, SRINIVAS SRIDHARAN, DIPANKAR DAS