Patents by Inventor Srinivas Sridharan

Srinivas Sridharan has filed for patents to protect the following inventions. This listing includes both pending patent applications and patents already granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20210304025
    Abstract: A processor analyzes a machine learning workload. Corresponding priority levels are assigned to identified data requests in the machine learning workload based on an associated data dependency delay performance impact. The assigned corresponding priority levels are indicated when providing the data requests to a memory controller. The memory controller sorts the received data requests into a plurality of different priority queues based on the indicated corresponding priority levels. The memory controller initiates the data requests from the different priority queues to memory in an order based on different qualities of service of the different priority queues.
    Type: Application
    Filed: March 24, 2020
    Publication date: September 30, 2021
    Inventor: Srinivas Sridharan
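    A minimal Python sketch of the priority-queue scheme this abstract describes, with hypothetical priority levels and QoS weights (the actual priority assignment comes from the workload analysis, which is not modeled here):

        import heapq
        from dataclasses import dataclass
        from itertools import count

        # Hypothetical priority levels; the patent derives them from the
        # data-dependency delay impact of each request in the ML workload.
        HIGH, MEDIUM, LOW = 0, 1, 2

        @dataclass
        class DataRequest:
            address: int
            priority: int  # assumed annotated by the processor's analysis

        class PriorityMemoryController:
            """Sorts requests into per-priority queues and issues them in
            an order weighted by each queue's quality of service."""
            def __init__(self, qos_weights=(4, 2, 1)):
                self.queues = {p: [] for p in (HIGH, MEDIUM, LOW)}
                self.qos_weights = qos_weights
                self._seq = count()  # keeps FIFO order within a priority

            def submit(self, req):
                heapq.heappush(self.queues[req.priority], (next(self._seq), req))

            def issue(self):
                # Drain higher-QoS queues more often than lower ones.
                while any(self.queues.values()):
                    for prio, weight in zip((HIGH, MEDIUM, LOW), self.qos_weights):
                        for _ in range(weight):
                            if self.queues[prio]:
                                yield heapq.heappop(self.queues[prio])[1]

        ctrl = PriorityMemoryController()
        ctrl.submit(DataRequest(0x1000, LOW))
        ctrl.submit(DataRequest(0x2000, HIGH))
        for r in ctrl.issue():
            print(hex(r.address), r.priority)  # high-priority request first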
  • Patent number: 11094029
    Abstract: One embodiment provides for a method of transmitting data between multiple compute nodes of a distributed compute system, the method comprising creating a global view of communication operations to be performed between the multiple compute nodes of the distributed compute system, the global view created using information specific to a machine learning model associated with the distributed compute system; using the global view to determine a communication cost of the communication operations; and automatically determining a number of network endpoints for use in transmitting the data between the multiple compute nodes of the distributed compute system.
    Type: Grant
    Filed: April 10, 2017
    Date of Patent: August 17, 2021
    Assignee: Intel Corporation
    Inventors: Dhiraj D. Kalamkar, Karthikeyan Vaidyanathan, Srinivas Sridharan, Dipankar Das
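    A toy sketch of how a global view could drive the endpoint-count decision, using an assumed alpha-beta cost model (the constants and function names are illustrative, not from the patent):

        # Model the cost of moving the model's total gradient traffic
        # through n endpoints: per-endpoint setup latency plus transfer.
        def comm_cost(message_bytes, endpoints, latency_us=1.0,
                      bw_bytes_per_us=12_000):
            return endpoints * latency_us + message_bytes / (endpoints * bw_bytes_per_us)

        def choose_endpoints(layer_grad_bytes, max_endpoints=8):
            total = sum(layer_grad_bytes)  # "global view" over all layers
            costs = {n: comm_cost(total, n) for n in range(1, max_endpoints + 1)}
            return min(costs, key=costs.get)

        # Gradient sizes (bytes) for three layers of a hypothetical model:
        print(choose_endpoints([4_000_000, 16_000_000, 1_000_000]))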
  • Patent number: 11068780
    Abstract: Technologies for artificial neural network training include a computing node with a host fabric interface that sends a message that includes one or more artificial neural network training algorithm values to another computing node in response to receipt of a request to send the message. Prior to sending the message, the host fabric interface may receive a request to quantize the message and quantize the message based on a quantization level included in the request to generate a quantized message. The quantized message includes one or more quantized values such that each quantized value has a lower precision than a corresponding artificial neural network training algorithm value. The host fabric interface then transmits the quantized message, which includes metadata indicative of the quantization level, to the other computing node for artificial neural network training. Other embodiments are described and claimed.
    Type: Grant
    Filed: April 1, 2017
    Date of Patent: July 20, 2021
    Assignee: Intel Corporation
    Inventors: Naveen K. Mellempudi, Srinivas Sridharan, Dheevatsa Mudigere, Dipankar Das
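    A minimal sketch of the quantize-with-metadata idea, assuming a simple symmetric integer quantizer (the patent does not specify the scheme; names here are illustrative):

        import numpy as np

        def quantize(values, bits):
            # Map float32 training values to `bits`-bit integers and attach
            # the quantization level as message metadata.
            scale = np.abs(values).max() / (2 ** (bits - 1) - 1) or 1.0
            payload = np.round(values / scale).astype(np.int32)
            return {"level_bits": bits, "scale": float(scale), "payload": payload}

        def dequantize(msg):
            return msg["payload"].astype(np.float32) * msg["scale"]

        grads = np.array([0.031, -0.007, 0.115], dtype=np.float32)
        msg = quantize(grads, bits=8)   # lower precision than the source
        print(msg["level_bits"], dequantize(msg))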
  • Patent number: 11023803
    Abstract: One embodiment provides for a non-transitory machine readable medium storing instructions which, when executed by one or more processors, cause the one or more processors to perform operations comprising providing an interface to define a neural network using machine-learning domain specific terminology, wherein the interface enables selection of a neural network topology and abstracts low-level communication details of distributed training of the neural network.
    Type: Grant
    Filed: April 10, 2017
    Date of Patent: June 1, 2021
    Assignee: Intel Corporation
    Inventors: Dhiraj D. Kalamkar, Karthikeyan Vaidyanathan, Srinivas Sridharan, Dipankar Das
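    A hypothetical sketch of such an interface: layers are declared in ML-domain terms while the distributed-communication setup stays behind the API (all names invented for illustration):

        class Layer:
            def __init__(self, kind, units):
                self.kind, self.units = kind, units

        class DistributedNetwork:
            def __init__(self):
                self.layers = []

            def add(self, kind, units):
                self.layers.append(Layer(kind, units))
                return self

            def compile(self, nodes):
                # The low-level detail hidden from the user: decide here
                # how each layer's gradients are reduced across nodes.
                self.plan = {i: f"{l.kind}: allreduce over {nodes} nodes"
                             for i, l in enumerate(self.layers)}
                return self

        net = DistributedNetwork().add("convolution", 64).add("fully_connected", 10)
        print(net.compile(nodes=4).plan)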
  • Patent number: 11012511
    Abstract: A request for data from a distributed table is received at a network interface controller system. The request for data from the distributed table is identified as a request to be processed by the network interface controller system instead of a processor of a host computer system. The requested data is fetched from a memory of the host computer system via a computer interface of the network interface controller system. The received data is then cached in a cache of the network interface controller system.
    Type: Grant
    Filed: January 14, 2020
    Date of Patent: May 18, 2021
    Assignee: Facebook, Inc.
    Inventor: Srinivas Sridharan
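    A small sketch of the NIC-side path, with an LRU dict standing in for the controller's cache and a plain dict standing in for host memory (assumptions, not the patented design):

        from collections import OrderedDict

        class NicTableCache:
            def __init__(self, host_memory, capacity=2):
                self.host_memory = host_memory   # stands in for the DMA path
                self.cache = OrderedDict()
                self.capacity = capacity

            def handle(self, key):
                if key in self.cache:            # served without the host CPU
                    self.cache.move_to_end(key)
                    return self.cache[key]
                value = self.host_memory[key]    # fetch from host memory
                self.cache[key] = value
                if len(self.cache) > self.capacity:
                    self.cache.popitem(last=False)   # simple LRU eviction
                return value

        table = {"user:7": [0.1, 0.9], "user:9": [0.4, 0.2]}
        nic = NicTableCache(table)
        print(nic.handle("user:7"), nic.handle("user:7"))  # second hit cached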
  • Publication number: 20210110508
    Abstract: One embodiment provides for a graphics processing unit to perform computations associated with a neural network, the graphics processing unit comprising a compute unit including a hardware logic unit having dynamic precision fixed-point logic, the compute unit to receive a set of dynamic fixed-point tensors, compute, via the dynamic precision fixed-point logic, a right-shift value using an absolute maximum value within the set of dynamic fixed-point tensors and a dynamic range of the set of dynamic fixed-point tensors, right-shift data values within the set of dynamic fixed-point tensors based on the right-shift value, increment a shared exponent associated with the set of dynamic fixed-point tensors based on the right-shift value, perform a compute operation on the set of dynamic fixed-point tensors, and generate an output tensor via the compute operation on the set of dynamic fixed-point tensors.
    Type: Application
    Filed: October 29, 2020
    Publication date: April 15, 2021
    Applicant: Intel Corporation
    Inventors: Naveen Mellempudi, Dheevatsa Mudigere, Dipankar Das, Srinivas Sridharan
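    A numeric sketch of the right-shift and shared-exponent update the abstract walks through, assuming an 8-bit dynamic range (constants illustrative):

        import numpy as np

        def renormalize(mantissas, shared_exp, dynamic_range_bits=8):
            # Derive the right-shift from the absolute maximum and the
            # dynamic range, shift the values, and bump the shared
            # exponent so the represented real values are preserved.
            abs_max = int(np.abs(mantissas).max())
            shift = max(0, abs_max.bit_length() - dynamic_range_bits)
            return mantissas >> shift, shared_exp + shift

        vals = np.array([900, -1200, 300], dtype=np.int32)
        shifted, exp = renormalize(vals, shared_exp=-10)
        print(shifted, exp)  # values still approximate mantissa * 2**exp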
  • Publication number: 20210109888
    Abstract: A technique includes performing a collective operation among multiple nodes of a parallel processing computer system using multiple parallel processing stages. The technique includes regulating an ordering of the parallel processing stages so that an initial stage of the plurality of parallel processing stages is associated with a higher node injection bandwidth than a subsequent stage of the plurality of parallel processing stages.
    Type: Application
    Filed: September 30, 2017
    Publication date: April 15, 2021
    Inventors: Karthikeyan Vaidyanathan, Srinivas Sridharan, Dipankar Das
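    A toy two-stage all-reduce that follows the abstract's ordering rule, with invented bandwidth numbers (the stage with higher injection bandwidth runs first):

        def staged_allreduce(node_values, group_size, stage_bandwidth):
            # Ordering rule from the abstract: the initial stage has the
            # higher per-node injection bandwidth.
            assert stage_bandwidth[0] >= stage_bandwidth[1]
            # Stage 0: reduce inside each group (many concurrent injections).
            groups = [node_values[i:i + group_size]
                      for i in range(0, len(node_values), group_size)]
            partials = [sum(g) for g in groups]
            # Stage 1: reduce across group leaders (fewer, slower injections).
            total = sum(partials)
            return [total] * len(node_values)

        print(staged_allreduce([1, 2, 3, 4], group_size=2, stage_bandwidth=(4, 1)))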
  • Patent number: 10825127
    Abstract: One embodiment provides for a graphics processing unit to perform computations associated with a neural network, the graphics processing unit comprising a compute unit including a hardware logic unit having dynamic precision fixed-point logic, the compute unit to receive a set of dynamic fixed-point tensors, compute, via the dynamic precision fixed-point logic, a right-shift value using an absolute maximum value within the set of dynamic fixed-point tensors and a dynamic range of the set of dynamic fixed-point tensors, right-shift data values within the set of dynamic fixed-point tensors based on the right-shift value, increment a shared exponent associated with the set of dynamic fixed-point tensors based on the right-shift value, perform a compute operation on the set of dynamic fixed-point tensors, and generate an output tensor via the compute operation on the set of dynamic fixed-point tensors.
    Type: Grant
    Filed: April 20, 2020
    Date of Patent: November 3, 2020
    Assignee: Intel Corporation
    Inventors: Naveen Mellempudi, Dheevatsa Mudigere, Dipankar Das, Srinivas Sridharan
  • Publication number: 20200265545
    Abstract: One embodiment provides for a graphics processing unit to perform computations associated with a neural network, the graphics processing unit comprising a compute unit including a hardware logic unit having dynamic precision fixed-point logic, the compute unit to receive a set of dynamic fixed-point tensors, compute, via the dynamic precision fixed-point logic, a right-shift value using an absolute maximum value within the set of dynamic fixed-point tensors and a dynamic range of the set of dynamic fixed-point tensors, right-shift data values within the set of dynamic fixed-point tensors based on the right-shift value, increment a shared exponent associated with the set of dynamic fixed-point tensors based on the right-shift value, perform a compute operation on the set of dynamic fixed-point tensors, and generate an output tensor via the compute operation on the set of dynamic fixed-point tensors.
    Type: Application
    Filed: April 20, 2020
    Publication date: August 20, 2020
    Applicant: Intel Corporation
    Inventors: Naveen Mellempudi, Dheevatsa Mudigere, Dipankar Das, Srinivas Sridharan
  • Patent number: 10652353
    Abstract: Technologies for communication with direct data placement include a number of computing nodes in communication over a network. Each computing node includes a many-core processor having an integrated host fabric interface (HFI) that maintains an association table (AT). In response to receiving a message from a remote device, the HFI determines whether the AT includes an entry associating one or more parameters of the message to a destination processor core. If so, the HFI causes a data transfer agent (DTA) of the destination core to receive the message data. The DTA may place the message data in a private cache of the destination core. Message parameters may include a destination process identifier or other network address and a virtual memory address range. The HFI may automatically update the AT based on communication operations generated by software executed by the processor cores. Other embodiments are described and claimed.
    Type: Grant
    Filed: September 24, 2015
    Date of Patent: May 12, 2020
    Assignee: Intel Corporation
    Inventors: James Dinan, Venkata Krishnan, Srinivas Sridharan, David A. Webb
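    A compact sketch of the association-table lookup, with dicts standing in for the AT and the cores' private caches (all structure names here are assumptions):

        class HostFabricInterface:
            def __init__(self):
                self.assoc_table = {}   # (process id, addr range) -> core
                self.core_caches = {}   # core -> list, a mock private cache

            def learn(self, process_id, addr_start, core):
                # The HFI updates the AT from communication operations
                # issued by software on the cores.
                self.assoc_table[(process_id, addr_start)] = core

            def receive(self, process_id, addr_start, payload):
                core = self.assoc_table.get((process_id, addr_start))
                if core is None:
                    return "fallback: deliver via the shared-memory path"
                # The destination core's data transfer agent places the
                # message data directly in that core's private cache.
                self.core_caches.setdefault(core, []).append(payload)
                return f"placed in core {core} private cache"

        hfi = HostFabricInterface()
        hfi.learn(process_id=42, addr_start=0x7000, core=3)
        print(hfi.receive(42, 0x7000, b"message-data"))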
  • Patent number: 10643297
    Abstract: One embodiment provides for a graphics processing unit to perform computations associated with a neural network, the graphics processing unit comprising a compute unit including a hardware logic unit having dynamic precision fixed-point logic; a decode unit to decode an instruction for execution by the compute unit, the instruction to cause the compute unit to perform a matrix arithmetic operation on a set of dynamic fixed-point tensors; and a dynamic precision manager to dynamically adjust the precision of a compute operation performed by the compute unit during the matrix arithmetic operation, the dynamic precision manager to adjust the precision of the compute operation to prevent an arithmetic overflow.
    Type: Grant
    Filed: January 29, 2018
    Date of Patent: May 5, 2020
    Assignee: Intel Corporation
    Inventors: Naveen Mellempudi, Dheevatsa Mudigere, Dipankar Das, Srinivas Sridharan
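    A worked sketch of overflow-avoiding precision adjustment for an integer matrix multiply; the pre-shift rule here is an assumption chosen to bound the worst-case accumulator, not the patented logic:

        import numpy as np

        def safe_matmul(a_int, b_int, acc_bits=32):
            # Worst-case accumulator magnitude for an (m,k) @ (k,n) product.
            k = a_int.shape[1]
            worst = int(np.abs(a_int).max()) * int(np.abs(b_int).max()) * k
            # Right-shift one operand just enough that `worst` fits.
            shift = max(0, worst.bit_length() - (acc_bits - 1))
            return (a_int >> shift) @ b_int, shift  # shift feeds the exponent

        a = np.random.randint(-2**14, 2**14, size=(4, 8), dtype=np.int64)
        b = np.random.randint(-2**14, 2**14, size=(8, 4), dtype=np.int64)
        out, shift = safe_matmul(a, b)
        print(out.shape, "pre-shift:", shift)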
  • Publication number: 20200125499
    Abstract: Systems, apparatuses and methods may provide for technology that detects a runtime call to a communication library, wherein the runtime call identifies a memory buffer, determines that a class of service (CLOS) attribute is associated with the memory buffer, and issues a driver instruction to modify the CLOS attribute in response to the runtime call.
    Type: Application
    Filed: December 17, 2019
    Publication date: April 23, 2020
    Inventors: Aravindh Anantaraman, Srinivas Sridharan, Ajaya Durg, Mohammad R. Haghighat, Mikhail E. Smorkalov, Sudarshan Srinivasan
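    A mocked sketch of the detect-and-retag flow; the driver call and buffer ids are invented stand-ins, since the real driver interface is not described here:

        clos_attributes = {}   # buffer id -> CLOS value, mock hardware state

        def driver_set_clos(buffer_id, clos):
            clos_attributes[buffer_id] = clos   # mock driver instruction

        def on_runtime_call(buffer_id, wanted_clos):
            # Detection point: a communication-library call names a buffer.
            if clos_attributes.get(buffer_id) != wanted_clos:
                driver_set_clos(buffer_id, wanted_clos)

        def allreduce(buffer_id, data):
            on_runtime_call(buffer_id, wanted_clos="high")  # intercepted
            return sum(data)                                # placeholder op

        print(allreduce("grad_buf_0", [1, 2, 3]), clos_attributes)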
  • Publication number: 20190205737
    Abstract: An apparatus to facilitate acceleration of machine learning operations is disclosed. The apparatus comprises at least one processor to perform operations to implement a neural network and accelerator logic communicatively coupled to the processor to perform compute operations for the neural network.
    Type: Application
    Filed: December 30, 2017
    Publication date: July 4, 2019
    Applicant: Intel Corporation
    Inventors: Amit Bleiweiss, Anavai Ramesh, Asit Mishra, Deborah Marr, Jeffrey Cook, Srinivas Sridharan, Eriko Nurvitadhi, Elmoustapha Ould-Ahmed-Vall, Dheevatsa Mudigere, Mohammad Ashraf Bhuiyan, Md Faijul Amin, Wei Wang, Dhawal Srivastava, Niharika Maheshwari
  • Publication number: 20190205745
    Abstract: Embodiments described herein provide a system to configure distributed training of a neural network, the system comprising memory to store a library to facilitate data transmission during distributed training of the neural network; a network interface to enable transmission and receipt of configuration data associated with a set of worker nodes, the worker nodes configured to perform distributed training of the neural network; and a processor to execute instructions provided by the library, the instructions to cause the processor to create one or more groups of the worker nodes, the one or more groups of worker nodes to be created based on a communication pattern for messages to be transmitted between the worker nodes during distributed training of the neural network.
    Type: Application
    Filed: December 29, 2017
    Publication date: July 4, 2019
    Applicant: Intel Corporation
    Inventors: Srinivas Sridharan, Karthikeyan Vaidyanathan, Dipankar Das, Chandrasekaran Sakthivel, Mikhail E. Smorkalov
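    A small sketch of communication-pattern-driven grouping; the patterns and worker records are illustrative, not the library's actual configuration format:

        def make_groups(workers, pattern):
            if pattern == "ring":
                return [workers]   # one group; neighbours talk in a ring
            if pattern == "hierarchical":
                # Group workers by host so intra-host reduction runs first.
                by_host = {}
                for w in workers:
                    by_host.setdefault(w["host"], []).append(w)
                return list(by_host.values())
            raise ValueError(f"unknown pattern: {pattern}")

        workers = [{"rank": r, "host": f"node{r // 2}"} for r in range(4)]
        for g in make_groups(workers, "hierarchical"):
            print([w["rank"] for w in g])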
  • Publication number: 20180322386
    Abstract: One embodiment provides for a system to configure distributed training of a neural network. The system includes memory to store a library to facilitate transmission of data during distributed training of the neural network; a network interface to transmit and receive gradient data associated with the trainable parameters; a general-purpose processor to execute instructions provided by the library, the instructions to cause the general-purpose processor to configure the network interface to transmit and receive the gradient data associated with the trainable parameters during a workflow of a machine learning framework; and a graphics processor to perform compute operations associated with the machine learning framework workflow to generate the gradient data associated with the trainable parameters, wherein, based on the machine learning framework workflow, the library is to interleave the compute operations on the graphics processor with transmission and receipt of gradient data via the network interface.
    Type: Application
    Filed: January 12, 2018
    Publication date: November 8, 2018
    Applicant: Intel Corporation
    Inventors: Srinivas Sridharan, Dheevatsa Mudigere
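    A thread-based sketch of the interleaving: a communication thread drains gradients while the backward pass keeps computing (the queue hand-off is an assumption; the patent interleaves via the library):

        import threading, queue

        grad_queue = queue.Queue()

        def comm_thread():
            while True:
                layer, grads = grad_queue.get()
                if layer is None:
                    break   # sentinel: backward pass finished
                print(f"sending gradients for layer {layer}: {grads}")

        t = threading.Thread(target=comm_thread)
        t.start()
        for layer in reversed(range(3)):    # backward pass, last layer first
            grads = [0.1 * layer]           # placeholder compute
            grad_queue.put((layer, grads))  # send overlaps the next layer
        grad_queue.put((None, None))
        t.join()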
  • Publication number: 20180322606
    Abstract: One embodiment provides for a method of transmitting data between multiple compute nodes of a distributed compute system, the method comprising multi-dimensionally partitioning data of a feature map across multiple nodes for distributed training of a convolutional neural network; performing a parallel convolution operation on the multiple partitions to train weight data of the neural network; and exchanging data between nodes to enable computation of halo regions, the halo regions having dependencies on data processed by a different node.
    Type: Application
    Filed: January 12, 2018
    Publication date: November 8, 2018
    Applicant: Intel Corporation
    Inventors: Dipankar Das, Karthikeyan Vaidyanathan, Srinivas Sridharan
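    A NumPy sketch of the halo exchange for a feature map split across two nodes along one dimension; the one-row halo is an assumption matching a 3x3 convolution:

        import numpy as np

        feature_map = np.arange(36, dtype=np.float32).reshape(6, 6)
        top, bottom = feature_map[:3], feature_map[3:]   # 2-node partition

        def with_halo(local, neighbour_row, side):
            # neighbour_row is the boundary row received from the other node
            rows = ([neighbour_row[None, :], local] if side == "top"
                    else [local, neighbour_row[None, :]])
            return np.vstack(rows)

        top_padded = with_halo(top, bottom[0], side="bottom")  # halo below
        bottom_padded = with_halo(bottom, top[-1], side="top") # halo above
        print(top_padded.shape, bottom_padded.shape)  # (4, 6) each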
  • Publication number: 20180322387
    Abstract: One embodiment provides for a system to compute and distribute data for distributed training of a neural network, the system including first memory to store a first set of instructions including a machine learning framework; a fabric interface to enable transmission and receipt of data associated with the set of trainable machine learning parameters; a first set of general-purpose processor cores to execute the first set of instructions, the first set of instructions to provide a training workflow for computation of gradients for the trainable machine learning parameters and to communicate with a second set of instructions, the second set of instructions to facilitate transmission and receipt of the gradients via the fabric interface; and a graphics processor to perform compute operations associated with the training workflow to generate the gradients for the trainable machine learning parameters.
    Type: Application
    Filed: January 12, 2018
    Publication date: November 8, 2018
    Applicant: Intel Corporation
    Inventors: Srinivas Sridharan, Karthikeyan Vaidyanathan, Dipankar Das
  • Publication number: 20180322607
    Abstract: One embodiment provides for a graphics processing unit to perform computations associated with a neural network, the graphics processing unit comprising a compute unit including a hardware logic unit having dynamic precision fixed-point logic; a decode unit to decode an instruction for execution by the compute unit, the instruction to cause the compute unit to perform a matrix arithmetic operation on a set of dynamic fixed-point tensors; and a dynamic precision manager to dynamically adjust the precision of a compute operation performed by the compute unit during the matrix arithmetic operation, the dynamic precision manager to adjust the precision of the compute operation to prevent an arithmetic overflow.
    Type: Application
    Filed: January 29, 2018
    Publication date: November 8, 2018
    Applicant: Intel Corporation
    Inventors: Naveen Mellempudi, Dheevatsa Mudigere, Dipankar Das, Srinivas Sridharan
  • Publication number: 20180293492
    Abstract: One embodiment provides for a non-transitory machine readable medium storing instructions which, when executed by one or more processors, cause the one or more processors to perform operations comprising providing an interface to define a neural network using machine-learning domain specific terminology, wherein the interface enables selection of a neural network topology and abstracts low-level communication details of distributed training of the neural network.
    Type: Application
    Filed: April 10, 2017
    Publication date: October 11, 2018
    Applicant: Intel Corporation
    Inventors: Dhiraj D. Kalamkar, Karthikeyan Vaidyanathan, Srinivas Sridharan, Dipankar Das
  • Publication number: 20180293493
    Abstract: One embodiment provides for a method of transmitting data between multiple compute nodes of a distributed compute system, the method comprising creating a global view of communication operations to be performed between the multiple compute nodes of the distributed compute system, the global view created using information specific to a machine learning model associated with the distributed compute system; using the global view to determine a communication cost of the communication operations; and automatically determining a number of network endpoints for use in transmitting the data between the multiple compute nodes of the distributed compute system.
    Type: Application
    Filed: April 10, 2017
    Publication date: October 11, 2018
    Applicant: Intel Corporation
    Inventors: Dhiraj D. Kalamkar, Karthikeyan Vaidyanathan, Srinivas Sridharan, Dipankar Das