Patents by Inventor Narayanan Sundaram

Narayanan Sundaram has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10198264
    Abstract: A processing device includes a sorting module, which adds to each of a plurality of elements a position value of a corresponding position in a register set resulting in a plurality of transformed elements in corresponding positions. The plurality of elements include a plurality of bits. The sorting module compares each of the plurality of transformed elements to itself and to one another. The sorting module also assigns one of an enabled or disabled indicator to each of the plurality of the transformed elements based on the comparison. The sorting module further counts a number of the enabled indicators assigned to each of the plurality of the transformed elements to generate a sorted sequence of the plurality of elements.
    Type: Grant
    Filed: December 15, 2015
    Date of Patent: February 5, 2019
    Assignee: Intel Corporation
    Inventors: Asit K. Mishra, Deborah T. Marr, Jong Soo Park, Nadathur Rajagopalan Satish, Mikhail Smelyanskiy, Michael Anderson, Mostofa Ali Patwary, Narayanan Sundaram, Sheng Li
  • Publication number: 20170286122
    Abstract: A processor includes a front end including circuitry to receive and decode an instruction. The instruction is to perform a graph analytic function and pass the instruction to a graph accelerator. The graph accelerator includes circuitry to process graph vertices and graph edges as datatypes, execute the instruction, and pass results of the instruction to a memory subsystem of the processor.
    Type: Application
    Filed: April 1, 2016
    Publication date: October 5, 2017
    Inventors: Lisa K. Wu, Tae Jun Ham, Nadathur Rajagopalan Satish, Narayanan Sundaram
  • Publication number: 20170185403
    Abstract: A processor includes a front end to receive an instruction, a decoder to decode the instruction, a set operations logic unit (SOLU) to execute the instruction, and a retirement unit to retire the instruction. The SOLU includes logic to store a first set of key-value pairs in a content-associative data structure, to receive a second set of key-value pairs, and to identify key-value pairs in the two sets with matching keys. The SOLU includes logic to add the second set of key-value pairs to the first set to produce an output set, and to apply an operation to the values of key-value pairs with matching keys, generating a single value for the matching key. The SOLU includes logic to produce an output set that includes key-value pairs from the first set with matching keys, and to discard key-value pairs from the first set with unique keys.
    Type: Application
    Filed: December 23, 2015
    Publication date: June 29, 2017
    Inventors: Michael J. Anderson, Sheng R. Li, Jong Soo Park, Md Mostofa Ali Patwary, Nadathur Rajagopalan Satish, Mikhail Smelyanskiy, Narayanan Sundaram
  • Publication number: 20170177361
    Abstract: An apparatus and method are described for accelerating graph analytics. For example, one embodiment of a processor comprises: an instruction fetch unit to fetch program code including set intersection and set union operations; a graph accelerator unit (GAU) to execute at least a first portion of the program code related to the set intersection and set union operations and generate results; and an execution unit to execute at least a second portion of the program code using the results provided from the GAU.
    Type: Application
    Filed: December 22, 2015
    Publication date: June 22, 2017
    Inventors: Michael Anderson, Sheng Li, Jong Soo Park, MD Mostafa Ali Patwary, Nadathur Rajagopalan Satish, Mikhail Smelyanskiy, Narayanan Sundaram
  • Publication number: 20170168827
    Abstract: A processing device includes a sorting module, which adds to each of a plurality of elements a position value of a corresponding position in a register set resulting in a plurality of transformed elements in corresponding positions. The plurality of elements include a plurality of bits. The sorting module compares each of the plurality of transformed elements to itself and to one another. The sorting module also assigns one of an enabled or disabled indicator to each of the plurality of the transformed elements based on the comparison. The sorting module further counts a number of the enabled indicators assigned to each of the plurality of the transformed elements to generate a sorted sequence of the plurality of elements.
    Type: Application
    Filed: December 15, 2015
    Publication date: June 15, 2017
    Inventors: Asit K. Mishra, Deborah T. Marr, Jong Soo Park, Nadathur Rajagopalan Satish, Mikhail Smelyanskiy, Michael Anderson, Mostofa Ali Patwary, Narayanan Sundaram, Sheng Li
  • Patent number: 8225074
    Abstract: In accordance with exemplary implementations, application computation operations and communications between operations on a host processing platform may be adapted to conform to the memory capacity of a parallel accelerator. Computation operations may be split and scheduled such that the computation operations fit within the memory capacity of the accelerator. Further, the operations may be automatically adapted without any modification to the code of an application. In addition, data transfers between a host processing platform and the parallel accelerator may be minimized in accordance with exemplary aspects of the present principles to improve processing performance.
    Type: Grant
    Filed: March 6, 2009
    Date of Patent: July 17, 2012
    Assignee: NEC Laboratories America, Inc.
    Inventors: Srimat T. Chakradhar, Anand Raghunathan, Narayanan Sundaram
  • Publication number: 20100088490
    Abstract: In accordance with exemplary implementations, application computation operations and communications between operations on a host processing platform may be adapted to conform to the memory capacity of a parallel accelerator. Computation operations may be split and scheduled such that the computation operations fit within the memory capacity of the accelerator. Further, the operations may be automatically adapted without any modification to the code of an application. In addition, data transfers between a host processing platform and the parallel accelerator may be minimized in accordance with exemplary aspects of the present principles to improve processing performance.
    Type: Application
    Filed: March 6, 2009
    Publication date: April 8, 2010
    Applicant: NEC Laboratories America, Inc.
    Inventors: Srimat T. Chakradhar, Anand Raghunathan, Narayanan Sundaram
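
Illustrative note: the sorting abstract listed above (patent 10198264 and publication 20170168827) describes a sort driven by comparison counts rather than by swaps. The following is a minimal software sketch of that idea, not the patented hardware design. It assumes integer elements, and it assumes that "adding a position value" can be modeled as scaling each element by the element count and adding its index so that every transformed value is unique. The function name rank_sort and that encoding are hypothetical choices made for illustration only.

```python
def rank_sort(elements):
    """Sort integers by counting comparison results (software sketch only)."""
    n = len(elements)

    # Assumed encoding: fold each element's position into its value so that
    # equal elements become distinct and keep their original relative order.
    transformed = [value * n + position for position, value in enumerate(elements)]

    result = [None] * n
    for position, current in enumerate(transformed):
        # Compare the current transformed element to itself and to every other
        # one; an indicator is "enabled" (True) when the other value is <= it.
        indicators = [other <= current for other in transformed]

        # The number of enabled indicators is the element's 1-based rank.
        rank = sum(indicators)
        result[rank - 1] = elements[position]
    return result


print(rank_sort([3, 1, 2, 1]))
```

Every comparison in this sketch is independent of the others, which is why the approach maps naturally to parallel hardware; a sequential version like this one performs a number of comparisons that grows with the square of the input size.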