Patents by Inventor Srimat Chakradhar

Srimat Chakradhar has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20160110409
    Abstract: A method in a graph storage and processing system is provided. The method includes storing, in a scalable, distributed, fault-tolerant, in-memory graph storage device, base graph data representative of graphs, and storing, in a real-time, in-memory graph storage device, update graph data representative of graph updates for the graphs with respect to a time threshold. The method further includes sampling, by an in-memory graph sampler, the base graph data to generate sampled portions of the graphs and storing the sampled portions. The method additionally includes providing, by a query manager, a query interface between applications and the system. The method also includes forming, by the query manager, graph data representative of a complete graph from at least the base graph data and the update graph data, if any. The method includes processing, by a graph computer, the sampled portions using batch-type computations to generate approximate results for graph-based queries.
    Type: Application
    Filed: August 20, 2015
    Publication date: April 21, 2016
    Inventors: Kunal Rao, Giuseppe Coviello, Srimat Chakradhar, Souvik Bhattacherjee, Srihari Cadambi
  • Publication number: 20150277877
    Abstract: Systems and methods for source-to-source transformation for compiler optimization for many integrated core (MIC) coprocessors, including identifying data dependencies in candidate loops and the data elements of arrays used in each iteration, profiling candidate loops to find a suitable number m of iterations for which data transfer and computation take an equal amount of time, and creating an outer loop outside the candidate loop, with each iteration of the outer loop executing m iterations of the candidate loop. Data streaming is performed by determining the optimum buffer size for one or more arrays and inserting code before the outer loop to create optimum-sized buffers; overlapping data transfer between central processing units (CPUs) and MICs with the computation; reusing buffers to reduce the memory employed on the MICs; and reusing threads on the MICs to repeatedly launch kernels on the MICs for asynchronous data transfer.
    Type: Application
    Filed: March 25, 2015
    Publication date: October 1, 2015
    Inventors: Min Feng, Srimat Chakradhar, Linhai Song
  • Publication number: 20150242323
    Abstract: Systems and methods for source-to-source transformation for optimizing stacks and/or queues in an application, including identifying usage of stacks and queues in the application and collecting the resource usage and thread block configurations for the application. If usage of stacks is identified, optimized code is generated by determining appropriate storage, partitioning stacks based on the determined storage, and caching the tops of the stacks in a register. If usage of queues is identified, optimized code is generated by combining the queue operations of all threads in a warp/thread block into one batch queue operation, converting control divergence of the application to data divergence to enable warp-level queue operations, determining whether at least one of the threads includes a queue operation, and combining queue operations into threads in a warp.
    Type: Application
    Filed: February 25, 2015
    Publication date: August 27, 2015
    Inventors: Yi Yang, Min Feng, Srimat Chakradhar
  • Publication number: 20150212823
    Abstract: Methods are provided. A method for swapping out an offload process from a coprocessor includes issuing a snapify_pause request from a host processor to the coprocessor to initiate pausing, using a plurality of locks, of the offload process executing on the coprocessor and of another process executing on the host processor. The offload process was previously offloaded from the host processor to the coprocessor. The method further includes issuing a snapify_capture request from the host processor to the coprocessor to initiate capturing and saving of a local snapshot by the coprocessor. The method also includes issuing a snapify_wait request from the host processor to the coprocessor to wait for the coprocessor to complete capturing and saving the local snapshot.
    Type: Application
    Filed: December 16, 2014
    Publication date: July 30, 2015
    Inventors: Cheng-Hong Li, Giuseppe Coviello, Srimat Chakradhar, Arash Rezaei
  • Publication number: 20150212733
    Abstract: Systems and methods for swapping pinned memory regions out and in between main memory and a separate storage location in a system, including establishing an offload buffer in an interposing library; swapping out pinned memory regions by transferring offload buffer data from a coprocessor memory to a host processor memory and unregistering and unmapping, from the interposing library, a memory region employed by the offload buffer, wherein the interposing library is pre-loaded on the coprocessor and collects and stores information employed during the swapping out. The pinned memory regions are swapped in by mapping and re-registering the files to the memory region employed by the offload buffer, and transferring the offload buffer data from the host memory back to the re-registered memory region.
    Type: Application
    Filed: January 23, 2015
    Publication date: July 30, 2015
    Inventors: Cheng-Hong Li, Giuseppe Coviello, Kunal Rao, Murugan Sankaradas, Srihari Cadambi, Srimat Chakradhar, Rajat Phull
  • Publication number: 20150212892
    Abstract: Methods are provided. A method includes capturing a snapshot of an offload process being executed by one or more many-core processors. The offload process is in signal communication with a host process being executed by a host processor. At least the offload process is in signal communication with a monitor process. The method further includes terminating, by the monitor process, the offload process on the one or more many-core processors responsive to a disruption of communication between the monitor process and the offload process. The snapshot includes a respective predetermined minimum set of information required to restore the process to the same state it was in when the snapshot was taken.
    Type: Application
    Filed: December 16, 2014
    Publication date: July 30, 2015
    Inventors: Cheng-Hong Li, Giuseppe Coviello, Srimat Chakradhar, Arash Rezaei
  • Patent number: 9086925
    Abstract: A runtime method is disclosed that dynamically sets up core containers and thread-to-core affinity for processes running on manycore coprocessors. The method is completely transparent to user applications and incurs low runtime overhead. The method is implemented within a user-space middleware that also performs scheduling and resource management for both offload and native applications using the manycore coprocessors.
    Type: Grant
    Filed: April 6, 2013
    Date of Patent: July 21, 2015
    Assignee: NEC Laboratories America, Inc.
    Inventors: Cheng-Hong Li, Kunal Rao, Srihari Cadambi, Rajat Phull, Giuseppe Coviello, Murugan Sankaradas, Srimat Chakradhar
  • Patent number: 9038088
    Abstract: Methods and systems for managing data loads on a cluster of processors that implement an iterative procedure through parallel processing of data for the procedure are disclosed. One method includes monitoring, for at least one iteration of the procedure, completion times of a plurality of different processing phases that are undergone by each of the processors in a given iteration. The method further includes determining whether a load imbalance factor threshold is exceeded in the given iteration based on the completion times for the given iteration. In addition, the data is repartitioned by reassigning the data to the processors based on predicted dependencies between assigned data units of the data and completion times of a plurality of the processors for at least two of the phases. Further, the parallel processing is implemented on the cluster of processors in accordance with the reassignment.
    Type: Grant
    Filed: March 1, 2012
    Date of Patent: May 19, 2015
    Assignee: NEC Laboratories America, Inc.
    Inventors: Rajat Phull, Srihari Cadambi, Nishkam Ravi, Srimat Chakradhar
  • Publication number: 20150113514
    Abstract: Methods are provided for source-to-source transformations for graph processing on many-core platforms. A method includes receiving a graph application including one graph, expressed by a graph application programming interface configured for defining and manipulating graphs. The method further includes transforming, by a source-to-source compiler, the graph application into a plurality of parallel code variants. Each of the plurality of parallel code variants is specifically configured for parallel execution by a target one of a plurality of different many-core processors. The method also includes selecting and tuning, by a runtime component, a particular one of the parallel code variants for the parallel execution responsive to graph application characteristics, graph data, and an underlying code execution platform of the plurality of different many-core processors.
    Type: Application
    Filed: October 9, 2014
    Publication date: April 23, 2015
    Inventors: Srimat Chakradhar, Michela Becchi, Da Li
  • Publication number: 20150113542
    Abstract: A method is provided for controlling a compute cluster having a plurality of nodes. Each of the plurality of nodes has a respective computing device with a main server and one or more coprocessor-based hardware accelerators. The method includes receiving a plurality of jobs for scheduling. The method further includes scheduling the plurality of jobs across the plurality of nodes responsive to a knapsack-based sharing-aware schedule generated by a knapsack-based sharing-aware scheduler. The knapsack-based sharing-aware schedule is generated to co-locate together on a same computing device certain ones of the plurality of jobs that are mutually compatible based on a set of requirements whose fulfillment is determined using a knapsack-based sharing-aware technique that uses memory as a knapsack capacity and minimizes makespan while adhering to coprocessor memory and thread resource constraints.
    Type: Application
    Filed: October 3, 2014
    Publication date: April 23, 2015
    Inventors: Srihari Cadambi, Giuseppe Coviello, Srimat Chakradhar
  • Patent number: 8997073
    Abstract: A computer-implemented method entails identifying code regions in an application from which a compiler can generate offloadable tasks for a heterogeneous computing system with processor and accelerator memory, including adding relaxed semantics to a directive-based language for heterogeneous computing that allow a parallel code region to be suggested, rather than specified, as an offloadable candidate, and identifying one or more offloadable tasks in the neighborhood of the code region marked by the directive.
    Type: Grant
    Filed: April 25, 2014
    Date of Patent: March 31, 2015
    Assignee: NEC Laboratories America, Inc.
    Inventors: Nishkam Ravi, Yi Yang, Srimat Chakradhar
  • Patent number: 8984519
    Abstract: A system and method for scheduling client-server applications onto heterogeneous clusters includes storing at least one client request of at least one application in a pending request list on a computer readable storage medium. A priority metric is computed for each application, where the computed priority metric is applied to each client request belonging to that application. The priority metric is determined based on estimated performance of the client request and load on the pending request list. The at least one client request of the at least one application is scheduled based on the priority metric onto one or more heterogeneous resources.
    Type: Grant
    Filed: October 13, 2011
    Date of Patent: March 17, 2015
    Assignee: NEC Laboratories America, Inc.
    Inventors: Srihari Cadambi, Srimat Chakradhar, M. Mustafa Rafique
  • Publication number: 20150066988
    Abstract: Systems and methods for sorting data, including chunking unsorted data such that each chunk is of a size that fits within the last level cache of the system. One or more threads are instantiated in each physical core of the system, and the chunks assigned to the physical cores are distributed evenly across the threads on those cores. Subchunks in the physical cores are sorted using vector intrinsics, the subchunks being the data assigned to the threads in the physical cores, and the subchunks are merged to generate sorted large chunks. A binary tree whose leaf nodes correspond to the sorted large chunks is built, leaf nodes are assigned to threads, and tree nodes are assigned to a circular buffer, wherein the circular buffer is lock-free and synchronization-free. The large chunks are merged to generate sorted data as output.
    Type: Application
    Filed: August 29, 2014
    Publication date: March 5, 2015
    Inventors: Srihari Cadambi, Srimat Chakradhar, Yuan Yuan
  • Publication number: 20150067225
    Abstract: There are provided source-to-source transformation methods for a multi-dimensional array and/or a multi-level pointer for a computer program. A method includes minimizing a number of holes for variable length elements for a given dimension of the array and/or pointer using at least two stride values included in stride buckets. The minimizing step includes modifying memory allocation sites, for the array and/or pointer, to allocate memory based on the stride values. The minimizing step further includes modifying a multi-dimensional memory access, for accessing the array and/or pointer, into a single-dimensional memory access using the stride values. The minimizing step also includes inserting an offload pragma for a data transfer of the array and/or pointer as at least one of a single-dimensional array and a single-level pointer. The data transfer is from a central processing unit to a coprocessor over Peripheral Component Interconnect Express (PCIe).
    Type: Application
    Filed: June 2, 2014
    Publication date: March 5, 2015
    Applicant: NEC Laboratories America, Inc.
    Inventors: Nishkam Ravi, Yi Yang, Srimat Chakradhar, Bin Ren
  • Patent number: 8918770
    Abstract: A system and method for compiling includes, for a parallelizable code portion of an application stored on a computer readable storage medium, determining one or more variables that are to be transferred to and/or from a coprocessor if the parallelizable code portion were to be offloaded. A start location and an end location are determined for at least one of the one or more variables as a size in memory. The parallelizable code portion is transformed by inserting an offload construct around the parallelizable code portion and passing the one or more variables and the size as arguments of the offload construct such that the parallelizable code portion is offloaded to a coprocessor at runtime.
    Type: Grant
    Filed: August 24, 2012
    Date of Patent: December 23, 2014
    Assignee: NEC Laboratories America, Inc.
    Inventors: Nishkam Ravi, Tao Bao, Ozcan Ozturk, Srimat Chakradhar
  • Patent number: 8893103
    Abstract: Methods and systems for asynchronous offload to many-core coprocessors include splitting a loop in an input source code into a sampling sub-part, a many integrated core (MIC) sub-part, and a central processing unit (CPU) sub-part; executing the sampling sub-part with a processor to determine loop characteristics including memory and processor operations executed by the loop; identifying optimal split boundaries based on the loop characteristics such that the MIC sub-part, when executed on a MIC processor, completes in the same amount of time as the CPU sub-part takes when executed on a CPU; and modifying the input source code to split the loop at the identified boundaries, such that the MIC sub-part is executed on a MIC processor and the CPU sub-part is concurrently executed on a CPU.
    Type: Grant
    Filed: July 12, 2013
    Date of Patent: November 18, 2014
    Assignee: NEC Laboratories America, Inc.
    Inventors: Nishkam Ravi, Yi Yang, Srimat Chakradhar
  • Publication number: 20140325495
    Abstract: A computer-implemented method entails identifying code regions in an application from which a compiler can generate offloadable tasks for a heterogeneous computing system with processor and accelerator memory, including adding relaxed semantics to a directive-based language for heterogeneous computing that allow a parallel code region to be suggested, rather than specified, as an offloadable candidate, and identifying one or more offloadable tasks in the neighborhood of the code region marked by the directive.
    Type: Application
    Filed: April 25, 2014
    Publication date: October 30, 2014
    Applicant: NEC Laboratories America, Inc.
    Inventors: Nishkam Ravi, Yi Yang, Srimat Chakradhar
  • Publication number: 20140289637
    Abstract: A method for running application software for a mobile device by virtualizing a mobile device operating system (OS); running a virtual instance of the mobile device OS with the application software on a server in the cloud; and rendering, on the server, a display image of the mobile device screen and sending it to the mobile device for display.
    Type: Application
    Filed: January 8, 2014
    Publication date: September 25, 2014
    Applicant: NEC Laboratories America, Inc.
    Inventors: Giuseppe Coviello, Murugan Sankaradass, Srimat Chakradhar, Valentina Pelliccia
  • Publication number: 20140236913
    Abstract: Systems and methods for accelerating distributed transactions on key-value stores include applying one or more policies of dynamic lock-localization, the policies including a lock migration stage that reduces the number of nodes on which locks are present so that a transaction needs fewer network round trips to acquire locks, and a lock ordering stage for pipelining during lock acquisition, wherein the order in which locks are acquired to avoid deadlock is controlled by the average contention for each lock rather than by static lexicographic ordering; and dynamically migrating and placing locks for distributed objects in distinct entity-groups in a datastore through the policies of dynamic lock-localization.
    Type: Application
    Filed: January 24, 2014
    Publication date: August 21, 2014
    Applicant: NEC Laboratories America, Inc.
    Inventors: Srimat Chakradhar, Naresh Rapolu
  • Publication number: 20140237477
    Abstract: Methods and systems for scheduling jobs to manycore nodes in a cluster include selecting a job to run according to the job's wait time and the job's expected execution time; sending job requirements to all nodes in a cluster, where each node includes a manycore processor; determining at each node whether said node has sufficient resources to ever satisfy the job requirements and, if no node has sufficient resources, deleting the job; creating a list of nodes that have sufficient free resources at a present time to satisfy the job requirements; and assigning the job to a node, based on a difference between an expected execution time and associated confidence value for each node and a hypothetical fastest execution time and associated hypothetical maximum confidence value.
    Type: Application
    Filed: April 24, 2014
    Publication date: August 21, 2014
    Applicant: NEC Laboratories America, Inc.
    Inventors: Srihari Cadambi, Kunal Rao, Srimat Chakradhar, Rajat Phull, Giuseppe Coviello, Murugan Sankaradass, Cheng-Hong Li
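Publication 20150277877 describes strip-mining a candidate loop into an outer loop of m-iteration blocks so that the data transfer for one block can overlap the computation on another. The following is a minimal Python sketch of that overlap pattern, not the patented transformation. The function name `streamed_sum_of_squares`, the thread-based "transfer" that stands in for a host-to-MIC copy, and the sum-of-squares loop body are all illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def streamed_sum_of_squares(data, m):
    """Process `data` in blocks of m items, fetching block k+1 while computing block k."""

    def transfer(start):
        # Stand-in for a host-to-coprocessor copy of one block (an assumption).
        return list(data[start:start + m])

    def compute(block):
        # Stand-in for the original loop body, run for up to m iterations.
        return sum(x * x for x in block)

    starts = list(range(0, len(data), m))  # outer loop: one iteration per block
    total = 0
    with ThreadPoolExecutor(max_workers=1) as copier:
        pending = copier.submit(transfer, starts[0]) if starts else None
        for k in range(len(starts)):
            block = pending.result()  # wait for block k to "arrive"
            if k + 1 < len(starts):
                # Issue the copy of block k+1 before computing block k.
                pending = copier.submit(transfer, starts[k + 1])
            total += compute(block)
    return total
```

Choosing m so that one block's transfer takes roughly as long as one block's computation keeps the copy path and the compute units busy at the same time.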
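Publication 20150242323 describes combining the queue operations of all threads in a warp into one batch operation and converting control divergence into data divergence. The Python sketch below simulates that idea for a push, under stated assumptions. Each lane contributes a 0/1 flag instead of branching, an exclusive prefix sum gives each active lane its slot, and the warp reserves space in the queue once. The function name and the list-based queue are hypothetical. On a GPU the single reservation would typically be one atomic add.

```python
from itertools import accumulate


def warp_batched_push(queue, warp_items):
    """warp_items[i] is the item lane i wants to push, or None if lane i has nothing.

    Returns (base index of the reserved slots, number of items pushed).
    """
    # Control divergence -> data: every lane produces a flag instead of branching.
    flags = [1 if item is not None else 0 for item in warp_items]
    # Exclusive prefix sum: lane i writes at base + offsets[i].
    offsets = [0] + list(accumulate(flags))[:-1]
    count = sum(flags)
    base = len(queue)            # one reservation for the whole warp
    queue.extend([None] * count)
    for lane, item in enumerate(warp_items):
        if flags[lane]:
            queue[base + offsets[lane]] = item
    return base, count
```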
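Patent 9038088 repartitions data when a load imbalance factor threshold is exceeded. The abstract defines neither the factor nor the reassignment rule, so this sketch makes two assumptions. First, the factor is taken to be the slowest processor's completion time divided by the mean, computed per phase. Second, the patent's dependency-based prediction is replaced by a plain greedy longest-processing-time heuristic over hypothetical per-unit cost estimates.

```python
from statistics import mean


def imbalance_factor(completion_times):
    """Slowest processor's completion time divided by the mean (assumed definition)."""
    return max(completion_times) / mean(completion_times)


def needs_repartition(phase_times, threshold):
    """phase_times maps a phase name to per-processor completion times for one iteration."""
    return any(imbalance_factor(times) > threshold for times in phase_times.values())


def repartition(unit_costs, num_procs):
    """Greedy longest-processing-time reassignment of data units to processors."""
    loads = [0.0] * num_procs
    assignment = {p: [] for p in range(num_procs)}
    # Place the most expensive units first, each on the currently least-loaded processor.
    for unit, cost in sorted(unit_costs.items(), key=lambda kv: kv[1], reverse=True):
        target = min(range(num_procs), key=lambda p: loads[p])
        assignment[target].append(unit)
        loads[target] += cost
    return assignment, loads
```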
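Publication 20150113542 describes a knapsack-based, sharing-aware scheduler that uses coprocessor memory as the knapsack capacity and respects thread limits. The minimal Python sketch below covers only the single-device packing step. It is a textbook 0/1 knapsack over memory, with the thread budget added as a second capacity. Using each job's value (for example, its expected run time) as the quantity to maximize is an assumption. The cluster-wide makespan minimization from the abstract is not reproduced here.

```python
def pick_colocated_jobs(jobs, mem_capacity, thread_capacity):
    """0/1 knapsack over coprocessor memory, with a thread budget as a second constraint.

    jobs: list of (name, memory, threads, value) with integer memory and threads.
    Returns (best total value, names of the chosen jobs).
    """
    # best[m][t]: best (value, names) using at most m memory units and t threads
    best = [[(0, ()) for _ in range(thread_capacity + 1)]
            for _ in range(mem_capacity + 1)]
    for name, mem, threads, value in jobs:
        # Iterate capacities downward so each job is used at most once.
        for m in range(mem_capacity, mem - 1, -1):
            for t in range(thread_capacity, threads - 1, -1):
                prev_value, prev_names = best[m - mem][t - threads]
                if prev_value + value > best[m][t][0]:
                    best[m][t] = (prev_value + value, prev_names + (name,))
    return best[mem_capacity][thread_capacity]
```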
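Publication 20150066988 describes sorting cache-sized chunks and then merging the sorted chunks up a binary tree. The sequential Python sketch below shows only that chunk-then-merge structure. The `chunk_size` parameter stands in for "fits within the last level cache", and Python's `sorted` replaces the vector-intrinsic subchunk sort. The per-core threads and the lock-free circular buffer from the abstract are deliberately left out.

```python
import heapq


def chunked_sort(data, chunk_size):
    """Sort cache-sized chunks independently, then merge them pairwise up a binary tree."""
    chunks = [sorted(data[i:i + chunk_size]) for i in range(0, len(data), chunk_size)]
    while len(chunks) > 1:
        next_level = []
        for i in range(0, len(chunks), 2):
            if i + 1 < len(chunks):
                # Internal tree node: merge two sorted children.
                next_level.append(list(heapq.merge(chunks[i], chunks[i + 1])))
            else:
                next_level.append(chunks[i])  # odd chunk out moves up unchanged
        chunks = next_level
    return chunks[0] if chunks else []
```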
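Publication 20150067225 describes using stride buckets to reduce the holes left when variable-length rows are stored at a fixed stride, and rewriting two-dimensional accesses as one-dimensional ones. The Python sketch below illustrates that idea under stated assumptions. Each row goes into the smallest bucket whose stride can hold it, and element [i][j] is read as `slot * stride + j`. All function names here are hypothetical.

```python
def flatten_with_strides(rows, strides):
    """Pack variable-length rows into one flat buffer per stride bucket.

    Each row goes to the smallest stride that can hold it; the padding
    after a row is a "hole". The largest stride must fit the longest row.
    """
    strides = sorted(strides)
    buckets = {s: [] for s in strides}   # stride -> flat (1-D) storage
    index = []                           # row i -> (stride, row slot in bucket)
    for row in rows:
        s = next(s for s in strides if s >= len(row))
        flat = buckets[s]
        index.append((s, len(flat) // s))
        flat.extend(list(row) + [0] * (s - len(row)))
    return buckets, index


def get(buckets, index, i, j):
    """Element [i][j] via a single-dimensional access: slot * stride + j."""
    s, slot = index[i]
    return buckets[s][slot * s + j]


def count_holes(buckets, rows):
    """Padding slots allocated beyond the real row elements."""
    return sum(len(flat) for flat in buckets.values()) - sum(len(row) for row in rows)
```

With the rows from the test below, two strides (3 and 6) leave 3 holes, while a single stride of 6 leaves 12.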
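Patent 8893103 chooses loop split boundaries so that the MIC and CPU sub-parts finish together. If sampling measures per-iteration times $t_{cpu}$ and $t_{mic}$, and $N$ iterations remain after sampling, equal finish times require $n_{mic} t_{mic} = (N - n_{mic}) t_{cpu}$, which gives $n_{mic} = N\,t_{cpu}/(t_{cpu} + t_{mic})$. The minimal Python sketch below computes that split. It assumes a constant cost per iteration, which is a simplification of the loop characteristics named in the abstract.

```python
def split_loop(n, sample, t_cpu, t_mic):
    """Split iterations [0, n) into sampling, MIC, and CPU ranges.

    t_cpu and t_mic are per-iteration times measured while running the
    sampling sub-part; the MIC share is chosen so n_mic * t_mic is close to
    n_cpu * t_cpu, i.e. both sides finish at about the same time.
    """
    remaining = n - sample
    n_mic = round(remaining * t_cpu / (t_cpu + t_mic))
    sampling = range(0, sample)
    mic = range(sample, sample + n_mic)
    cpu = range(sample + n_mic, n)
    return sampling, mic, cpu
```

For example, with 1000 remaining iterations, $t_{cpu} = 2$ and $t_{mic} = 1$, the MIC receives 667 iterations and the CPU 333, so both sides take about 667 time units.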
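Publication 20140236913 orders lock acquisition by average contention instead of static lexicographic order. The abstract does not say which direction the ordering takes. This Python sketch assumes least-contended locks are acquired first and breaks ties by name, so the order is total. The function names are hypothetical.

```python
import threading
from contextlib import ExitStack


def acquisition_order(lock_names, avg_contention):
    """Hypothetical policy: least-contended locks first, ties broken by name."""
    return sorted(lock_names, key=lambda name: (avg_contention.get(name, 0.0), name))


def acquire_in_order(locks, avg_contention):
    """Acquire every lock in `locks` (name -> threading.Lock) in contention order.

    Returns (ExitStack, order); closing the stack releases the locks in reverse order.
    """
    order = acquisition_order(locks.keys(), avg_contention)
    stack = ExitStack()
    for name in order:
        stack.enter_context(locks[name])  # blocks until this lock is acquired
    return stack, order
```

Any consistent total order prevents deadlock. For that reason, all concurrent transactions in this sketch would need to use the same contention snapshot. If the statistics change between two transactions, their orders could conflict.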
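Publication 20140237477 describes a job scheduling sequence for manycore nodes. A job is selected by wait time and expected execution time. It is deleted if no node could ever satisfy it. Otherwise it is placed on the ready node whose (expected time, confidence) pair is closest to a hypothetical (fastest time, maximum confidence) point. The Python sketch below follows that sequence, with several assumptions labeled in the comments. These include the response-ratio selection rule, taking the fastest ready node as the "hypothetical fastest" time, and a plain Euclidean distance that mixes time and confidence units.

```python
import math


def select_job(queue):
    """Pick the job with the highest response ratio (wait + expected) / expected (assumed)."""
    return max(queue, key=lambda job: (job["wait"] + job["expected"]) / job["expected"])


def place_job(job, nodes):
    """Return ("delete", None), ("wait", None), or ("run", node_name).

    nodes maps a node name to a dict with total_cores, free_cores, expected
    (predicted run time of `job` there) and confidence (0..1).
    """
    need = job["cores"]
    if not any(node["total_cores"] >= need for node in nodes.values()):
        return ("delete", None)  # no node could ever satisfy the job
    ready = {name: node for name, node in nodes.items() if node["free_cores"] >= need}
    if not ready:
        return ("wait", None)  # resources exist but are busy right now
    fastest = min(node["expected"] for node in ready.values())  # assumed ideal time

    def distance(name):
        # Assumed metric: Euclidean distance to (fastest, confidence 1.0).
        node = ready[name]
        return math.hypot(node["expected"] - fastest, 1.0 - node["confidence"])

    return ("run", min(ready, key=distance))
```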