Patents by Inventor Srimat Chakradhar

Srimat Chakradhar has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20170228645
    Abstract: Aspects of the present disclosure describe techniques for training a convolutional neural network using an inconsistent stochastic gradient descent (ISGD) algorithm. The training effort for each training batch used by the ISGD algorithm is dynamically adjusted according to the loss determined for that batch, and batches are classified into two sub-states: well-trained or under-trained. The ISGD algorithm provides more iterations for under-trained batches while reducing iterations for well-trained ones.
    Type: Application
    Filed: February 2, 2017
    Publication date: August 10, 2017
    Inventors: Linnan Wang, Yi Yang, Renqiang Min, Srimat Chakradhar
  • Patent number: 9720597
    Abstract: Systems and methods for swapping pinned memory regions out and in between main memory and a separate storage location in a system include establishing an offload buffer in an interposing library, and swapping out pinned memory regions by transferring offload buffer data from a coprocessor memory to a host processor memory and by unregistering and unmapping a memory region employed by the offload buffer from the interposing library. The interposing library is pre-loaded on the coprocessor and collects and stores information employed during the swapping out. The pinned memory regions are swapped in by mapping and re-registering the files to the memory region employed by the offload buffer, and by transferring the offload buffer data from the host memory back to the re-registered memory region.
    Type: Grant
    Filed: January 23, 2015
    Date of Patent: August 1, 2017
    Assignee: NEC Corporation
    Inventors: Cheng-Hong Li, Giuseppe Coviello, Kunal Rao, Murugan Sankaradas, Srihari Cadambi, Srimat Chakradhar, Rajat Phull
  • Patent number: 9658823
    Abstract: Systems and methods for source-to-source transformation for optimizing stacks and/or queues in an application include identifying usage of stacks and queues in the application and collecting the resource usage and thread block configurations for the application. If usage of stacks is identified, optimized code is generated by determining appropriate storage, partitioning stacks based on the determined storage, and caching the tops of the stacks in a register. If usage of queues is identified, optimized code is generated by combining the queue operations of all threads in a warp/thread block into one batch queue operation, converting control divergence of the application to data divergence to enable warp-level queue operations, determining whether at least one of the threads includes a queue operation, and combining queue operations into threads in a warp.
    Type: Grant
    Filed: February 25, 2015
    Date of Patent: May 23, 2017
    Assignee: NEC Corporation
    Inventors: Yi Yang, Min Feng, Srimat Chakradhar
  • Patent number: 9652247
    Abstract: Methods are provided. A method for swapping out an offload process from a coprocessor includes issuing a snapify_pause request from a host processor to the coprocessor to initiate, using a plurality of locks, a pause of the offload process executing on the coprocessor and of another process executing on the host processor. The offload process was previously offloaded from the host processor to the coprocessor. The method further includes issuing a snapify_capture request from the host processor to the coprocessor to initiate the capture and saving of a local snapshot by the coprocessor. The method also includes issuing a snapify_wait request from the host processor to the coprocessor to wait for the coprocessor to finish capturing and saving the local snapshot.
    Type: Grant
    Filed: December 16, 2014
    Date of Patent: May 16, 2017
    Assignee: NEC Corporation
    Inventors: Cheng-Hong Li, Giuseppe Coviello, Srimat Chakradhar, Arash Rezaei
  • Patent number: 9569161
    Abstract: A method for running application software for a mobile device includes virtualizing a mobile device operating system (OS); running a virtual instance of the mobile device OS with the application software on a server in the cloud; and rendering, on the server, a display image for the mobile device screen and sending it to the mobile device for display.
    Type: Grant
    Filed: January 8, 2014
    Date of Patent: February 14, 2017
    Assignee: NEC Corporation
    Inventors: Giuseppe Coviello, Murugan Sankaradass, Srimat Chakradhar, Valentina Pelliccia
  • Patent number: 9535826
    Abstract: Source-to-source transformation methods are provided for a multi-dimensional array and/or a multi-level pointer in a computer program. A method includes minimizing the number of holes for variable-length elements in a given dimension of the array and/or pointer using at least two stride values included in stride buckets. The minimizing step includes modifying memory allocation sites for the array and/or pointer to allocate memory based on the stride values. The minimizing step further includes converting a multi-dimensional memory access to the array and/or pointer into a single-dimensional memory access using the stride values. The minimizing step also includes inserting an offload pragma for a data transfer of the array and/or pointer as at least one of a single-dimensional array and a single-level pointer. The data transfer is from a central processing unit to a coprocessor over Peripheral Component Interconnect Express (PCIe).
    Type: Grant
    Filed: June 2, 2014
    Date of Patent: January 3, 2017
    Assignee: NEC Corporation
    Inventors: Nishkam Ravi, Yi Yang, Srimat Chakradhar, Bin Ren
  • Publication number: 20160342888
    Abstract: Aspects of the present disclosure are directed to techniques that improve the performance of convolutional neural network (CNN) systems by improving the memory efficiency of CNNs running on GPUs. Aspects of the disclosure demonstrate that off-chip memory in such CNN systems is underutilized because of at least three characteristics, namely data layout, data locality, and inter-kernel redundancy. Aspects of the disclosure examine the performance impact of different data layouts and then describe a method for selecting the data layout of the various layers of the CNN, including a fast transformation implementation. Also disclosed are techniques that improve data locality through working set expansion, eliminate inter-kernel redundancy, and increase thread-level parallelism (TLP) using kernel reconstruction techniques, including kernel fusion and thread injection. Disclosed experimental results show that these optimizations improve CNN performance by up to 9.76 times for a single kernel and 2.05 times for a whole network.
    Type: Application
    Filed: May 20, 2016
    Publication date: November 24, 2016
    Inventors: Yi Yang, Chao Li, Min Feng, Srimat Chakradhar
  • Patent number: 9471289
    Abstract: Systems and methods for source-to-source transformation for compiler optimization for many integrated core (MIC) coprocessors, including identifying data dependencies in candidate loops and data elements used in each iteration for arrays, profiling candidate loops to find a proper number m, wherein data transfer and computation for m iterations take an equal amount of time, and creating an outer loop outside the candidate loop, with each iteration of the outer loop executing m iterations of the candidate loop. Data streaming is performed by determining optimum buffer size for one or more arrays and inserting code before the outer loop to create optimum sized buffers, overlapping data transfer between central processing units (CPUs) and MICs with the computation; reusing buffers to reduce memory employed on the MICs, and reusing threads on MICs to repeatedly launch kernels on the MICs for asynchronous data transfer.
    Type: Grant
    Filed: March 25, 2015
    Date of Patent: October 18, 2016
    Assignee: NEC Corporation
    Inventors: Min Feng, Srimat Chakradhar, Linhai Song
  • Publication number: 20160299920
    Abstract: Systems and methods for recognizing a face are disclosed and include receiving images of faces; generating feature vectors of the images; generating clusters of feature vectors, each with a centroid or cluster representative; for a query that searches for a face, generating the corresponding feature vector for the face and comparing it with the centroids of all clusters; for clusters above a similarity threshold, comparing the cluster members with that feature vector; and indicating the cluster members whose similarity is above a threshold as matching candidates.
    Type: Application
    Filed: April 1, 2016
    Publication date: October 13, 2016
    Inventors: Min Feng, Giuseppe Coviello, Srimat Chakradhar, Nitin Agrawal, Yi Yang
  • Publication number: 20160298978
    Abstract: A system for planning a trip includes heterogeneous data sources, including map data, traffic information, vehicle trace data, weather reports, social media data, commuter feedback data, GIS data, and travel time data; a stream analytics engine coupled to the heterogeneous data sources; a batch analytics engine coupled to the heterogeneous data sources; and a multi-modal journey planner coupled to the stream analytics engine and the batch analytics engine, the multi-modal journey planner processing indoor travel information, providing real-time updates while a journey is in progress, and providing a journey time forecast that reflects indoor travel time.
    Type: Application
    Filed: April 1, 2016
    Publication date: October 13, 2016
    Inventors: Murugan Sankaradas, Kunal Rao, Srimat Chakradhar
  • Publication number: 20160300157
    Abstract: A big data processing system includes a memory management engine having stream buffers, real-time views and models, and batch views and models, the stream buffers coupleable to one or more stream processing frameworks to process stream data, and the batch models coupleable to one or more batch processing frameworks; one or more processing engines including Join, Group, Filter, Aggregate, and Project functional units and classifiers; and a client layer engine communicating with one or more big data applications, the client layer engine handling an output layer, an API layer, and a unified query layer.
    Type: Application
    Filed: April 4, 2016
    Publication date: October 13, 2016
    Inventors: Murugan Sankaradas, Giuseppe Coviello, Srimat Chakradhar, Marco Gianfico, Emanuel Di Nardo
  • Publication number: 20160269247
    Abstract: Aspects of the present disclosure are directed to techniques that improve the performance of streaming systems. We disclose efficient techniques for dynamic topology re-optimization that use a feedback-driven control loop to substantially solve a number of problems affecting the performance of such streaming systems. More particularly, we disclose a novel technique for network-aware tuple routing using consistent hashing that improves stream flow throughput in the presence of large run-time overhead. We also disclose methods for dynamic optimization of overlay topologies for group communication operations. To enable fast topology re-optimization with minimal system disruption, we present a lightweight, fault-tolerant protocol. All of the disclosed techniques were implemented in a real system and comprehensively validated on three real applications, demonstrating performance improvements of 20% to 200% while overcoming various compute and network bottlenecks.
    Type: Application
    Filed: March 14, 2016
    Publication date: September 15, 2016
    Inventors: Srimat Chakradhar, Naresh Rapolu
  • Publication number: 20160210723
    Abstract: Systems and methods are disclosed for speeding up a computer having a graphics processing unit (GPU) and a general purpose processor (GP-GPU) by decoupling a convolution process for a first matrix into a row part and a column part; expanding the row part into a second matrix; performing matrix multiplication using the second matrix and a filter matrix; and performing reduction on an output matrix.
    Type: Application
    Filed: November 19, 2015
    Publication date: July 21, 2016
    Inventors: Yi Yang, Srimat Chakradhar
  • Patent number: 9367346
    Abstract: Systems and methods for accelerating distributed transactions on key-value stores include applying one or more policies of dynamic lock-localization. The policies include a lock migration stage that decreases the number of nodes on which locks are present, so that a transaction needs fewer network round trips to acquire locks, and a lock ordering stage for pipelining during lock acquisition, in which the lock order used to avoid deadlock is determined by the average contention for the locks rather than by static lexicographical ordering. Locks for distributed objects in distinct entity-groups in a datastore are dynamically migrated and placed through the policies of dynamic lock-localization.
    Type: Grant
    Filed: January 24, 2014
    Date of Patent: June 14, 2016
    Assignee: NEC Corporation
    Inventors: Srimat Chakradhar, Naresh Rapolu
  • Patent number: 9367357
    Abstract: Methods and systems for scheduling jobs to manycore nodes in a cluster include selecting a job to run according to the job's wait time and the job's expected execution time; sending job requirements to all nodes in a cluster, where each node includes a manycore processor; determining at each node whether said node has sufficient resources to ever satisfy the job requirements and, if no node has sufficient resources, deleting the job; creating a list of nodes that have sufficient free resources at a present time to satisfy the job requirements; and assigning the job to a node, based on a difference between an expected execution time and associated confidence value for each node and a hypothetical fastest execution time and associated hypothetical maximum confidence value.
    Type: Grant
    Filed: April 24, 2014
    Date of Patent: June 14, 2016
    Assignee: NEC Corporation
    Inventors: Srihari Cadambi, Kunal Rao, Srimat Chakradhar, Rajat Phull, Giuseppe Coviello, Murugan Sankaradass, Cheng-Hong Li
  • Patent number: 9335981
    Abstract: Methods are provided for source-to-source transformations for graph processing on many-core platforms. A method includes receiving a graph application including one graph, expressed by a graph application programming interface configured for defining and manipulating graphs. The method further includes transforming, by a source-to-source compiler, the graph application into a plurality of parallel code variants. Each of the plurality of parallel code variants is specifically configured for parallel execution by a target one of a plurality of different many-core processors. The method also includes selecting and tuning, by a runtime component, a particular one of the parallel code variants for the parallel execution responsive to graph application characteristics, graph data, and an underlying code execution platform of the plurality of different many-core processors.
    Type: Grant
    Filed: October 9, 2014
    Date of Patent: May 10, 2016
    Assignee: NEC Corporation
    Inventors: Srimat Chakradhar, Michela Becchi, Da Li
  • Publication number: 20160110134
    Abstract: A graph storage and processing system is provided. The system includes a scalable, distributed, fault-tolerant, in-memory graph storage device for storing base graph data representative of graphs. The system further includes a real-time, in-memory graph storage device for storing update graph data representative of graph updates for the graphs with respect to a time threshold. The system also includes an in-memory graph sampler for sampling the base graph data to generate sampled portions of the graphs and for storing the sampled portions of the graphs. The system additionally includes a query manager for providing a query interface between applications and the system and for forming graph data representative of a complete graph from at least the base graph data and the update graph data, if any. The system also includes a graph computer for processing the sampled portions using batch-type computations to generate approximate results for graph-based queries.
    Type: Application
    Filed: August 20, 2015
    Publication date: April 21, 2016
    Inventors: Kunal Rao, Giuseppe Coviello, Srimat Chakradhar, Souvik Bhattacherjee, Srihari Cadambi
  • Publication number: 20160110404
    Abstract: A method is provided for detecting abnormal changes in real-time in dynamic graphs. The method includes extracting, by a graph sampler, an active sampled graph from an underlying base graph. The method further includes merging, by a graph merger, the active sampled graph with graph updates within a predetermined recent time period to generate a merged graph. The method also includes computing, by a graph diameter computer, a diameter of the merged graph. The method additionally includes determining, by a graph diameter change determination device, whether a graph diameter change exists. The method further includes generating, by an alarm generator, a user-perceptible alarm responsive to the graph diameter change.
    Type: Application
    Filed: August 20, 2015
    Publication date: April 21, 2016
    Inventors: Kunal Rao, Giuseppe Coviello, Srimat Chakradhar, Souvik Bhattacherjee, Srihari Cadambi
  • Publication number: 20160110409
    Abstract: A method in a graph storage and processing system is provided. The method includes storing, in a scalable, distributed, fault-tolerant, in-memory graph storage device, base graph data representative of graphs, and storing, in a real-time, in-memory graph storage device, update graph data representative of graph updates for the graphs with respect to a time threshold. The method further includes sampling the base graph data to generate sampled portions of the graphs and storing the sampled portions, by an in-memory graph sampler. The method additionally includes providing, by a query manager, a query interface between applications and the system. The method also includes forming, by the query manager, graph data representative of a complete graph from at least the base graph data and the update graph data, if any. The method includes processing, by a graph computer, the sampled portions using batch-type computations to generate approximate results for graph-based queries.
    Type: Application
    Filed: August 20, 2015
    Publication date: April 21, 2016
    Inventors: Kunal Rao, Giuseppe Coviello, Srimat Chakradhar, Souvik Bhattacherjee, Srihari Cadambi
  • Publication number: 20150277877
    Abstract: Systems and methods for source-to-source transformation for compiler optimization for many integrated core (MIC) coprocessors, including identifying data dependencies in candidate loops and data elements used in each iteration for arrays, profiling candidate loops to find a proper number m, wherein data transfer and computation for m iterations take an equal amount of time, and creating an outer loop outside the candidate loop, with each iteration of the outer loop executing m iterations of the candidate loop. Data streaming is performed by determining optimum buffer size for one or more arrays and inserting code before the outer loop to create optimum sized buffers, overlapping data transfer between central processing units (CPUs) and MICs with the computation; reusing buffers to reduce memory employed on the MICs, and reusing threads on MICs to repeatedly launch kernels on the MICs for asynchronous data transfer.
    Type: Application
    Filed: March 25, 2015
    Publication date: October 1, 2015
    Inventors: Min Feng, Srimat Chakradhar, Linhai Song
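The ISGD abstract (publication 20170228645) says each training batch is classified as well-trained or under-trained from its loss, but it does not say how the cut-off is chosen. The following is a minimal Python sketch under stated assumptions: a batch is treated as under-trained when its loss exceeds the mean of a sliding window of recent batch losses by more than `k` population standard deviations, and such a batch simply receives `extra_iterations` additional update steps. The class name, the threshold rule, and all parameters are hypothetical, and the sketch models only the extra-iteration side of the adjustment.

```python
from collections import deque
from statistics import mean, pstdev


class BatchEffortController:
    """Hypothetical controller that spends more iterations on under-trained batches.

    Assumption (not from the abstract): a batch is under-trained when its loss
    exceeds mean + k * std of the recent loss window.
    """

    def __init__(self, window: int = 10, k: float = 3.0, extra_iterations: int = 2):
        self.history = deque(maxlen=window)  # losses of the most recent batches
        self.k = k
        self.extra_iterations = extra_iterations

    def is_under_trained(self, loss: float) -> bool:
        losses = list(self.history)
        # At least two samples are needed before the spread is meaningful.
        if len(losses) < 2:
            return False
        limit = mean(losses) + self.k * pstdev(losses)
        return loss > limit

    def iterations_for(self, loss: float) -> int:
        """Return the number of update steps for this batch, then record its loss."""
        under = self.is_under_trained(loss)
        self.history.append(loss)
        return 1 + (self.extra_iterations if under else 0)
```

With `BatchEffortController(window=5, k=2.0, extra_iterations=3)`, a run of losses near 1.0 each gets one step, while a sudden loss of 5.0 is flagged as under-trained and gets four steps.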
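The face recognition abstract (publication 20160299920) describes a two-stage search: the query's feature vector is compared with every cluster centroid first, and only the members of clusters above a similarity threshold are compared individually. A minimal sketch of that search step follows, assuming cosine similarity as the similarity measure and feature vectors and clusters that have already been computed; the feature extractor and the clustering algorithm are not shown, and `Cluster`, `search`, and the threshold parameter names are illustrative.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class Cluster:
    centroid: List[float]  # cluster representative
    members: Dict[str, List[float]] = field(default_factory=dict)  # face id -> feature vector


def search(query: Sequence[float], clusters: List[Cluster],
           cluster_threshold: float, member_threshold: float) -> List[str]:
    """Return ids of faces whose similarity to the query is above member_threshold."""
    candidates: List[str] = []
    for cluster in clusters:
        # Stage 1: compare the query with the cluster representative only.
        if cosine(query, cluster.centroid) <= cluster_threshold:
            continue  # every member of a dissimilar cluster is skipped
        # Stage 2: compare the query with each member of a surviving cluster.
        for face_id, vector in cluster.members.items():
            if cosine(query, vector) > member_threshold:
                candidates.append(face_id)
    return candidates
```

For example, with one cluster centred on `[1.0, 0.0]` and another on `[0.0, 1.0]`, a query of `[1.0, 0.0]` never touches the members of the second cluster. The saving comes from this pruning: when clusters are well separated, most per-member comparisons are skipped.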
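Publication 20160210723 decouples a convolution into a row part and a column part, expands the row part into a second matrix, multiplies it by a filter matrix, and reduces the output matrix. The abstract does not give the exact layout, so the following pure-Python sketch shows one plausible reading, assuming a valid 2-D cross-correlation with a square K x K filter: each input row is expanded into its sliding windows of width K, that matrix is multiplied by the transposed filter so every filter row is applied in one product, and the column part is handled by a reduction that sums the partial products of K consecutive input rows. It illustrates the data flow only, not a GPU implementation, and all function names are hypothetical; `conv2d_direct` is included only as a reference for checking.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive dense matrix product a @ b."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def expand_row(row: List[float], k: int) -> Matrix:
    """Row part: expand one input row into a (W-K+1) x K matrix of sliding windows."""
    return [row[j:j + k] for j in range(len(row) - k + 1)]


def conv2d_row_decoupled(x: Matrix, f: Matrix) -> Matrix:
    """Valid 2-D cross-correlation of x (H x W) with a square filter f (K x K)."""
    k = len(f)
    f_t = [list(col) for col in zip(*f)]  # transpose: column r of f_t is filter row r
    # partial[y][j][r] = sum_c x[y][j + c] * f[r][c]
    partial = [matmul(expand_row(row, k), f_t) for row in x]
    out_h, out_w = len(x) - k + 1, len(x[0]) - k + 1
    # Column part (reduction): output row i adds filter row r applied to input row i + r.
    return [[sum(partial[i + r][j][r] for r in range(k)) for j in range(out_w)]
            for i in range(out_h)]


def conv2d_direct(x: Matrix, f: Matrix) -> Matrix:
    """Reference: straightforward valid 2-D cross-correlation."""
    k = len(f)
    return [[sum(x[i + r][j + c] * f[r][c] for r in range(k) for c in range(k))
             for j in range(len(x[0]) - k + 1)]
            for i in range(len(x) - k + 1)]
```

Compared with expanding the full K x K neighbourhood of every output position, expanding one row at a time duplicates each input element roughly K times instead of K² times, at the cost of the extra reduction step.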