Patents by Inventor Srimat Chakradhar

Srimat Chakradhar has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Cross-layer system architecture design

Patent number: 8762794

Abstract: Methods and systems for cross-layer forgiveness exploitation include executing one or more applications using a processing platform that includes a first reliable processing core and at least one additional processing core having a lower reliability than the first processing core, modifying application execution according to one or more best-effort techniques to improve performance, and controlling parameters associated with the processing platform and the best-effort layer that control performance and error rate such that performance is maximized in a region of low hardware-software interference.

Type: Grant

Filed: November 18, 2011

Date of Patent: June 24, 2014

Assignee: NEC Laboratories America, Inc.

Inventors: Srimat Chakradhar, Hyungmin Cho, Anand Raghunathan

AUTOMATIC ASYNCHRONOUS OFFLOAD FOR MANY-CORE COPROCESSORS

Publication number: 20140053131

Abstract: Methods and systems for asynchronous offload to many-core coprocessors include splitting a loop in an input source code into a sampling sub-part, a many integrated core (MIC) sub-part, and a central processing unit (CPU) sub-part; executing the sampling sub-part with a processor to determine loop characteristics including memory- and processor-operations executed by the loop; identifying optimal split boundaries based on the loop characteristics such that the MIC sub-part will complete in a same amount of time when executed on a MIC processor as the CPU sub-part will take when executed on a CPU; and modifying the input source code to split the loop at the identified boundaries, such that the MIC sub-part is executed on a MIC processor and the CPU sub-part is concurrently executed on a CPU.
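
The balance condition in this abstract (the MIC sub-part and the CPU sub-part finishing at the same time) can be illustrated with a small calculation. This is only a sketch under simplifying assumptions, not the patented method: it assumes a constant per-iteration cost on each device, as might be estimated from the sampling sub-part, and it ignores data-transfer time. The function name and parameters are hypothetical.

```python
def mic_iteration_count(total_iters, mic_secs_per_iter, cpu_secs_per_iter):
    """Number of iterations to hand to the MIC so both sub-parts finish together.

    Assumes (hypothetically) a constant cost per iteration on each device.
    """
    if total_iters < 0 or mic_secs_per_iter <= 0 or cpu_secs_per_iter <= 0:
        raise ValueError("iteration count must be >= 0 and per-iteration costs > 0")
    # Balance condition: k * t_mic == (total - k) * t_cpu, solved for k.
    k = total_iters * cpu_secs_per_iter / (mic_secs_per_iter + cpu_secs_per_iter)
    return round(k)


# Hypothetical costs, as if measured on the sampling sub-part.
mic_iters = mic_iteration_count(1000, mic_secs_per_iter=1.0, cpu_secs_per_iter=3.0)
cpu_iters = 1000 - mic_iters
print(mic_iters, cpu_iters)  # 750 250
```

With these made-up costs, 750 iterations on the MIC and 250 on the CPU both take 750 time units, so neither device waits for the other.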

Type: Application

Filed: July 12, 2013

Publication date: February 20, 2014

Inventors: Nishkam Ravi, Yi Yang, Srimat Chakradhar

Energy-aware task consolidation on graphics processing unit (GPU)

Patent number: 8643656

Abstract: A method includes configuring a shared library, stored in a memory, to be loaded into applications to intercept graphics processing unit (GPU) computation requests for different types of workload kernels corresponding to the applications. The method further includes generating a power prediction and a performance prediction for at least one candidate kernel combination for execution on a GPU responsive to the GPU computation requests. The at least one candidate kernel combination pertains to at least two of the workload kernels. The method also includes rendering a decision of whether to execute the at least one candidate kernel combination or to execute the at least two of the workload kernels pertaining thereto separately, based on the power prediction and the performance prediction.
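
The final decision step described in this abstract could be sketched as follows. The abstract does not state the decision rule, so this example assumes one plausible rule: consolidate only when the predicted energy (power times time) of the combined kernel is lower than the total predicted energy of running the kernels separately. The `Prediction` class and `should_consolidate` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """Hypothetical container for one power and performance prediction."""
    power_watts: float
    time_seconds: float

    @property
    def energy_joules(self):
        return self.power_watts * self.time_seconds


def should_consolidate(combined, separate):
    """Assumed rule: consolidate only if it is predicted to use less energy."""
    separate_energy = sum(p.energy_joules for p in separate)
    return combined.energy_joules < separate_energy


# Made-up predictions for two workload kernels and their combination.
separate = [Prediction(100.0, 1.0), Prediction(110.0, 1.0)]
print(should_consolidate(Prediction(150.0, 1.2), separate))  # True
print(should_consolidate(Prediction(200.0, 2.0), separate))  # False
```

A real system would also need to weigh the performance prediction on its own, for example to reject a combination that saves energy but misses a latency target.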

Type: Grant

Filed: September 8, 2011

Date of Patent: February 4, 2014

Assignee: NEC Laboratories America, Inc.

Inventors: Dong Li, Surendra Byna, Srimat Chakradhar

Massively parallel processing core with plural chains of processing elements and respective smart memory storing select data received from each chain

Patent number: 8583896

Abstract: Systems and methods for massively parallel processing on an accelerator that includes a plurality of processing cores. Each processing core includes multiple processing chains configured to perform parallel computations, each of which includes a plurality of interconnected processing elements. The cores further include multiple smart memory blocks configured to store and process data, each memory block accepting the output of one of the plurality of processing chains. The cores communicate with at least one off-chip memory bank.

Type: Grant

Filed: July 26, 2010

Date of Patent: November 12, 2013

Assignee: NEC Laboratories America, Inc.

Inventors: Srihari Cadambi, Abhinandan Majumdar, Michela Becchi, Srimat Chakradhar, Hans Peter Graf

Dynamically configurable, multi-ported co-processor for convolutional neural networks

Patent number: 8442927

Abstract: A coprocessor and method for processing convolutional neural networks includes a configurable input switch coupled to an input. A plurality of convolver elements are enabled in accordance with the input switch. An output switch is configured to receive outputs from the set of convolver elements to provide data to output branches. A controller is configured to provide control signals to the input switch and the output switch such that the set of convolver elements are rendered active and a number of output branches are selected for a given cycle in accordance with the control signals.

Type: Grant

Filed: February 1, 2010

Date of Patent: May 14, 2013

Assignee: NEC Laboratories America, Inc.

Inventors: Srimat Chakradhar, Murugan Sankaradas, Venkata S. Jakkula, Srihari Cadambi

OPTIMIZING COMPILER FOR IMPROVING APPLICATION PERFORMANCE ON MANY-CORE COPROCESSORS

Publication number: 20130055224

Abstract: A system and method for compiling includes parsing code of an application stored in a computer readable storage medium to identify one or more parallelizable code portions. At least one parallelizable code portion is optimized by transforming offload construct code portions to provide an optimized application.

Type: Application

Filed: August 24, 2012

Publication date: February 28, 2013

Applicant: NEC LABORATORIES AMERICA, INC.

Inventors: Nishkam Ravi, Tao Bao, Ozcan Ozturk, Srimat Chakradhar

COMPILER FOR X86-BASED MANY-CORE COPROCESSORS

Publication number: 20130055225

Abstract: A system and method for compiling includes, for a parallelizable code portion of an application stored on a computer readable storage medium, determining one or more variables that are to be transferred to and/or from a coprocessor if the parallelizable code portion were to be offloaded. A start location and an end location are determined for at least one of the one or more variables as a size in memory. The parallelizable code portion is transformed by inserting an offload construct around the parallelizable code portion and passing the one or more variables and the size as arguments of the offload construct such that the parallelizable code portion is offloaded to a coprocessor at runtime.

Type: Application

Filed: August 24, 2012

Publication date: February 28, 2013

Applicant: NEC LABORATORIES AMERICA, INC.

Inventors: Nishkam Ravi, Tao Bao, Ozcan Ozturk, Srimat Chakradhar

Data aware scheduling on heterogeneous platforms

Patent number: 8375392

Abstract: Systems and methods for data-aware scheduling of applications on a heterogeneous platform having at least one central processing unit (CPU) and at least one accelerator. Such systems and methods include a function call handling module configured to intercept, analyze, and schedule library calls on a processing element. The function call handling module further includes a function call interception module configured to intercept function calls to predefined libraries, a function call analysis module configured to analyze argument size and location, and a function call redirection module configured to schedule library calls and data transfers. The systems and methods also use a memory unification module, configured to keep data coherent between memories associated with the at least one CPU and the at least one accelerator based on the output of the function call redirection module.
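
One way to picture the analysis and redirection steps is a cost comparison that accounts for where each argument currently resides. The sketch below is an assumption-laden illustration, not the patented scheduler: it assumes a single transfer bandwidth, known per-device compute times, and a rule that picks the device with the lowest compute-plus-transfer cost. The function name and parameters are hypothetical.

```python
def choose_device(args, compute_secs, bytes_per_sec):
    """Pick the device with the lowest estimated compute-plus-transfer time.

    args: list of (size_bytes, location) pairs, one per call argument.
    compute_secs: {device_name: estimated compute time in seconds}.
    bytes_per_sec: assumed transfer bandwidth between device memories.
    """
    def total_cost(device):
        # Only arguments that are not already on the device must be moved.
        moved = sum(size for size, location in args if location != device)
        return compute_secs[device] + moved / bytes_per_sec

    return min(compute_secs, key=total_cost)


compute = {"cpu": 1.0, "gpu": 0.2}
# Data already on the GPU: running there avoids any transfer.
print(choose_device([(10**9, "gpu")], compute, 10**9))      # gpu
# Large data on the CPU: the transfer outweighs the faster GPU kernel.
print(choose_device([(4 * 10**9, "cpu")], compute, 10**9))  # cpu
```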

Type: Grant

Filed: August 20, 2010

Date of Patent: February 12, 2013

Assignee: NEC Laboratories America, Inc.

Inventors: Michela Becchi, Surendra Byna, Srihari Cadambi, Srimat Chakradhar

Systems and methods for implementing best-effort parallel computing frameworks

Patent number: 8286172

Abstract: Implementations of the present principles include best-effort computing systems and methods. In accordance with various exemplary aspects of the present principles, application computation requests directed to a processing platform may be intercepted and classified as either guaranteed computations or best-effort computations. Best-effort computations may be dropped to improve processing performance while minimally affecting the end result of application computations. In addition, interdependencies between best-effort computations may be relaxed to improve parallelism and processing speed while maintaining accuracy of computation results.
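
The classify-then-drop idea can be shown with a toy dispatcher. This is a minimal sketch under stated assumptions: requests arrive already labeled as "guaranteed" or "best_effort", and best-effort work is dropped once a fixed budget is used up. The budget policy and all names are hypothetical; the abstract does not say how the dropping decision is made.

```python
def run_requests(requests, best_effort_budget):
    """Run every guaranteed request; run best-effort ones only within a budget.

    requests: list of (kind, compute) pairs, where kind is "guaranteed" or
    "best_effort" and compute is a zero-argument callable.
    """
    results = []
    best_effort_used = 0
    for kind, compute in requests:
        if kind == "guaranteed":
            results.append(compute())
        elif kind == "best_effort":
            if best_effort_used < best_effort_budget:
                best_effort_used += 1
                results.append(compute())
            # Otherwise the best-effort request is silently dropped.
        else:
            raise ValueError(f"unknown request kind: {kind!r}")
    return results


requests = [
    ("guaranteed", lambda: 1),
    ("best_effort", lambda: 2),
    ("best_effort", lambda: 3),
    ("guaranteed", lambda: 4),
]
print(run_requests(requests, best_effort_budget=1))  # [1, 2, 4]
```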

Type: Grant

Filed: March 6, 2009

Date of Patent: October 9, 2012

Assignee: NEC Laboratories America, Inc.

Inventors: Srimat Chakradhar, Anand Raghunathan, Jiayuan Meng

LOAD BALANCING ON HETEROGENEOUS PROCESSING CLUSTERS IMPLEMENTING PARALLEL EXECUTION

Publication number: 20120233486

Abstract: Methods and systems for managing data loads on a cluster of processors that implement an iterative procedure through parallel processing of data for the procedure are disclosed. One method includes monitoring, for at least one iteration of the procedure, completion times of a plurality of different processing phases that are undergone by each of the processors in a given iteration. The method further includes determining whether a load imbalance factor threshold is exceeded in the given iteration based on the completion times for the given iteration. In addition, the data is repartitioned by reassigning the data to the processors based on predicted dependencies between assigned data units of the data and completion times of a plurality of the processors for at least two of the phases. Further, the parallel processing is implemented on the cluster of processors in accordance with the reassignment.
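
The threshold check on per-iteration completion times might look like the sketch below. The abstract does not define the load imbalance factor, so this example assumes a common definition: the slowest processor's total time divided by the mean total time. The repartitioning step itself is not shown, and all names are hypothetical.

```python
def imbalance_factor(total_times):
    """Assumed definition: slowest processor time divided by the mean time."""
    if not total_times:
        raise ValueError("need at least one processor")
    mean = sum(total_times) / len(total_times)
    return max(total_times) / mean


def needs_repartition(phase_times, threshold):
    """phase_times: one list of per-phase completion times per processor."""
    totals = [sum(phases) for phases in phase_times]
    return imbalance_factor(totals) > threshold


# Two phases per processor; the third processor is three times slower.
observed = [[1.0, 2.0], [1.0, 2.0], [3.0, 6.0]]
print(needs_repartition(observed, threshold=1.5))  # True
```

Here the totals are 3, 3, and 9, giving a factor of 9 / 5 = 1.8, which exceeds the made-up threshold of 1.5.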

Type: Application

Filed: March 1, 2012

Publication date: September 13, 2012

Applicant: NEC Laboratories America, Inc.

Inventors: Rajat Phull, Srihari Cadambi, Nishkam Ravi, Srimat Chakradhar

CROSS-LAYER SYSTEM ARCHITECTURE DESIGN

Publication number: 20120131389

Abstract: Methods and systems for cross-layer forgiveness exploitation include executing one or more applications using a processing platform that includes a first reliable processing core and at least one additional processing core having a lower reliability than the first processing core, modifying application execution according to one or more best-effort techniques to improve performance, and controlling parameters associated with the processing platform and the best-effort layer that control performance and error rate such that performance is maximized in a region of low hardware-software interference.

Type: Application

Filed: November 18, 2011

Publication date: May 24, 2012

Applicant: NEC Laboratories America, Inc.

Inventors: Srimat Chakradhar, Hyungmin Cho, Anand Raghunathan

SCHEDULER AND RESOURCE MANAGER FOR COPROCESSOR-BASED HETEROGENEOUS CLUSTERS

Publication number: 20120124591

Abstract: A system and method for scheduling client-server applications onto heterogeneous clusters includes storing at least one client request of at least one application in a pending request list on a computer readable storage medium. A priority metric is computed for each application, where the computed priority metric is applied to each client request belonging to that application. The priority metric is determined based on estimated performance of the client request and load on the pending request list. The at least one client request of the at least one application is scheduled based on the priority metric onto one or more heterogeneous resources.
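
A per-application priority that every request of that application inherits can be sketched as follows. The abstract says the metric depends on estimated request performance and on the load in the pending list but gives no formula, so the formula here is purely an assumption: priority equals the number of pending requests for the application multiplied by its estimated time per request. All names are hypothetical.

```python
from collections import Counter


def app_priorities(pending, est_secs_per_request):
    """Assumed metric: pending request count times estimated time per request."""
    load = Counter(app for app, _ in pending)
    return {app: load[app] * est_secs_per_request[app] for app in load}


def schedule_order(pending, est_secs_per_request):
    """Order requests by their application's priority, highest first.

    sorted() is stable, so requests of the same application keep arrival order.
    """
    priority = app_priorities(pending, est_secs_per_request)
    return sorted(pending, key=lambda request: -priority[request[0]])


pending = [("a", "r1"), ("b", "r2"), ("a", "r3")]
estimates = {"a": 0.5, "b": 2.0}
print(schedule_order(pending, estimates))
# [('b', 'r2'), ('a', 'r1'), ('a', 'r3')]
```

Application "a" has two pending requests worth 1.0 second in total, while "b" has one request worth 2.0 seconds, so the request from "b" is placed first under this assumed metric.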

Type: Application

Filed: October 13, 2011

Publication date: May 17, 2012

Applicant: NEC Laboratories America, Inc.

Inventors: Srihari Cadambi, Srimat Chakradhar, M. Mustafa Rafique

PARTITIONED ITERATIVE CONVERGANCE PROGRAMMING MODEL

Publication number: 20120084747

Abstract: Methods and systems for iterative convergence include performing at least one global iteration. Each global iteration includes partitioning input data into multiple input data partitions according to an input data partitioning function, partitioning a model into multiple model partitions according to a model partitioning function, performing at least one local iteration using a processor to compute sub-problems formed from a model partition and an input data partition to produce multiple locally updated models, and combining the locally updated models from the at least one local iteration according to a model merging function to produce a merged model.
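
The control flow in this abstract (global iterations that partition, run local iterations, then merge) can be written as a generic driver. This is a sketch only: the driver signature, the sequential execution of partitions, and the toy mean-estimation problem used to exercise it are all assumptions made for illustration.

```python
def partitioned_iterative_convergence(data, model, partition_data, partition_model,
                                      local_update, merge, global_iters, local_iters):
    """Run global iterations, each made of partition, local iterate, and merge."""
    for _ in range(global_iters):
        data_parts = partition_data(data)
        model_parts = partition_model(model, len(data_parts))
        local_models = []
        # Each sub-problem pairs one model partition with one data partition.
        for data_part, local_model in zip(data_parts, model_parts):
            for _ in range(local_iters):
                local_model = local_update(local_model, data_part)
            local_models.append(local_model)
        model = merge(local_models)
    return model


# Toy problem: estimate the mean of the data, starting from a guess of 0.0.
def halves(xs):
    return [xs[:len(xs) // 2], xs[len(xs) // 2:]]

def copies(m, n):
    return [m] * n

def step(m, part):
    # Move the local estimate halfway toward this partition's mean.
    return m + 0.5 * (sum(part) / len(part) - m)

def average(models):
    return sum(models) / len(models)

estimate = partitioned_iterative_convergence(
    [1, 2, 3, 4, 5, 6, 7, 8], 0.0, halves, copies, step, average,
    global_iters=10, local_iters=2)
print(round(estimate, 3))  # 4.5
```

In a parallel setting the inner loop over partitions would run on separate workers; it is kept sequential here so the sketch stays self-contained.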

Type: Application

Filed: September 19, 2011

Publication date: April 5, 2012

Applicant: NEC LABORATORIES AMERICA, INC.

Inventors: Srimat Chakradhar, Reza Farivar, Anand Raghunathan

ENERGY-AWARE TASK CONSOLIDATION ON GRAPHICS PROCESSING UNIT (GPU)

Publication number: 20120081373

Abstract: A method includes configuring a shared library, stored in a memory, to be loaded into applications to intercept graphics processing unit (GPU) computation requests for different types of workload kernels corresponding to the applications. The method further includes generating a power prediction and a performance prediction for at least one candidate kernel combination for execution on a GPU responsive to the GPU computation requests. The at least one candidate kernel combination pertains to at least two of the workload kernels. The method also includes rendering a decision of whether to execute the at least one candidate kernel combination or to execute the at least two of the workload kernels pertaining thereto separately, based on the power prediction and the performance prediction.

Type: Application

Filed: September 8, 2011

Publication date: April 5, 2012

Applicant: NEC LABORATORIES AMERICA, INC.

Inventors: Dong Li, Surendra Byna, Srimat Chakradhar

ENERGY EFFICIENT HETEROGENEOUS SYSTEMS

Publication number: 20120079298

Abstract: Low-power systems and methods are disclosed for executing application software on a general-purpose processor and a plurality of accelerators with a runtime controller. The runtime controller splits a workload across the processor and the accelerators to minimize energy. The method includes building one or more performance models in an application-agnostic manner, monitoring system performance in real time, and adjusting the workload split to minimize energy while conforming to a target quality of service (QoS).
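
The trade-off the runtime controller faces (minimum energy subject to a QoS target) can be illustrated with an exhaustive search over splits. This sketch assumes one CPU and one accelerator, constant per-item time and power on each device, concurrent execution, and a deadline standing in for the QoS target; idle power is ignored. None of these modeling choices come from the abstract, and all names are hypothetical.

```python
def best_split(n_items, cpu_ms, cpu_watts, acc_ms, acc_watts, deadline_ms):
    """Return (items_on_accelerator, energy_mJ) for the lowest-energy split
    that meets the deadline, or None if no split is feasible.

    Times are per item in milliseconds, so watts * ms gives millijoules.
    """
    best = None
    for on_acc in range(n_items + 1):
        on_cpu = n_items - on_acc
        # Both devices run concurrently, so the slower side sets the finish time.
        finish_ms = max(on_acc * acc_ms, on_cpu * cpu_ms)
        if finish_ms > deadline_ms:
            continue
        energy_mj = on_acc * acc_ms * acc_watts + on_cpu * cpu_ms * cpu_watts
        if best is None or energy_mj < best[1]:
            best = (on_acc, energy_mj)
    return best


# Made-up model: the accelerator is faster and cheaper per item, but cannot
# process all 100 items alone within the 180 ms deadline.
print(best_split(100, cpu_ms=10, cpu_watts=50, acc_ms=2, acc_watts=100,
                 deadline_ms=180))  # (90, 23000)
```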

Type: Application

Filed: April 4, 2011

Publication date: March 29, 2012

Applicant: NEC LABORATORIES AMERICA, INC.

Inventors: Abhinandan Majumdar, Srihari Cadambi, Srimat Chakradhar

DATA AWARE SCHEDULING ON HETEROGENEOUS PLATFORMS

Publication number: 20110173155

Abstract: Systems and methods for data-aware scheduling of applications on a heterogeneous platform having at least one central processing unit (CPU) and at least one accelerator. Such systems and methods include a function call handling module configured to intercept, analyze, and schedule library calls on a processing element. The function call handling module further includes a function call interception module configured to intercept function calls to predefined libraries, a function call analysis module configured to analyze argument size and location, and a function call redirection module configured to schedule library calls and data transfers. The systems and methods also use a memory unification module, configured to keep data coherent between memories associated with the at least one CPU and the at least one accelerator based on the output of the function call redirection module.

Type: Application

Filed: August 20, 2010

Publication date: July 14, 2011

Applicant: NEC Laboratories America, Inc.

Inventors: Michela Becchi, Surendra Byna, Srihari Cadambi, Srimat Chakradhar

MASSIVELY PARALLEL, SMART MEMORY BASED ACCELERATOR

Publication number: 20110119467

Abstract: Systems and methods for massively parallel processing on an accelerator that includes a plurality of processing cores. Each processing core includes multiple processing chains configured to perform parallel computations, each of which includes a plurality of interconnected processing elements. The cores further include multiple smart memory blocks configured to store and process data, each memory block accepting the output of one of the plurality of processing chains. The cores communicate with at least one off-chip memory bank.

Type: Application

Filed: July 26, 2010

Publication date: May 19, 2011

Applicant: NEC Laboratories America, Inc.

Inventors: Srihari Cadambi, Abhinandan Majumdar, Michela Becchi, Srimat Chakradhar, Hans Peter Graf

Visibility and control of wireless sensor networks

Patent number: 7921206

Abstract: A computer-implemented framework, prototype tool, and associated methods that provide a high degree of visibility and control over the in-field execution of software in a minimally intrusive manner, wherein developer-defined correctness tests and validation logic are embedded into the sensor node itself, making in-field software testing autonomous without necessitating continuous developer participation.

Type: Grant

Filed: April 18, 2008

Date of Patent: April 5, 2011

Assignee: NEC Laboratories America, Inc.

Inventors: Kiran Nagaraja, Vijay Raghunathan, Florin Sultan, Srimat Chakradhar, Nupur Kothari

DYNAMICALLY CONFIGURABLE, MULTI-PORTED CO-PROCESSOR FOR CONVOLUTIONAL NEURAL NETWORKS

Publication number: 20110029471

Abstract: A coprocessor and method for processing convolutional neural networks includes a configurable input switch coupled to an input. A plurality of convolver elements are enabled in accordance with the input switch. An output switch is configured to receive outputs from the set of convolver elements to provide data to output branches. A controller is configured to provide control signals to the input switch and the output switch such that the set of convolver elements are rendered active and a number of output branches are selected for a given cycle in accordance with the control signals.

Type: Application

Filed: February 1, 2010

Publication date: February 3, 2011

Applicant: NEC Laboratories America, Inc.

Inventors: Srimat Chakradhar, Murugan Sankaradas, Venkata S. Jakkula, Srihari Cadambi

SYSTEMS AND METHODS FOR IMPLEMENTING BEST-EFFORT PARALLEL COMPUTING FRAMEWORKS

Publication number: 20100088492

Abstract: Implementations of the present principles include best-effort computing systems and methods. In accordance with various exemplary aspects of the present principles, application computation requests directed to a processing platform may be intercepted and classified as either guaranteed computations or best-effort computations. Best-effort computations may be dropped to improve processing performance while minimally affecting the end result of application computations. In addition, interdependencies between best-effort computations may be relaxed to improve parallelism and processing speed while maintaining accuracy of computation results.

Type: Application

Filed: March 6, 2009

Publication date: April 8, 2010

Applicant: NEC Laboratories America, Inc.

Inventors: Srimat Chakradhar, Anand Raghunathan, Jiayuan Meng
