Patents by Inventor Sunil K. Shukla
Sunil K. Shukla has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20170123794
Abstract: An apparatus and method for supporting simultaneous multiple iterations (SMI) and iteration level commits (ILC) in a coarse-grained reconfigurable architecture (CGRA). The apparatus includes: hardware structures that connect all of multiple processing engines (PEs) to a load-store unit (LSU) configured to keep track of which compiled program code iterations have completed, which ones are in flight and which are yet to begin, and a control unit including hardware structures that are used to maintain synchronization and initiate and terminate loops within the PEs. The processing elements, LSU and control unit are configured to commit instructions, and save and restore context at loop iteration boundaries. In doing so, the apparatus tracks and buffers state of in-flight iterations, and detects conditions that prevent an iteration from completing.
Type: Application
Filed: November 4, 2015
Publication date: May 4, 2017
Inventors: Chia-yu Chen, Kailash Gopalakrishnan, Jinwook Oh, Lee M. Saltzman, Sunil K. Shukla, Vijayalakshmi Srinivasan
-
Publication number: 20170123795
Abstract: An apparatus and method for supporting simultaneous multiple iterations (SMI) in a coarse-grained reconfigurable architecture (CGRA). In support of SMI, the apparatus includes: hardware structures that connect all of multiple processing engines (PEs) to a load-store unit (LSU) configured to keep track of which compiled program code iterations have completed, which ones are in flight and which are yet to begin, and a control unit including hardware structures that are used to maintain synchronization and initiate and terminate loops within the PEs. SMI permits execution of the next instruction within any iteration (in flight). If instructions from multiple iterations are ready for execution (and are pre-decoded), then the hardware selects the lowest iteration number ready for execution. If in a particular clock cycle, a loop iteration with a lower iteration number is stalled (i.e.
Type: Application
Filed: November 4, 2015
Publication date: May 4, 2017
Inventors: Chia-yu Chen, Kailash Gopalakrishnan, Jinwook Oh, Sunil K. Shukla, Vijayalakshmi Srinivasan
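The selection rule stated in the abstract above (among in-flight iterations whose next instruction is ready, pick the lowest iteration number) can be illustrated with a small software sketch. This is only a model for illustration, not the patented hardware; the function name `select_iteration` and the dictionary representation of readiness are assumptions introduced here.

```python
from typing import Dict, Optional


def select_iteration(ready: Dict[int, bool]) -> Optional[int]:
    """Pick the lowest in-flight iteration number whose next instruction is ready.

    `ready` maps an iteration number to True when that iteration's next
    instruction can issue this cycle (hypothetical representation).
    Returns None when every in-flight iteration is stalled.
    """
    candidates = [iteration for iteration, is_ready in ready.items() if is_ready]
    return min(candidates) if candidates else None


# Iteration 1 is stalled, so the next-lowest ready iteration (2) is chosen.
choice = select_iteration({3: True, 1: False, 2: True})
```

Giving priority to the lowest ready iteration keeps older iterations moving toward completion while still letting younger ones use cycles in which an older one is stalled.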
-
Patent number: 9632928
Abstract: Embodiments of the invention provide a method and system for dynamic memory management implemented in hardware. In an embodiment, the method comprises storing objects in a plurality of heaps, and operating a hardware garbage collector to free heap space. The hardware garbage collector traverses the heaps and marks selected objects, uses the marks to identify a plurality of the objects, and frees the identified objects. In an embodiment, the method comprises storing objects in a heap, each of at least some of the objects including a multitude of pointers; and operating a hardware garbage collector to free heap space. The hardware garbage collector traverses the heap, using the pointers of some of the objects to identify others of the objects; processes the objects to mark selected objects; and uses the marks to identify a group of the objects, and frees the identified objects.
Type: Grant
Filed: April 28, 2016
Date of Patent: April 25, 2017
Assignee: International Business Machines Corporation
Inventors: David F. Bacon, Perry S. Cheng, Sunil K. Shukla
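The traverse, mark, and free steps in the abstract above resemble a classic mark-and-sweep collector. The following is a minimal software sketch of that general technique, not the hardware design claimed in the patent. The heap layout (a dictionary from object name to the names it points to), the notion of a root set, and the function names `mark` and `sweep` are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Set


def mark(heap: Dict[str, List[str]], roots: Iterable[str]) -> Set[str]:
    """Follow pointers from the roots and return the set of reachable objects."""
    marked: Set[str] = set()
    pending = list(roots)
    while pending:
        obj = pending.pop()
        if obj in marked or obj not in heap:
            continue  # already visited, or a dangling pointer
        marked.add(obj)
        pending.extend(heap[obj])  # use this object's pointers to find others
    return marked


def sweep(heap: Dict[str, List[str]], marked: Set[str]) -> List[str]:
    """Free every unmarked object and return the freed names in sorted order."""
    freed = sorted(obj for obj in heap if obj not in marked)
    for obj in freed:
        del heap[obj]
    return freed


# "c" and "d" point at each other but are unreachable from the root "a".
heap = {"a": ["b"], "b": [], "c": ["d"], "d": ["c"]}
live = mark(heap, roots=["a"])
freed = sweep(heap, live)
```

Because marking is driven by reachability rather than reference counts, the unreachable cycle between `c` and `d` is reclaimed.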
-
Publication number: 20160239414
Abstract: Embodiments of the invention provide a method and system for dynamic memory management implemented in hardware. In an embodiment, the method comprises storing objects in a plurality of heaps, and operating a hardware garbage collector to free heap space. The hardware garbage collector traverses the heaps and marks selected objects, uses the marks to identify a plurality of the objects, and frees the identified objects. In an embodiment, the method comprises storing objects in a heap, each of at least some of the objects including a multitude of pointers; and operating a hardware garbage collector to free heap space. The hardware garbage collector traverses the heap, using the pointers of some of the objects to identify others of the objects; processes the objects to mark selected objects; and uses the marks to identify a group of the objects, and frees the identified objects.
Type: Application
Filed: April 28, 2016
Publication date: August 18, 2016
Inventors: David F. Bacon, Perry S. Cheng, Sunil K. Shukla
-
Patent number: 9418187
Abstract: As described herein, a tool records a log (or trace) of all sources of non-determinism in the system. In most cases, it is enough to log all transitions and the exact timestamps at all the entry and exit points of the system. By using this information it is possible to recreate a cycle-accurate execution of the hardware system in simulation. Unlike CHIPSCOPE and SIGNALTAP, which let you monitor a small number of signals in the design, the tool provides visibility into the whole system.
Type: Grant
Filed: November 13, 2015
Date of Patent: August 16, 2016
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Daniel Foisy, Sunil K. Shukla
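The record-and-replay idea in the abstract above (log timestamped transitions at the system boundary, then feed them back into a simulation) can be sketched in software. This is a toy model under stated assumptions, not the tool described in the patent: the `Event` record, the `Recorder` class, the `replay` function, and the `accumulate` design under test are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Event:
    cycle: int   # exact clock cycle at which the transition was observed
    port: str    # entry point of the system where it was observed
    value: int


class Recorder:
    """Collects timestamped boundary transitions while the real system runs."""

    def __init__(self) -> None:
        self.log: List[Event] = []

    def record(self, cycle: int, port: str, value: int) -> None:
        self.log.append(Event(cycle, port, value))


def replay(log: List[Event], step: Callable[[int, Dict[str, int]], int],
           state: int, total_cycles: int) -> int:
    """Re-run a deterministic design cycle by cycle, injecting logged inputs."""
    by_cycle: Dict[int, List[Event]] = {}
    for event in log:
        by_cycle.setdefault(event.cycle, []).append(event)
    for cycle in range(total_cycles):
        inputs = {e.port: e.value for e in by_cycle.get(cycle, [])}
        state = step(state, inputs)
    return state


def accumulate(state: int, inputs: Dict[str, int]) -> int:
    """Toy design under test: adds whatever arrives on port 'in' each cycle."""
    return state + inputs.get("in", 0)


recorder = Recorder()
recorder.record(cycle=1, port="in", value=5)
recorder.record(cycle=3, port="in", value=7)
final_state = replay(recorder.log, accumulate, state=0, total_cycles=5)
```

Since the design itself is deterministic, replaying the same log always yields the same state sequence, which is what makes the whole system observable in simulation.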
-
Patent number: 9355030
Abstract: Embodiments of the invention provide a method and system for dynamic memory management implemented in hardware. In an embodiment, the method comprises storing objects in a plurality of heaps, and operating a hardware garbage collector to free heap space. The hardware garbage collector traverses the heaps and marks selected objects, uses the marks to identify a plurality of the objects, and frees the identified objects. In an embodiment, the method comprises storing objects in a heap, each of at least some of the objects including a multitude of pointers; and operating a hardware garbage collector to free heap space. The hardware garbage collector traverses the heap, using the pointers of some of the objects to identify others of the objects; processes the objects to mark selected objects; and uses the marks to identify a group of the objects, and frees the identified objects.
Type: Grant
Filed: June 6, 2014
Date of Patent: May 31, 2016
Assignee: International Business Machines Corporation
Inventors: David F. Bacon, Perry S. Cheng, Sunil K. Shukla
-
Patent number: 9329843
Abstract: A communication stack for software-hardware co-execution on heterogeneous computing systems with processors and reconfigurable logic, in one aspect, may comprise a crossbar operable to connect hardware user code and functioning as a platform-independent communication layer. A physical interface interfaces to the reconfigurable logic. A physical interface bridge is connected to the crossbar and the physical interface. The physical interface bridge connects the crossbar and the physical interface via a platform-specific translation layer specific to the reconfigurable logic. The crossbar, the physical interface, and the physical interface bridge may be instantiated in response to the hardware user code being generated, the crossbar instantiated with associated parameters comprising one or more routes and associated data widths. The hardware user code is assigned a unique virtual route in the crossbar.
Type: Grant
Filed: June 10, 2013
Date of Patent: May 3, 2016
Assignee: International Business Machines Corporation
Inventors: Perry S. Cheng, Rodric Rabbah, Sunil K. Shukla
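As a rough illustration of the crossbar described in the abstract above, the sketch below models a routing table in which each piece of hardware user code receives a unique virtual route with an associated data width. It is a software analogy only; the `Crossbar` class, its `attach` and `send` methods, and the module names are hypothetical and do not come from the patent.

```python
from typing import Dict, Tuple


class Crossbar:
    """Toy routing table: one unique virtual route per attached hardware module."""

    def __init__(self) -> None:
        # route id -> (module name, data width in bits)
        self.routes: Dict[int, Tuple[str, int]] = {}

    def attach(self, module: str, width_bits: int) -> int:
        """Register a module and return its newly assigned virtual route id."""
        route_id = len(self.routes)
        self.routes[route_id] = (module, width_bits)
        return route_id

    def send(self, route_id: int, payload: int) -> Tuple[str, int]:
        """Deliver a payload to the module on a route, checking the data width."""
        module, width_bits = self.routes[route_id]
        if not 0 <= payload < (1 << width_bits):
            raise ValueError(f"payload does not fit in {width_bits} bits")
        return module, payload


crossbar = Crossbar()
fir_route = crossbar.attach("fir_filter", width_bits=8)
fft_route = crossbar.attach("fft", width_bits=32)
delivered = crossbar.send(fft_route, 1000)
```

Keeping the route table separate from any physical link mirrors the split the abstract draws between the platform-independent crossbar and the platform-specific bridge.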
-
Patent number: 9323506
Abstract: A communication stack for software-hardware co-execution on heterogeneous computing systems with processors and reconfigurable logic, in one aspect, may comprise a crossbar operable to connect hardware user code and functioning as a platform-independent communication layer. A physical interface interfaces to the reconfigurable logic. A physical interface bridge is connected to the crossbar and the physical interface. The physical interface bridge connects the crossbar and the physical interface via a platform-specific translation layer specific to the reconfigurable logic. The crossbar, the physical interface, and the physical interface bridge may be instantiated in response to the hardware user code being generated, the crossbar instantiated with associated parameters comprising one or more routes and associated data widths. The hardware user code is assigned a unique virtual route in the crossbar.
Type: Grant
Filed: August 5, 2013
Date of Patent: April 26, 2016
Assignee: International Business Machines Corporation
Inventors: Perry S. Cheng, Rodric Rabbah, Sunil K. Shukla
-
Publication number: 20160070835
Abstract: As described herein, a tool records a log (or trace) of all sources of non-determinism in the system. In most cases, it is enough to log all transitions and the exact timestamps at all the entry and exit points of the system. By using this information it is possible to recreate a cycle-accurate execution of the hardware system in simulation. Unlike CHIPSCOPE and SIGNALTAP, which let you monitor a small number of signals in the design, the tool provides visibility into the whole system.
Type: Application
Filed: November 13, 2015
Publication date: March 10, 2016
Inventors: Daniel Foisy, Sunil K. Shukla
-
Patent number: 9217774
Abstract: As described herein, a tool records a log (or trace) of all sources of non-determinism in the system. In most cases, it is enough to log all transitions and the exact timestamps at all the entry and exit points of the system. By using this information it is possible to recreate a cycle-accurate execution of the hardware system in simulation. Unlike CHIPSCOPE and SIGNALTAP, which let you monitor a small number of signals in the design, the tool provides visibility into the whole system.
Type: Grant
Filed: August 29, 2014
Date of Patent: December 22, 2015
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Daniel Foisy, Sunil K. Shukla
-
Publication number: 20150356007
Abstract: Embodiments of the invention provide a method and system for dynamic memory management implemented in hardware. In an embodiment, the method comprises storing objects in a plurality of heaps, and operating a hardware garbage collector to free heap space. The hardware garbage collector traverses the heaps and marks selected objects, uses the marks to identify a plurality of the objects, and frees the identified objects. In an embodiment, the method comprises storing objects in a heap, each of at least some of the objects including a multitude of pointers; and operating a hardware garbage collector to free heap space. The hardware garbage collector traverses the heap, using the pointers of some of the objects to identify others of the objects; processes the objects to mark selected objects; and uses the marks to identify a group of the objects, and frees the identified objects.
Type: Application
Filed: June 6, 2014
Publication date: December 10, 2015
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: David F. Bacon, Perry S. Cheng, Sunil K. Shukla
-
Publication number: 20150128100
Abstract: As described herein, a tool records a log (or trace) of all sources of non-determinism in the system. In most cases, it is enough to log all transitions and the exact timestamps at all the entry and exit points of the system. By using this information it is possible to recreate a cycle-accurate execution of the hardware system in simulation. Unlike CHIPSCOPE and SIGNALTAP, which let you monitor a small number of signals in the design, the tool provides visibility into the whole system.
Type: Application
Filed: August 29, 2014
Publication date: May 7, 2015
Inventors: Daniel Foisy, Sunil K. Shukla
-
Patent number: 8856491
Abstract: A computing device is provided and includes a memory module, a sweep engine, a root snapshot module, and a trace engine. The memory module has a memory implemented as at least one hardware circuit. The memory module uses a dual-ported memory configuration. The sweep engine includes a stack pointer. The sweep engine is configured to send a garbage collection signal if the stack pointer falls below a specified level. The sweep engine is in communication with the memory module to reclaim memory. The root snapshot engine is configured to take a snapshot of roots from at least one mutator if the garbage collection signal is received from the sweep engine. The trace engine receives roots from the root snapshot engine and is in communication with the memory module to receive data.
Type: Grant
Filed: May 23, 2012
Date of Patent: October 7, 2014
Assignee: International Business Machines Corporation
Inventors: David F. Bacon, Perry S. Cheng, Sunil K. Shukla
-
Publication number: 20140208299
Abstract: A communication stack for software-hardware co-execution on heterogeneous computing systems with processors and reconfigurable logic, in one aspect, may comprise a crossbar operable to connect hardware user code and functioning as a platform-independent communication layer. A physical interface interfaces to the reconfigurable logic. A physical interface bridge is connected to the crossbar and the physical interface. The physical interface bridge connects the crossbar and the physical interface via a platform-specific translation layer specific to the reconfigurable logic. The crossbar, the physical interface, and the physical interface bridge may be instantiated in response to the hardware user code being generated, the crossbar instantiated with associated parameters comprising one or more routes and associated data widths. The hardware user code is assigned a unique virtual route in the crossbar.
Type: Application
Filed: June 10, 2013
Publication date: July 24, 2014
Inventors: Perry S. Cheng, Rodric Rabbah, Sunil K. Shukla
-
Publication number: 20140208300
Abstract: A communication stack for software-hardware co-execution on heterogeneous computing systems with processors and reconfigurable logic, in one aspect, may comprise a crossbar operable to connect hardware user code and functioning as a platform-independent communication layer. A physical interface interfaces to the reconfigurable logic. A physical interface bridge is connected to the crossbar and the physical interface. The physical interface bridge connects the crossbar and the physical interface via a platform-specific translation layer specific to the reconfigurable logic. The crossbar, the physical interface, and the physical interface bridge may be instantiated in response to the hardware user code being generated, the crossbar instantiated with associated parameters comprising one or more routes and associated data widths. The hardware user code is assigned a unique virtual route in the crossbar.
Type: Application
Filed: August 5, 2013
Publication date: July 24, 2014
Applicant: International Business Machines Corporation
Inventors: Perry S. Cheng, Rodric Rabbah, Sunil K. Shukla
-
Publication number: 20130346930
Abstract: Searching for a desired clock frequency for an integrated circuit-based design may include receiving a timing result of a hardware synthesis job executed based on code specifying a hardware design. One or more different timing constraints specifying respective one or more different clock frequencies than used in the hardware synthesis job may be automatically generated without modifying the code. One or more instances of the hardware synthesis job to run with the respective one or more different timing constraints may be automatically spawned. The automatic generation and spawning may repeat until a termination criterion is met, and/or a desired successful timing constraint is identified for the hardware design from the different timing constraints based on whether the one or more instances of the hardware synthesis job met their respective timing constraints.
Type: Application
Filed: August 30, 2013
Publication date: December 26, 2013
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Perry S. Cheng, Rodric Rabbah, Sunil K. Shukla
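The generate, spawn, and repeat loop in the abstract above can be pictured as a search over candidate clock frequencies. The sketch below uses a sequential bisection, which is an assumption made here for brevity; the abstract does not name a search strategy and mentions spawning one or more job instances, possibly in parallel. The callback `meets_timing` stands in for a hypothetical synthesis run, and the sketch further assumes that any frequency below a passing one also passes, which real tools do not guarantee.

```python
from typing import Callable, Optional


def search_max_frequency(meets_timing: Callable[[float], bool],
                         low_mhz: float, high_mhz: float,
                         tolerance_mhz: float = 1.0) -> Optional[float]:
    """Bisect between low_mhz and high_mhz for the highest passing frequency.

    `meets_timing(freq)` is a hypothetical stand-in for launching a synthesis
    job with a timing constraint of `freq` MHz and reporting whether it passed.
    The design source is never modified; only the constraint changes.
    Returns None if no probed frequency met timing.
    """
    best: Optional[float] = None
    # Termination criterion (assumed): the search window is narrow enough.
    while high_mhz - low_mhz > tolerance_mhz:
        candidate = (low_mhz + high_mhz) / 2
        if meets_timing(candidate):
            best = candidate
            low_mhz = candidate    # passed: try a faster clock
        else:
            high_mhz = candidate   # failed: back off
    return best


# Pretend the design closes timing at or below 237 MHz.
found = search_max_frequency(lambda mhz: mhz <= 237.0, 100.0, 400.0)
```

Each probe corresponds to one synthesis job, so the number of jobs grows with the logarithm of the initial window divided by the tolerance.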
-
Publication number: 20130318290
Abstract: A computing device is provided and includes a memory module, a sweep engine, a root snapshot module, and a trace engine. The memory module has a memory implemented as at least one hardware circuit. The memory module uses a dual-ported memory configuration. The sweep engine includes a stack pointer. The sweep engine is configured to send a garbage collection signal if the stack pointer falls below a specified level. The sweep engine is in communication with the memory module to reclaim memory. The root snapshot engine is configured to take a snapshot of roots from at least one mutator if the garbage collection signal is received from the sweep engine. The trace engine receives roots from the root snapshot engine and is in communication with the memory module to receive data.
Type: Application
Filed: May 23, 2012
Publication date: November 28, 2013
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: David F. Bacon, Perry S. Cheng, Sunil K. Shukla
-
Publication number: 20130318315
Abstract: A method of garbage collection in a computing device is provided. The method includes providing a memory module having a memory implemented as at least one hardware circuit. The memory module uses a dual-ported memory configuration. The method includes triggering a garbage collection signal by a sweep engine of the computing device. The sweep engine is in communication with a memory module to reclaim memory. The method includes receiving the garbage collection signal by a root snapshot engine of the computing device. The method includes taking a snapshot of roots from at least one mutator by the root snapshot engine if the garbage collection signal is received. The method includes receiving roots from the root snapshot engine by a trace engine of the computing device. The trace engine is in communication with the memory module to receive data.
Type: Application
Filed: June 19, 2012
Publication date: November 28, 2013
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: David F. Bacon, Perry S. Cheng, Sunil K. Shukla
-
Patent number: 8566768
Abstract: Searching for a desired clock frequency for an integrated circuit-based design may include receiving a timing result of a hardware synthesis job executed based on code specifying a hardware design. One or more different timing constraints specifying respective one or more different clock frequencies than used in the hardware synthesis job may be automatically generated without modifying the code. One or more instances of the hardware synthesis job to run with the respective one or more different timing constraints may be automatically spawned. The automatic generation and spawning may repeat until a termination criterion is met, and/or a desired successful timing constraint is identified for the hardware design from the different timing constraints based on whether the one or more instances of the hardware synthesis job met their respective timing constraints.
Type: Grant
Filed: April 6, 2012
Date of Patent: October 22, 2013
Assignee: International Business Machines Corporation
Inventors: Sunil K. Shukla, Perry S. Cheng, Rodric Rabbah
-
Publication number: 20130268907
Abstract: Searching for a desired clock frequency for an integrated circuit-based design may include receiving a timing result of a hardware synthesis job executed based on code specifying a hardware design. One or more different timing constraints specifying respective one or more different clock frequencies than used in the hardware synthesis job may be automatically generated without modifying the code. One or more instances of the hardware synthesis job to run with the respective one or more different timing constraints may be automatically spawned. The automatic generation and spawning may repeat until a termination criterion is met, and/or a desired successful timing constraint is identified for the hardware design from the different timing constraints based on whether the one or more instances of the hardware synthesis job met their respective timing constraints.
Type: Application
Filed: April 6, 2012
Publication date: October 10, 2013
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Sunil K. Shukla, Perry S. Cheng, Rodric Rabbah