Patents by Inventor Ravi P. Singh

Ravi P. Singh has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

USING A HARDWARE SEQUENCER IN A DIRECT MEMORY ACCESS SYSTEM OF A SYSTEM ON A CHIP

Publication number: 20250103529

Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.

Type: Application

Filed: December 5, 2024

Publication date: March 27, 2025

Inventors: Ahmad Itani, Yen-Te Shih, Jagadeesh Sankaran, Ravi P. Singh, Ching-Yu Hung
Using a hardware sequencer in a direct memory access system of a system on a chip

Patent number: 12204475

Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.

Type: Grant

Filed: December 9, 2022

Date of Patent: January 21, 2025

Assignee: NVIDIA Corporation

Inventors: Ahmad Itani, Yen-Te Shih, Jagadeesh Sankaran, Ravi P Singh, Ching-Yu Hung
Performing load and permute with a single instruction in a system on a chip

Patent number: 12118353

Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.

Type: Grant

Filed: August 2, 2021

Date of Patent: October 15, 2024

Assignee: NVIDIA Corporation

Inventors: Ching-Yu Hung, Ravi P Singh, Jagadeesh Sankaran, Yen-Te Shih, Ahmad Itani
Performing load and store operations of 2D arrays in a single cycle in a system on a chip

Patent number: 12099439

Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.

Type: Grant

Filed: August 2, 2021

Date of Patent: September 24, 2024

Assignee: NVIDIA Corporation

Inventors: Ching-Yu Hung, Ravi P Singh, Jagadeesh Sankaran, Yen-Te Shih, Ahmad Itani
Using per memory bank load caches for reducing power use in a system on a chip

Patent number: 12093539

Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.

Type: Grant

Filed: December 21, 2022

Date of Patent: September 17, 2024

Assignee: NVIDIA Corporation

Inventors: Ching-Yu Hung, Ravi P Singh, Jagadeesh Sankaran, Yen-Te Shih, Ahmad Itani
Built-in self-test for a programmable vision accelerator of a system on a chip

Patent number: 12050548

Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.

Type: Grant

Filed: December 20, 2022

Date of Patent: July 30, 2024

Assignee: NVIDIA Corporation

Inventors: Ahmad Itani, Yen-Te Shih, Jagadeesh Sankaran, Ravi P Singh, Ching-Yu Hung
USING A VECTOR PROCESSOR TO CONFIGURE A DIRECT MEMORY ACCESS SYSTEM FOR FEATURE TRACKING OPERATIONS IN A SYSTEM ON A CHIP

Publication number: 20240134645

Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.

Type: Application

Filed: January 2, 2024

Publication date: April 25, 2024

Inventors: Ahmad Itani, Yen-Te Shih, Jagadeesh Sankaran, Ravi P Singh, Ching-Yu Hung
Reduced memory write requirements in a system on a chip using automatic store predication

Patent number: 11954496

Abstract: In various examples, systems and methods for reducing written requirements in a system on chip (SoC) are described herein. For instance, a total number of iterations may be determined for processing data, such as data representing an array. In some circumstances, a set of iterations may include a first number of iterations that is less than a second number of iterations. As such, and during execution of the set of iterations, a predicate flag corresponding to an excess iteration of the set of iterations may be generated, where the excess iteration corresponds to an iteration that is part of a number of excess iterations that is associated with a difference between the first number of iterations and the second number of iterations. Based on the predicate flag, one or more first values corresponding to the iteration may be prevented from being written to memory.

Type: Grant

Filed: August 2, 2021

Date of Patent: April 9, 2024

Assignee: NVIDIA Corporation

Inventors: Ching-Yu Hung, Ravi P Singh, Jagadeesh Sankaran, Yen-Te Shih, Ahmad Itani
Hardware accelerated anomaly detection using a min/max collector in a system on a chip

Patent number: 11940947

Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.

Type: Grant

Filed: January 6, 2023

Date of Patent: March 26, 2024

Assignee: NVIDIA Corporation

Inventors: Ching-Yu Hung, Ravi P. Singh, Jagadeesh Sankaran, Yen-Te Shih, Ahmad Itani
Using a vector processor to configure a direct memory access system for feature tracking operations in a system on a chip

Patent number: 11934829

Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.

Type: Grant

Filed: December 9, 2022

Date of Patent: March 19, 2024

Assignee: NVIDIA Corporation

Inventors: Ahmad Itani, Yen-Te Shih, Jagadeesh Sankaran, Ravi P Singh, Ching-Yu Hung
ACCELERATING TABLE LOOKUPS USING A DECOUPLED LOOKUP TABLE ACCELERATOR IN A SYSTEM ON A CHIP

Publication number: 20240045722

Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.

Type: Application

Filed: October 17, 2023

Publication date: February 8, 2024

Inventors: Ravi P. Singh, Ching-Yu Hung, Jagadeesh Sankaran, Ahmad Itani, Yen-Te Shih
Accelerating table lookups using a decoupled lookup table accelerator in a system on a chip

Patent number: 11836527

Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.

Type: Grant

Filed: August 2, 2021

Date of Patent: December 5, 2023

Assignee: NVIDIA Corporation

Inventors: Ravi P Singh, Ching-Yu Hung, Jagadeesh Sankaran, Ahmad Itani, Yen-Te Shih
Performing multiple point table lookups in a single cycle in a system on chip

Patent number: 11704067

Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.

Type: Grant

Filed: August 2, 2021

Date of Patent: July 18, 2023

Assignee: NVIDIA Corporation

Inventors: Ching-Yu Hung, Ravi P. Singh, Jagadeesh Sankaran, Ahmad Itani, Yen-Te Shih
USING A VECTOR PROCESSOR TO CONFIGURE A DIRECT MEMORY ACCESS SYSTEM FOR FEATURE TRACKING OPERATIONS IN A SYSTEM ON A CHIP

Publication number: 20230185569

Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.

Type: Application

Filed: December 9, 2022

Publication date: June 15, 2023

Inventors: Ahmad Itani, Yen-Te Shih, Jagadeesh Sankaran, Ravi P. Singh, Ching-Yu Hung
HARDWARE ACCELERATED ANOMALY DETECTION IN A SYSTEM ON A CHIP

Publication number: 20230153266

Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.

Type: Application

Filed: January 6, 2023

Publication date: May 18, 2023

Inventors: Ching-Yu Hung, Ravi P. Singh, Jagadeesh Sankaran, Yen-Te Shih, Ahmed Itani
BUILT-IN SELF-TEST FOR A PROGRAMMABLE VISION ACCELERATOR OF A SYSTEM ON A CHIP

Publication number: 20230125397

Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.

Type: Application

Filed: December 20, 2022

Publication date: April 27, 2023

Inventors: Ahmad Itani, Yen-Te Shih, Jagadeesh Sankaran, Ravi P. Singh, Ching-Yu Hung
Hardware accelerated anomaly detection using a min/max collector in a system on a chip

Patent number: 11636063

Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.

Type: Grant

Filed: August 2, 2021

Date of Patent: April 25, 2023

Assignee: NVIDIA Corporation

Inventors: Ching-Yu Hung, Ravi P Singh, Jagadeesh Sankaran, Yen-Te Shih, Ahmad Itani
USING PER MEMORY BANK LOAD CACHES FOR REDUCING POWER USE IN A SYSTEM ON A CHIP

Publication number: 20230124604

Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.

Type: Application

Filed: December 21, 2022

Publication date: April 20, 2023

Inventors: Ching-Yu Hung, Ravi P Singh, Jagadeesh Sankaran, Yen-Te Shih, Ahmad Itani
Programmable vision accelerator

Patent number: 11630800

Abstract: In one embodiment of the present invention, a programmable vision accelerator enables applications to collapse multi-dimensional loops into one dimensional loops. In general, configurable components included in the programmable vision accelerator work together to facilitate such loop collapsing. The configurable elements include multi-dimensional address generators, vector units, and load/store units. Each multi-dimensional address generator generates a different address pattern. Each address pattern represents an overall addressing sequence associated with an object accessed within the collapsed loop. The vector units and the load store units provide execution functionality typically associated with multi-dimensional loops based on the address pattern. Advantageously, collapsing multi-dimensional loops in a flexible manner dramatically reduces the overhead associated with implementing a wide range of computer vision algorithms.

Type: Grant

Filed: April 28, 2016

Date of Patent: April 18, 2023

Assignee: NVIDIA CORPORATION

Inventors: Ching Y. Hung, Jagadeesh Sankaran, Ravi P. Singh, Stanley Tzeng
USING A HARDWARE SEQUENCER IN A DIRECT MEMORY ACCESS SYSTEM OF A SYSTEM ON A CHIP

Publication number: 20230111014

Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.

Type: Application

Filed: December 9, 2022

Publication date: April 13, 2023

Inventors: Ahmad Itani, Yen-Te Shih, Jagadeesh Sankaran, Ravi P. Singh, Ching-Yu Hung

1 2 3 4 next