Patents by Inventor Ravi P. Singh
Ravi P. Singh has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20230076599
Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.
Type: Application
Filed: August 2, 2021
Publication date: March 9, 2023
Inventors: Ching-Yu Hung, Ravi P. Singh, Jagadeesh Sankaran, Yen-Te Shih, Ahmad Itani
-
Patent number: 11593001
Abstract: A VPU and associated components include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators are used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer is included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU executes a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.
Type: Grant
Filed: August 2, 2021
Date of Patent: February 28, 2023
Assignee: NVIDIA Corporation
Inventors: Ching-Yu Hung, Ravi P Singh, Jagadeesh Sankaran, Yen-Te Shih, Ahmad Itani
-
Patent number: 11593290
Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.
Type: Grant
Filed: August 2, 2021
Date of Patent: February 28, 2023
Assignee: NVIDIA Corporation
Inventors: Ahmad Itani, Yen-Te Shih, Jagadeesh Sankaran, Ravi P Singh, Ching-Yu Hung
-
Publication number: 20230047233
Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.
Type: Application
Filed: August 2, 2021
Publication date: February 16, 2023
Inventors: Ching-Yu Hung, Ravi P Singh, Jagadeesh Sankaran, Yen-Te Shih, Ahmad Itani
-
Publication number: 20230046642
Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.
Type: Application
Filed: August 2, 2021
Publication date: February 16, 2023
Inventors: Ching-Yu Hung, Ravi P Singh, Jagadeesh Sankaran, Yen-Te Shih, Ahmad Itani
-
Publication number: 20230050062
Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.
Type: Application
Filed: August 2, 2021
Publication date: February 16, 2023
Inventors: Ching-Yu Hung, Ravi P. Singh, Jagadeesh Sankaran, Yen-Te Shih, Ahmad Itani
-
Publication number: 20230049442
Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.
Type: Application
Filed: August 2, 2021
Publication date: February 16, 2023
Inventors: Ching-Yu Hung, Ravi P. Singh, Jagadeesh Sankaran, Yen-Te Shih, Ahmad Itani
-
Publication number: 20230053042
Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.
Type: Application
Filed: August 2, 2021
Publication date: February 16, 2023
Inventors: Ching-Yu Hung, Ravi P. Singh, Jagadeesh Sankaran, Ahmad Itani, Yen-Te Shih
-
Publication number: 20230050902
Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.
Type: Application
Filed: August 2, 2021
Publication date: February 16, 2023
Inventors: Ravi P Singh, Ching-Yu Hung, Jagadeesh Sankaran, Ahmad Itani, Yen-Te Shih
-
Publication number: 20230048836
Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.
Type: Application
Filed: August 2, 2021
Publication date: February 16, 2023
Inventors: Ahmad Itani, Yen-Te Shih, Jagadeesh Sankaran, Ravi P Singh, Ching-Yu Hung
-
Publication number: 20230042858
Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.
Type: Application
Filed: August 2, 2021
Publication date: February 9, 2023
Inventors: Ravi P. Singh, Ching-Yu Hung, Jagadeesh Sankaran, Ahmad Itani, Yen-Te Shih
-
Publication number: 20230042226
Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.
Type: Application
Filed: August 2, 2021
Publication date: February 9, 2023
Inventors: Ahmad Itani, Yen-Te Shih, Jagadeesh Sankaran, Ravi P. Singh, Ching-Yu Hung
-
Publication number: 20230045443
Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.
Type: Application
Filed: August 2, 2021
Publication date: February 9, 2023
Inventors: Ching-Yu Hung, Ravi P. Singh, Jagadeesh Sankaran, Yen-Te Shih, Ahmad Itani
-
Publication number: 20230037738
Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.
Type: Application
Filed: August 2, 2021
Publication date: February 9, 2023
Inventors: Ahmad Itani, Yen-Te Shih, Jagadeesh Sankaran, Ravi P. Singh, Ching-Yu Hung
-
Patent number: 11573921
Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.
Type: Grant
Filed: August 2, 2021
Date of Patent: February 7, 2023
Assignee: NVIDIA Corporation
Inventors: Ahmad Itani, Yen-Te Shih, Jagadeesh Sankaran, Ravi P Singh, Ching-Yu Hung
-
Patent number: 11573795
Abstract: In various examples, a VPU and associated components may be optimized to improve VPU performance and throughput. For example, the VPU may include a min/max collector, automatic store predication functionality, a SIMD data path organization that allows for inter-lane sharing, a transposed load/store with stride parameter functionality, a load with permute and zero insertion functionality, hardware, logic, and memory layout functionality to allow for two point and two by two point lookups, and per memory bank load caching capabilities. In addition, decoupled accelerators may be used to offload VPU processing tasks to increase throughput and performance, and a hardware sequencer may be included in a DMA system to reduce programming complexity of the VPU and the DMA system. The DMA and VPU may execute a VPU configuration mode that allows the VPU and DMA to operate without a processing controller for performing dynamic region based data movement operations.
Type: Grant
Filed: August 2, 2021
Date of Patent: February 7, 2023
Assignee: NVIDIA Corporation
Inventors: Ahmad Itani, Yen-Te Shih, Jagadeesh Sankaran, Ravi P Singh, Ching-Yu Hung
-
Publication number: 20160321074
Abstract: In one embodiment of the present invention, a programmable vision accelerator enables applications to collapse multi-dimensional loops into one dimensional loops. In general, configurable components included in the programmable vision accelerator work together to facilitate such loop collapsing. The configurable elements include multi-dimensional address generators, vector units, and load/store units. Each multi-dimensional address generator generates a different address pattern. Each address pattern represents an overall addressing sequence associated with an object accessed within the collapsed loop. The vector units and the load store units provide execution functionality typically associated with multi-dimensional loops based on the address pattern. Advantageously, collapsing multi-dimensional loops in a flexible manner dramatically reduces the overhead associated with implementing a wide range of computer vision algorithms.
Type: Application
Filed: April 28, 2016
Publication date: November 3, 2016
Inventors: Ching Y. HUNG, Jagadeesh SANKARAN, Ravi P. SINGH, Stanley TZENG
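Illustrative note (not part of the patent record): the loop collapsing described in this abstract can be pictured in software as one flat loop whose addresses are derived from per-level trip counts and strides. The sketch below is only an assumption about the general idea; the names address_pattern and nested_reference, and the counts/strides parameters, are hypothetical and do not come from the application.

```python
from typing import Iterator, List, Sequence


def address_pattern(base: int, counts: Sequence[int],
                    strides: Sequence[int]) -> Iterator[int]:
    """Yield the addresses of a nested loop from a single flat loop.

    counts[0] is the innermost trip count and strides[k] is the address
    step for one iteration of loop level k (hypothetical parameters).
    """
    total = 1
    for count in counts:
        total *= count
    for flat in range(total):  # the single collapsed loop
        addr = base
        remaining = flat
        # Peel off one loop index per level, innermost level first.
        for count, stride in zip(counts, strides):
            remaining, index = divmod(remaining, count)
            addr += index * stride
        yield addr


def nested_reference(base: int, rows: int, cols: int,
                     row_stride: int, col_stride: int) -> List[int]:
    """The same access pattern written as an ordinary two-level loop."""
    return [base + r * row_stride + c * col_stride
            for r in range(rows) for c in range(cols)]


# A 3-row by 4-column tile inside a buffer whose rows are 16 elements apart.
collapsed = list(address_pattern(0, counts=(4, 3), strides=(1, 16)))
```

Here collapsed visits the same addresses, in the same order, as nested_reference(0, 3, 4, 16, 1). In the hardware the abstract describes, the address generation would be done by the configurable address generators rather than by per-iteration arithmetic in the loop body.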
-
Patent number: 7472259
Abstract: In one embodiment, a pipelined processor is described that includes an execution pipeline having a plurality of stages and a multi-cycle instruction (MCI) controller adapted to assert a stall signal to stall the multi-cycle instruction within one of the stages of the execution pipeline. The MCI controller is adapted to issue a plurality of instructions to subsequent stages in the pipeline while the multi-cycle instruction is stalled.
Type: Grant
Filed: December 6, 2000
Date of Patent: December 30, 2008
Assignee: Analog Devices, Inc.
Inventors: Gregory A. Overkamp, Charles P. Roth, Ravi P. Singh
-
Patent number: 7366876
Abstract: In one embodiment, a state machine receives a plurality of instructions from an instruction register to be processed by a digital signal processor. After receiving a single RTI, the state machine loads each of the plurality of instructions one at time and determines the validity of each instruction. If the instruction is valid, the state machine transfers the instruction to the decoder. If the instruction is invalid or if a no-operation instruction is present, the state machine discards the instruction and immediately loads the next instruction.
Type: Grant
Filed: October 31, 2000
Date of Patent: April 29, 2008
Assignee: Analog Devices, Inc.
Inventors: Charles P. Roth, Ravi P Singh, Gregory A. Overkamp, Tien Dinh
-
Patent number: 7360059
Abstract: In one embodiment, a digital signal processor includes look ahead logic to decrease the number of bubbles inserted in the processing pipeline. The processor receives data containing instructions in a plurality of buffers and decodes the size of a first instruction. The beginning of a second instruction is determined based on the size of the first instruction. The size of the second instruction is decoded and the processor determines whether loading the second instruction will deplete one of the plurality of buffers.
Type: Grant
Filed: February 3, 2006
Date of Patent: April 15, 2008
Assignee: Analog Devices, Inc.
Inventors: Thomas Tomazin, William C. Anderson, Charles P. Roth, Kayla Chalmers, Juan G. Revilla, Ravi P. Singh