Patents by Inventor Amit Ramchandran

Amit Ramchandran has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 7340562
    Abstract: A distributed data cache includes a number of cache memory units or register files each having a number of cache lines. Data buses are connected with the cache memory units. Each data bus is connected with a different cache line from each cache memory unit. A number of data address generators are connected with a memory unit and the data buses. The data address generators retrieve data values from the memory unit and communicate the data values to the data buses without latency. The data address generators are adapted to simultaneously communicate each of the data values to a different data bus without latency. The cache memory units are adapted to simultaneously load data values from the data buses, with each data value loaded into a different cache line without latency. (A toy behavioral model of this cache appears after this listing.)
    Type: Grant
    Filed: July 24, 2003
    Date of Patent: March 4, 2008
    Assignee: NVIDIA Corporation
    Inventor: Amit Ramchandran
  • Publication number: 20070294511
    Abstract: One embodiment of the present invention includes a heterogeneous, high-performance, scalable processor having at least one W-type sub-processor capable of processing W bits in parallel, W being an integer value, and at least one N-type sub-processor capable of processing N bits in parallel, N being an integer value smaller than W by a factor of two. The processor further includes a shared bus coupling the at least one W-type sub-processor and the at least one N-type sub-processor, and shared memory coupled to the at least one W-type sub-processor and the at least one N-type sub-processor, wherein the W-type sub-processor rearranges memory to accommodate execution of applications, allowing for fast operations. (A sketch of the W/N routing idea appears after this listing.)
    Type: Application
    Filed: August 30, 2007
    Publication date: December 20, 2007
    Applicant: 3PLUS1 TECHNOLOGY, INC.
    Inventors: Amit Ramchandran, John Hauser
  • Publication number: 20070271415
    Abstract: The present invention includes an adaptable high-performance node (RXN) with several features that enable it to provide high performance along with adaptability. A preferred embodiment of the RXN includes a run-time configurable data path and control path. The RXN supports multi-precision arithmetic, including 8-, 16-, 24-, and 32-bit codes. Data flow can be reconfigured to minimize register accesses for different operations. For example, multiply-accumulate operations can be performed with minimal, or no, register stores by reconfiguration of the data path. Predetermined kernels can be configured during a setup phase so that the RXN can efficiently execute, e.g., discrete cosine transform (DCT), fast Fourier transform (FFT), and other operations. Other features are provided. (A software analogue of the configurable multiply-accumulate appears after this listing.)
    Type: Application
    Filed: May 3, 2007
    Publication date: November 22, 2007
    Inventor: Amit Ramchandran
  • Publication number: 20070198901
    Abstract: One embodiment of the present invention includes a heterogeneous, high-performance, scalable processor including at least one W-type sub-processor capable of processing W bits, or more, in parallel, W being an integer value; at least one N-type sub-processor capable of processing N bits in parallel, N being an integer value smaller than W; a shared bus coupling the at least one W-type sub-processor and the at least one N-type sub-processor; and at least one Galois Field (GF) MAC coupled to communicate with the W-type sub-processor and the N-type sub-processor, wherein the W-type sub-processor rearranges bytes in transit to or from memory to accommodate execution of applications, allowing for fast operations. (A software GF MAC example appears after this listing.)
    Type: Application
    Filed: April 10, 2007
    Publication date: August 23, 2007
    Inventors: Amit Ramchandran, John Hauser, Adam Lins
  • Patent number: 7249242
    Abstract: Input pipeline registers are provided at inputs to functional units and data paths in an adaptive computing machine. Input pipeline registers are used to hold last-accessed values and to immediately place commonly needed constant values, such as a zero or one, onto inputs and data lines. This approach can reduce the time needed to obtain data values and conserve power by avoiding slower and more complex memory or storage accesses. Another embodiment of the invention allows data values to be obtained earlier during pipelined execution of instructions. For example, in a three-stage fetch-decode-execute type of reduced instruction set computer (RISC), a data value from a prior instruction can be ready at the decode or execute stage of a subsequent instruction. (A behavioral sketch of the input register appears after this listing.)
    Type: Grant
    Filed: July 23, 2003
    Date of Patent: July 24, 2007
    Assignee: NVIDIA Corporation
    Inventor: Amit Ramchandran
  • Publication number: 20070150656
    Abstract: A method for compressing a set of instructions in an adaptive computing machine includes identifying frequently executed instructions, inserting, before the identified instructions, an explicit caching instruction that associates the identified instructions with an index value, and replacing at least one instance of the frequently executed instructions subsequent to the explicit caching instruction with a compressed instruction referencing the index value. One or more instructions can be identified for compression, including groups of consecutive or non-consecutive instructions. The explicit caching instruction directs a node in an adaptive computing machine to store instructions in an instruction storage unit in association with an index value. Instructions stored in the storage unit are retrievable with reference to the index value. (A sketch of this compression pass appears after this listing.)
    Type: Application
    Filed: March 7, 2007
    Publication date: June 28, 2007
    Applicant: QuickSilver Technology, Inc.
    Inventor: Amit Ramchandran
  • Patent number: 7194605
    Abstract: A method for compressing a set of instructions in an adaptive computing machine includes identifying frequently executed instructions, inserting, before the identified instructions, an explicit caching instruction that associates the identified instructions with an index value, and replacing at least one instance of the frequently executed instructions subsequent to the explicit caching instruction with a compressed instruction referencing the index value. One or more instructions can be identified for compression, including groups of consecutive or non-consecutive instructions. The explicit caching instruction directs a node in an adaptive computing machine to store instructions in an instruction storage unit in association with an index value. Instructions stored in the storage unit are retrievable with reference to the index value.
    Type: Grant
    Filed: July 24, 2003
    Date of Patent: March 20, 2007
    Assignee: NVIDIA Corporation
    Inventor: Amit Ramchandran
  • Publication number: 20060026578
    Abstract: One embodiment of the present invention includes a heterogeneous, high-performance, scalable processor having at least one W-type sub-processor capable of processing W bits or greater in parallel, W being an integer value, and at least one N-type sub-processor capable of processing N bits in parallel, N being an integer value smaller than W. A scenario compiler is included in a hierarchical compilation flow and is used with other compilation and assembler blocks to generate binary code from different types of code, allowing for efficient processing across the sub-processors while maintaining low power consumption when the binary code is executed. (A sketch of the compiler's routing step appears after this listing.)
    Type: Application
    Filed: August 2, 2005
    Publication date: February 2, 2006
    Inventors: Amit Ramchandran, John Hauser
  • Publication number: 20060015703
    Abstract: One embodiment of the present invention includes a heterogeneous, high-performance, scalable processor having at least one W-type sub-processor capable of processing W bits in parallel, W being an integer value, and at least one N-type sub-processor capable of processing N bits in parallel, N being an integer value smaller than W by a factor of two. The processor further includes a shared bus coupling the at least one W-type sub-processor and the at least one N-type sub-processor, and shared memory coupled to the at least one W-type sub-processor and the at least one N-type sub-processor, wherein the W-type sub-processor rearranges memory to accommodate execution of applications, allowing for fast operations.
    Type: Application
    Filed: July 12, 2005
    Publication date: January 19, 2006
    Inventors: Amit Ramchandran, John Hauser
  • Publication number: 20040168044
    Abstract: Input pipeline registers are provided at inputs to functional units and data paths in an adaptive computing machine. Input pipeline registers are used to hold last-accessed values and to immediately place commonly needed constant values, such as a zero or one, onto inputs and data lines. This approach can reduce the time needed to obtain data values and conserve power by avoiding slower and more complex memory or storage accesses. Another embodiment of the invention allows data values to be obtained earlier during pipelined execution of instructions. For example, in a three-stage fetch-decode-execute type of reduced instruction set computer (RISC), a data value from a prior instruction can be ready at the decode or execute stage of a subsequent instruction.
    Type: Application
    Filed: July 23, 2003
    Publication date: August 26, 2004
    Applicant: QuickSilver Technology, Inc.
    Inventor: Amit Ramchandran
  • Publication number: 20040133745
    Abstract: The present invention includes an adaptable high-performance node (RXN) with several features that enable it to provide high performance along with adaptability. A preferred embodiment of the RXN includes a run-time configurable data path and control path. The RXN supports multi-precision arithmetic, including 8-, 16-, 24-, and 32-bit codes. Data flow can be reconfigured to minimize register accesses for different operations. For example, multiply-accumulate operations can be performed with minimal, or no, register stores by reconfiguration of the data path. Predetermined kernels can be configured during a setup phase so that the RXN can efficiently execute, e.g., discrete cosine transform (DCT), fast Fourier transform (FFT), and other operations. Other features are provided.
    Type: Application
    Filed: July 23, 2003
    Publication date: July 8, 2004
    Applicant: QuickSilver Technology, Inc.
    Inventor: Amit Ramchandran
  • Publication number: 20040093479
    Abstract: A method for compressing a set of instructions in an adaptive computing machine includes identifying frequently executed instructions, inserting, before the identified instructions, an explicit caching instruction that associates the identified instructions with an index value, and replacing at least one instance of the frequently executed instructions subsequent to the explicit caching instruction with a compressed instruction referencing the index value. One or more instructions can be identified for compression, including groups of consecutive or non-consecutive instructions. The explicit caching instruction directs a node in an adaptive computing machine to store instructions in an instruction storage unit in association with an index value. Instructions stored in the storage unit are retrievable with reference to the index value.
    Type: Application
    Filed: July 24, 2003
    Publication date: May 13, 2004
    Applicant: QuickSilver Technology, Inc.
    Inventor: Amit Ramchandran
  • Publication number: 20040093465
    Abstract: A distributed data cache includes a number of cache memory units or register files each having a number of cache lines. Data buses are connected with the cache memory units. Each data bus is connected with a different cache line from each cache memory unit. A number of data address generators are connected with a memory unit and the data buses. The data address generators retrieve data values from the memory unit and communicate the data values to the data buses without latency. The data address generators are adapted to simultaneously communicate each of the data values to a different data bus without latency. The cache memory units are adapted to simultaneously load data values from the data buses, with each data value loaded into a different cache line without latency.
    Type: Application
    Filed: July 24, 2003
    Publication date: May 13, 2004
    Applicant: QuickSilver Technology, Inc.
    Inventor: Amit Ramchandran
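
Illustrative sketches

The data-cache filing (patent 7340562, also published as 20040093465) describes cache units that each latch a different data bus into a different cache line in the same cycle. The Python model below is an editorial sketch of that structure, assuming invented names (CacheUnit, DataAddressGenerator) and a toy backing memory; it is not code from the patent.

    class CacheUnit:
        """A register-file-like cache with one line per data bus."""
        def __init__(self, num_lines):
            self.lines = [None] * num_lines

        def load_from_buses(self, buses):
            # Each bus feeds a different cache line; in hardware all lines
            # latch in the same cycle, so no load serializes behind another.
            for line_index, value in enumerate(buses):
                self.lines[line_index] = value

    class DataAddressGenerator:
        """Fetches a value from backing memory and drives one data bus."""
        def __init__(self, memory):
            self.memory = memory

        def fetch(self, address):
            return self.memory[address]

    # Backing memory, four buses (one per cache line), two cache units.
    memory = {addr: addr * 10 for addr in range(16)}
    NUM_BUSES = 4
    dags = [DataAddressGenerator(memory) for _ in range(NUM_BUSES)]
    units = [CacheUnit(NUM_BUSES) for _ in range(2)]

    # One modeled "cycle": every generator drives its bus, and every cache
    # unit latches every bus into a different line.
    buses = [dag.fetch(addr) for dag, addr in zip(dags, [0, 1, 2, 3])]
    for unit in units:
        unit.load_from_buses(buses)

    print(units[0].lines)  # [0, 10, 20, 30]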
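
Several of the filings (e.g. publications 20070294511 and 20060015703) pair a W-bit sub-processor with an N-bit one, N being half of W. The sketch below models only the routing decision between the two sub-processor types; the shared bus, shared memory, and the W-type unit's byte rearrangement are elided, and all names are invented.

    W = 32          # width of the wide sub-processor's lane
    N = W // 2      # the abstract fixes N at half of W

    def w_type_exec(a, b):
        """Stand-in for the W-type sub-processor: one W-bit operation."""
        return (a + b) & ((1 << W) - 1)

    def n_type_exec(pairs):
        """Stand-in for the N-type sub-processor: several N-bit operations."""
        mask = (1 << N) - 1
        return [(a + b) & mask for a, b in pairs]

    def dispatch(op_width, *args):
        # Wide work goes to the W-type unit, narrow work to the N-type unit.
        if op_width > N:
            return w_type_exec(*args)
        return n_type_exec(*args)

    print(dispatch(32, 0xFFFF0000, 0x0000FFFF))    # one wide add
    print(dispatch(16, [(1, 2), (30000, 40000)]))  # two narrow adds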
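
For the adaptable node (RXN) of publications 20070271415 and 20040133745, the abstract's example is a multiply-accumulate whose data path is configured once, during a setup phase, so the accumulator need not be stored to a register file on every step. Below is a hypothetical software analogue, with invented names and the 8/16/24/32-bit precisions taken from the abstract.

    MASKS = {8: 0xFF, 16: 0xFFFF, 24: 0xFFFFFF, 32: 0xFFFFFFFF}

    def configure_mac(precision_bits):
        """'Setup phase': fix operand precision for a predetermined kernel."""
        mask = MASKS[precision_bits]

        def mac(coeffs, samples):
            acc = 0  # stays in the datapath; no per-step register store
            for c, x in zip(coeffs, samples):
                acc += (c & mask) * (x & mask)
            return acc

        return mac

    dot16 = configure_mac(16)            # e.g. the inner loop of a DCT or FIR
    print(dot16([1, 2, 3], [4, 5, 6]))   # 1*4 + 2*5 + 3*6 = 32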
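
Publication 20070198901 adds a Galois Field MAC to the sub-processor mix. In GF(2^n), addition is XOR, so a GF MAC is an XOR-accumulate over carry-less multiplies. The software version below assumes GF(2^8) with the AES reduction polynomial 0x11B; the abstract does not say which field or polynomial the hardware uses.

    def gf_mul(a, b, poly=0x11B):
        """Multiply two GF(2^8) elements, reducing modulo poly."""
        result = 0
        while b:
            if b & 1:
                result ^= a
            a <<= 1
            if a & 0x100:   # carried past 8 bits: reduce by the polynomial
                a ^= poly
            b >>= 1
        return result

    def gf_mac(acc, a, b):
        """Accumulate in GF(2^8); addition in GF(2^n) is XOR."""
        return acc ^ gf_mul(a, b)

    print(hex(gf_mul(0x57, 0x83)))  # 0xc1, the classic AES worked example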
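
Patent 7249242 (published as 20040168044) puts registers at functional-unit inputs so a last-accessed value can be reused, and constants such as zero or one can be driven immediately, without a memory or register-file access. A behavioral toy of that idea, with invented names and an access counter added to make the saving visible:

    class InputPipelineRegister:
        """Latch in front of a functional-unit input."""
        PRESETS = {0, 1}  # constants drivable without any storage access

        def __init__(self):
            self.value = 0
            self.reads = 0  # counts the costly storage accesses taken

        def drive(self, source=None, constant=None):
            if constant in self.PRESETS:
                self.value = constant   # immediate, no storage access
            elif source is not None:
                self.value = source()   # slow path: a real read
                self.reads += 1
            # else: hold the last-accessed value
            return self.value

    reg = InputPipelineRegister()
    regfile = {"r2": 7}
    reg.drive(source=lambda: regfile["r2"])  # one real read
    for _ in range(3):
        reg.drive()                          # operand reused, no re-read
    print(reg.value, reg.reads)              # 7 1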
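
Patent 7194605 (published as 20040093479 and 20070150656) compresses instruction streams by binding hot instructions to an index via an explicit caching instruction and replacing later occurrences with short references. The pass below is a guess at the shape of that algorithm in Python; the CACHE/REF encodings are invented.

    from collections import Counter

    def compress(program, threshold=2):
        """Replace repeats of hot instructions with index references."""
        hot = {ins for ins, n in Counter(program).items() if n >= threshold}
        out, index_of = [], {}
        for ins in program:
            if ins in index_of:                     # already cached
                out.append(("REF", index_of[ins]))  # compressed reference
            elif ins in hot:
                index_of[ins] = len(index_of)
                out.append(("CACHE", index_of[ins], ins))  # explicit caching
                out.append(ins)                     # first occurrence runs as-is
            else:
                out.append(ins)
        return out

    prog = ["MUL r1,r2", "ADD r0,r1", "MUL r1,r2", "MUL r1,r2"]
    for step in compress(prog):
        print(step)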
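
Publication 20060026578 adds a scenario compiler feeding a hierarchical flow that generates binary for both sub-processor types. The abstract gives no detail of the flow's stages, so the sketch below only guesses at its outermost routing step; the structure and names are entirely invented.

    def scenario_compile(kernels, n_bits=16):
        """Route each kernel to the assembler for the matching sub-processor."""
        binary = []
        for k in kernels:
            if k["width"] > n_bits:
                binary.append(("W-BIN", k["name"]))  # wide sub-processor code
            else:
                binary.append(("N-BIN", k["name"]))  # narrow sub-processor code
        return binary

    print(scenario_compile([{"name": "fft32", "width": 32},
                            {"name": "filt16", "width": 16}]))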