Patents by Inventor Gurushankar Rajamani
Gurushankar Rajamani has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20250045238Abstract: The present application relates to systems and methods for providing high bandwidth connections between memory and computing units. For example, memory units can be configured to perform concurrent read and write operations in parallel to one another. The memory units can also be configured to alter the relative bandwidths that are available for the read and write operations. For example, bi-directional transmission interfaces of a memory unit can be assigned to operate as uni-directional interfaces that are a part of either a data input path or a data output path.Type: ApplicationFiled: August 1, 2024Publication date: February 6, 2025Inventors: Horia Alexandru Toma, Gurushankar Rajamani, Sukalpa Biswas, Robert S. Sprinkle
-
Publication number: 20250028661Abstract: The disclosure provides for high bandwidth processing through the sharing of memory dies over a plurality of computing dies via an optical interchange. The optical interchange may be configured so as to operate as both an optical switch and optical demultiplexer. The optical switch configuration for the optical interchange allows for data to be written from any computing die to one of a plurality of memory dies via an optical connection. The optical demultiplexer configuration allows for data to be broadcast from a memory die to a plurality of the computing dies.Type: ApplicationFiled: July 18, 2023Publication date: January 23, 2025Inventors: Horia Alexandru Toma, Zuowei Shen, William F. Edwards, JR., Gurushankar Rajamani, Hong Liu, Ilyas Mohammed
-
Publication number: 20240394141Abstract: The mapping of system memory addresses to physical memory addresses is modeled as a two dimensional mapping array. Each element of the mapping array is assigned a system memory address and a physical memory address to which the system memory address is mapped. The mapping array is arranged to facilitate designation of a portion of the physical memory addresses as spareable physical memory addresses that are employed when there is a memory failure.Type: ApplicationFiled: May 26, 2023Publication date: November 28, 2024Inventors: Gurushankar Rajamani, Horia Alexandru Toma, Le Wang, Spoorthy Nanjaiah, Albert Forte Magyar, Xiaoming Wang
-
Patent number: 12132802Abstract: An application specific integrated circuit (ASIC) is provided for reliable transport of packets. The network interface card may include a reliable transport accelerator (RTA). The RTA may include a cache lookup database. The RTA may be configured to determine, from a received data packet, a connection identifier and query the cache lookup database for a cache entry corresponding to a connection context having the connection identifier. In response to the query, the RTA may receive a cache hit or a cache miss.Type: GrantFiled: December 16, 2021Date of Patent: October 29, 2024Assignee: Google LLCInventors: Weihuang Wang, Srinivas Vaduvatha, Xiaoming Wang, Gurushankar Rajamani, Abhishek Agarwal, Jiazhen Zheng, Prashant Chandra
-
Patent number: 11934826Abstract: Methods, systems, and apparatus, including computer-readable media, are described for performing vector reductions using a shared scratchpad memory of a hardware circuit having processor cores that communicate with the shared memory. For each of the processor cores, a respective vector of values is generated based on computations performed at the processor core. The shared memory receives the respective vectors of values from respective resources of the processor cores using a direct memory access (DMA) data path of the shared memory. The shared memory performs an accumulation operation on the respective vectors of values using an operator unit coupled to the shared memory. The operator unit is configured to accumulate values based on arithmetic operations encoded at the operator unit. A result vector is generated based on performing the accumulation operation using the respective vectors of values.Type: GrantFiled: November 19, 2021Date of Patent: March 19, 2024Assignee: Google LLCInventors: Thomas Norrie, Gurushankar Rajamani, Andrew Everett Phelps, Matthew Leever Hedlund, Norman Paul Jouppi
-
Patent number: 11928580Abstract: Methods, systems, and apparatus, including computer-readable media, are described for interleaving memory requests to accelerate memory accesses at a hardware circuit configured to implement a neural network model. A system generates multiple requests that are processed against a memory of the system. Each request is used to retrieve data from the memory. For each request, the system generates multiple sub-requests based on a respective size of the data to be retrieved using the request. The system generates a sequence of interleaved sub-requests that includes respective sub-requests of a first request interleaved among respective sub-requests of a second request. Based on the sequence of interleaved sub-requests, a module of the system receives respective portions of data accessed from different address locations of the memory. The system processes each of the respective portions of data to generate a neural network inference using the neural network model implemented at the hardware circuit.Type: GrantFiled: April 4, 2022Date of Patent: March 12, 2024Assignee: Google LLCInventors: Gurushankar Rajamani, Alice Kuo
-
Patent number: 11748028Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing data on a memory controller. One of the methods comprises obtaining a first request and a second request to access respective data corresponding to the first and second requests at a first memory device of the plurality of memory devices; and initiating interleaved processing of the respective data; receiving an indication to stop processing requests to access data at the first memory device and to initiate processing requests to access data at a second memory device, determining that the respective data corresponding to the first and second requests have not yet been fully processed at the time of receiving the indication, and in response, storing, in memory accessible to the memory controller, data corresponding to the requests which have not yet been fully processed.Type: GrantFiled: November 23, 2022Date of Patent: September 5, 2023Assignee: Google LLCInventors: Amin Farmahini, Benjamin Steel Gelb, Gurushankar Rajamani, Sukalpa Biswas
-
Publication number: 20230062889Abstract: An application specific integrated circuit (ASIC) is provided for reliable transport of packets. The network interface card may include a reliable transport accelerator (RTA). The RTA may include a cache lookup database. The RTA may be configured to determine, from a received data packet, a connection identifier and query the cache lookup database for a cache entry corresponding to a connection context having the connection identifier. In response to the query, the RTA may receive a cache hit or a cache miss.Type: ApplicationFiled: December 16, 2021Publication date: March 2, 2023Inventors: Weihuang Wang, Srinivas Vaduvatha, Xiaoming Wang, Gurushankar Rajamani, Abhishek Agarwal, Jiazhen Zheng, Prashant Chandra
-
Patent number: 11513724Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing data on a memory controller. One of the methods comprises obtaining a first request and a second request to access respective data corresponding to the first and second requests at a first memory device of the plurality of memory devices; and initiating interleaved processing of the respective data; receiving an indication to stop processing requests to access data at the first memory device and to initiate processing requests to access data at a second memory device, determining that the respective data corresponding to the first and second requests have not yet been fully processed at the time of receiving the indication, and in response, storing, in memory accessible to the memory controller, data corresponding to the requests which have not yet been fully processed.Type: GrantFiled: June 15, 2021Date of Patent: November 29, 2022Assignee: Google LLCInventors: Amin Farmahini, Benjamin Steel Gelb, Gurushankar Rajamani, Sukalpa Biswas
-
Publication number: 20220374692Abstract: Methods, systems, and apparatus, including computer-readable media, are described for interleaving memory requests to accelerate memory accesses at a hardware circuit configured to implement a neural network model. A system generates multiple requests that are processed against a memory of the system. Each request is used to retrieve data from the memory. For each request, the system generates multiple sub-requests based on a respective size of the data to be retrieved using the request. The system generates a sequence of interleaved sub-requests that includes respective sub-requests of a first request interleaved among respective sub-requests of a second request. Based on the sequence of interleaved sub-requests, a module of the system receives respective portions of data accessed from different address locations of the memory. The system processes each of the respective portions of data to generate a neural network inference using the neural network model implemented at the hardware circuit.Type: ApplicationFiled: April 4, 2022Publication date: November 24, 2022Inventors: Gurushankar Rajamani, Alice Kuo
-
Publication number: 20220156071Abstract: Methods, systems, and apparatus, including computer-readable media, are described for performing vector reductions using a shared scratchpad memory of a hardware circuit having processor cores that communicate with the shared memory. For each of the processor cores, a respective vector of values is generated based on computations performed at the processor core. The shared memory receives the respective vectors of values from respective resources of the processor cores using a direct memory access (DMA) data path of the shared memory. The shared memory performs an accumulation operation on the respective vectors of values using an operator unit coupled to the shared memory. The operator unit is configured to accumulate values based on arithmetic operations encoded at the operator unit. A result vector is generated based on performing the accumulation operation using the respective vectors of values.Type: ApplicationFiled: November 19, 2021Publication date: May 19, 2022Inventors: Thomas Norrie, Gurushankar Rajamani, Andrew Everett Phelps, Matthew Leever Hedlund, Norman Paul Jouppi
-
Patent number: 11295206Abstract: Methods, systems, and apparatus, including computer-readable media, are described for interleaving memory requests to accelerate memory accesses at a hardware circuit configured to implement a neural network model. A system generates multiple requests that are processed against a memory of the system. Each request is used to retrieve data from the memory. For each request, the system generates multiple sub-requests based on a respective size of the data to be retrieved using the request. The system generates a sequence of interleaved sub-requests that includes respective sub-requests of a first request interleaved among respective sub-requests of a second request. Based on the sequence of interleaved sub-requests, a module of the system receives respective portions of data accessed from different address locations of the memory. The system processes each of the respective portions of data to generate a neural network inference using the neural network model implemented at the hardware circuit.Type: GrantFiled: May 15, 2020Date of Patent: April 5, 2022Assignee: Google LLCInventors: Gurushankar Rajamani, Alice Kuo
-
Patent number: 11182159Abstract: Methods, systems, and apparatus, including computer-readable media, are described for performing vector reductions using a shared scratchpad memory of a hardware circuit having processor cores that communicate with the shared memory. For each of the processor cores, a respective vector of values is generated based on computations performed at the processor core. The shared memory receives the respective vectors of values from respective resources of the processor cores using a direct memory access (DMA) data path of the shared memory. The shared memory performs an accumulation operation on the respective vectors of values using an operator unit coupled to the shared memory. The operator unit is configured to accumulate values based on arithmetic operations encoded at the operator unit. A result vector is generated based on performing the accumulation operation using the respective vectors of values.Type: GrantFiled: August 31, 2020Date of Patent: November 23, 2021Assignee: Google LLCInventors: Thomas Norrie, Gurushankar Rajamani, Andrew Everett Phelps, Matthew Leever Hedlund, Norman Paul Jouppi
-
Publication number: 20210311658Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing data on a memory controller. One of the methods comprises obtaining a first request and a second request to access respective data corresponding to the first and second requests at a first memory device of the plurality of memory devices; and initiating interleaved processing of the respective data; receiving an indication to stop processing requests to access data at the first memory device and to initiate processing requests to access data at a second memory device, determining that the respective data corresponding to the first and second requests have not yet been fully processed at the time of receiving the indication, and in response, storing, in memory accessible to the memory controller, data corresponding to the requests which have not yet been fully processed.Type: ApplicationFiled: June 15, 2021Publication date: October 7, 2021Inventors: Amin Farmahini, Benjamin Steel Gelb, Gurushankar Rajamani, Sukalpa Biswas
-
Patent number: 11137936Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing data on a memory controller. One of the methods comprises obtaining a first request and a second request to access respective data corresponding to the first and second requests at a first memory device of the plurality of memory devices; and initiating interleaved processing of the respective data; receiving an indication to stop processing requests to access data at the first memory device and to initiate processing requests to access data at a second memory device, determining that the respective data corresponding to the first and second requests have not yet been fully processed at the time of receiving the indication, and in response, storing, in memory accessible to the memory controller, data corresponding to the requests which have not yet been fully processed.Type: GrantFiled: July 15, 2020Date of Patent: October 5, 2021Assignee: Google LLCInventors: Amin Farmahini, Benjamin Steel Gelb, Gurushankar Rajamani, Sukalpa Biswas
-
Publication number: 20210263739Abstract: Methods, systems, and apparatus, including computer-readable media, are described for performing vector reductions using a shared scratchpad memory of a hardware circuit having processor cores that communicate with the shared memory. For each of the processor cores, a respective vector of values is generated based on computations performed at the processor core. The shared memory receives the respective vectors of values from respective resources of the processor cores using a direct memory access (DMA) data path of the shared memory. The shared memory performs an accumulation operation on the respective vectors of values using an operator unit coupled to the shared memory. The operator unit is configured to accumulate values based on arithmetic operations encoded at the operator unit. A result vector is generated based on performing the accumulation operation using the respective vectors of values.Type: ApplicationFiled: August 31, 2020Publication date: August 26, 2021Inventors: Thomas Norrie, Gurushankar Rajamani, Andrew Everett Phelps, Matthew Leever Hedlund, Norman Paul Jouppi
-
Publication number: 20210248451Abstract: Methods, systems, and apparatus, including computer-readable media, are described for interleaving memory requests to accelerate memory accesses at a hardware circuit configured to implement a neural network model. A system generates multiple requests that are processed against a memory of the system. Each request is used to retrieve data from the memory. For each request, the system generates multiple sub-requests based on a respective size of the data to be retrieved using the request. The system generates a sequence of interleaved sub-requests that includes respective sub-requests of a first request interleaved among respective sub-requests of a second request. Based on the sequence of interleaved sub-requests, a module of the system receives respective portions of data accessed from different address locations of the memory. The system processes each of the respective portions of data to generate a neural network inference using the neural network model implemented at the hardware circuit.Type: ApplicationFiled: May 15, 2020Publication date: August 12, 2021Inventors: Gurushankar Rajamani, Alice Kuo
-
Publication number: 20210223985Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing data on a memory controller. One of the methods comprises obtaining a first request and a second request to access respective data corresponding to the first and second requests at a first memory device of the plurality of memory devices; and initiating interleaved processing of the respective data; receiving an indication to stop processing requests to access data at the first memory device and to initiate processing requests to access data at a second memory device, determining that the respective data corresponding to the first and second requests have not yet been fully processed at the time of receiving the indication, and in response, storing, in memory accessible to the memory controller, data corresponding to the requests which have not yet been fully processed.Type: ApplicationFiled: July 15, 2020Publication date: July 22, 2021Inventors: Amin Farmahini, Benjamin Steel Gelb, Gurushankar Rajamani, Sukalpa Biswas
-
Patent number: 9594713Abstract: Bridging strongly ordered write transactions to devices in weakly ordered domains, and related apparatuses, methods, and computer-readable media are disclosed. In one aspect, a host bridge device is configured to receive strongly ordered write transactions from one or more strongly ordered producer devices. The host bridge device issues the strongly ordered write transactions to one or more consumer devices within a weakly ordered domain. The host bridge device detects a first write transaction that is not accepted by a first consumer device of the one or more consumer devices. For each of one or more write transactions issued subsequent to the first write transaction and accepted by a respective consumer device, the host bridge device sends a cancellation message to the respective consumer device. The host bridge device replays the first write transaction and the one or more write transactions that were issued subsequent to the first write transaction.Type: GrantFiled: September 12, 2014Date of Patent: March 14, 2017Assignee: QUALCOMM IncorporatedInventors: Randall John Pascarella, Jaya Prakash Subramaniam Ganasan, Thuong Quang Truong, Gurushankar Rajamani, Joseph Gerald McDonald, Thomas Philip Speier
-
Publication number: 20160077991Abstract: Bridging strongly ordered write transactions to devices in weakly ordered domains, and related apparatuses, methods, and computer-readable media are disclosed. In one aspect, a host bridge device is configured to receive strongly ordered write transactions from one or more strongly ordered producer devices. The host bridge device issues the strongly ordered write transactions to one or more consumer devices within a weakly ordered domain. The host bridge device detects a first write transaction that is not accepted by a first consumer device of the one or more consumer devices. For each of one or more write transactions issued subsequent to the first write transaction and accepted by a respective consumer device, the host bridge device sends a cancellation message to the respective consumer device. The host bridge device replays the first write transaction and the one or more write transactions that were issued subsequent to the first write transaction.Type: ApplicationFiled: September 12, 2014Publication date: March 17, 2016Inventors: Randall John Pascarella, Jaya Prakash Subramaniam Ganasan, Thuong Quang Truong, Gurushankar Rajamani, Joseph Gerald McDonald, Thomas Philip Speier