Patents by Inventor Shomit N. Das
Shomit N. Das has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20210091787
Abstract: Entropy agnostic data encoding includes: receiving, by an encoder, input data including a bit string; generating a plurality of candidate codewords, including encoding the input data bit string with a plurality of binary vectors, wherein the plurality of binary vectors includes a set of deterministic biased binary vectors and a set of random binary vectors; selecting, in dependence upon a predefined criteria, one of the plurality of candidate codewords; and transmitting the selected candidate codeword to a decoder.
Type: Application
Filed: September 23, 2019
Publication date: March 25, 2021
Inventors: SEYEDMOHAMMAD SEYEDZADEHDELCHEH, SHOMIT N. DAS
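The candidate-codeword idea in this abstract can be illustrated with a minimal sketch. The abstract does not say how the input is combined with each binary vector, which biased patterns are used, or what the selection criterion is. The XOR combination, the four fixed patterns, the seeded random vectors, and the "fewest 1 bits" criterion below are all assumptions made for illustration, as are the function names.

```python
import random


def make_vectors(width, n_random, seed=0):
    """Build the vector set: fixed biased patterns plus seeded random vectors.

    The patterns and the shared seed are illustrative assumptions,
    not details taken from the patent.
    """
    mask = (1 << width) - 1
    alternating = int(("10" * width)[:width], 2)
    # Assumed deterministic biased vectors: all zeros, all ones, 1010..., 0101...
    biased = [0, mask, alternating, alternating ^ mask]
    rng = random.Random(seed)  # encoder and decoder must use the same seed
    randoms = [rng.getrandbits(width) for _ in range(n_random)]
    return biased + randoms


def encode(data, vectors):
    """XOR the input with every vector and keep the candidate with the fewest 1 bits."""
    candidates = [(bin(data ^ v).count("1"), i, data ^ v) for i, v in enumerate(vectors)]
    _, index, codeword = min(candidates)
    return index, codeword


def decode(index, codeword, vectors):
    """Undo the XOR using the vector index sent alongside the codeword."""
    return codeword ^ vectors[index]
```

In this sketch the encoder sends the chosen vector's index together with the codeword, so the decoder can recover the input by repeating the XOR.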
-
Publication number: 20210089324
Abstract: An asynchronous pipeline includes a first stage and one or more second stages. A controller provides control signals to the first stage to indicate a modification to an operating speed of the first stage. The modification is determined based on a comparison of a completion status of the first stage to one or more completion statuses of the one or more second stages. In some cases, the controller provides control signals indicating modifications to an operating voltage applied to the first stage and a drive strength of a buffer in the first stage. Modules can be used to determine the completion statuses of the first stage and the one or more second stages based on the monitored output signals generated by the stages, output signals from replica critical paths associated with the stages, or a lookup table that indicates estimated completion times.
Type: Application
Filed: June 26, 2020
Publication date: March 25, 2021
Inventors: Greg Sadowski, John Kalamatianos, Shomit N. Das
-
Patent number: 10944422
Abstract: Entropy agnostic data encoding includes: receiving, by an encoder, input data including a bit string; generating a plurality of candidate codewords, including encoding the input data bit string with a plurality of binary vectors, wherein the plurality of binary vectors includes a set of deterministic biased binary vectors and a set of random binary vectors; selecting, in dependence upon a predefined criteria, one of the plurality of candidate codewords; and transmitting the selected candidate codeword to a decoder.
Type: Grant
Filed: September 23, 2019
Date of Patent: March 9, 2021
Assignee: ADVANCED MICRO DEVICES, INC.
Inventors: Seyedmohammad Seyedzadehdelcheh, Shomit N. Das
-
Patent number: 10944693
Abstract: A system is described that includes an integrated circuit chip having a network-on-chip. The network-on-chip includes multiple routers arranged in a topology and a separate communication link coupled between each router and each of one or more neighboring routers of that router among the multiple routers in the topology. The integrated circuit chip also includes multiple nodes, each node coupled to a router of the multiple routers. When operating, a given router of the multiple routers keeps a record of operating states of some or all of the multiple routers and corresponding communication links. The given router then routes flits to destination nodes via one or more other routers of the multiple routers based at least in part on the operating states of the some or all of the multiple routers and the corresponding communication links.
Type: Grant
Filed: November 13, 2018
Date of Patent: March 9, 2021
Assignee: ADVANCED MICRO DEVICES, INC.
Inventors: Srikant Bharadwaj, Shomit N. Das
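As a rough illustration of routing that depends on recorded operating states, the sketch below picks a next hop with a breadth-first search that skips routers and links marked inactive. The abstract does not name a routing algorithm, so the search itself is an assumption, as are the function name `next_hop`, the `neighbors` adjacency map, and the `router_on` and `link_on` state records.

```python
from collections import deque


def next_hop(src, dst, neighbors, router_on, link_on):
    """Return the first hop on a shortest path that uses only active routers and links.

    neighbors: dict mapping router id -> list of adjacent router ids
    router_on: dict mapping router id -> bool (recorded operating state)
    link_on:   dict mapping frozenset({a, b}) -> bool (recorded link state)
    Returns None when no active path exists. All names here are hypothetical.
    """
    if src == dst:
        return src
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in neighbors[node]:
            if nxt in parent or not router_on.get(nxt, False):
                continue
            if not link_on.get(frozenset((node, nxt)), False):
                continue
            parent[nxt] = node
            if nxt == dst:
                # Walk back along the path until reaching the hop adjacent to src.
                while parent[nxt] != src:
                    nxt = parent[nxt]
                return nxt
            queue.append(nxt)
    return None
```

For a 2x2 mesh with routers 0 to 3, a flit travelling from router 0 to router 3 would be steered through router 2 whenever router 1 is recorded as powered down.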
-
Patent number: 10860489
Abstract: Techniques are disclosed for designing cache compression algorithms that control how data in caches are compressed. The techniques generate a custom “byte select algorithm” by applying repeated transforms applied to an initial compression algorithm until a set of suitability criteria is met. The suitability criteria include that the “cost” is below a threshold and that a metadata constraint is met. The “cost” is the number of blocks that can be compressed by an algorithm as compared with the “ideal” algorithm. The metadata constraint is the number of bits required for metadata.
Type: Grant
Filed: October 31, 2018
Date of Patent: December 8, 2020
Assignee: Advanced Micro Devices, Inc.
Inventors: Shomit N. Das, Matthew Tomei, David A. Wood
-
Patent number: 10838727
Abstract: A processing device is provided which includes memory and at least one processor. The memory includes main memory and cache memory in communication with the main memory via a link. The at least one processor is configured to receive a request for a cache line and read the cache line from main memory. The at least one processor is also configured to compress the cache line according to a compression algorithm and, when the compressed cache line includes at least one byte predicted not to be accessed, drop the at least one byte from the compressed cache line based on whether the compression algorithm is determined to successfully compress the cache line according to a compression parameter.
Type: Grant
Filed: December 14, 2018
Date of Patent: November 17, 2020
Assignee: ADVANCED MICRO DEVICES, INC.
Inventors: Shomit N. Das, Kishore Punniyamurthy, Matthew Tomei, Bradford M. Beckmann
-
Patent number: 10795825
Abstract: An electronic device includes at least one compression-decompression functional block and a hierarchy of cache memories with a first cache memory and a second cache memory. The at least one compression-decompression functional block receives data in an uncompressed state, compresses the data using one of a first compression or a second compression, and, after compressing the data, provides the data to the first cache memory for storage therein. When the data is retrieved from the first cache memory to be stored in the second cache memory, when the data is compressed using the first compression, the compression-decompression functional block decompresses the data to reverse effects of the first compression on the data, thereby restoring the data to the uncompressed state and provides the data compressed using the second compression or in the uncompressed state to the second cache memory for storage therein.
Type: Grant
Filed: December 26, 2018
Date of Patent: October 6, 2020
Assignee: ADVANCED MICRO DEVICES, INC.
Inventors: Matthew J. Tomei, Philip B. Bedoukian, Shomit N. Das
-
Publication number: 20200210343
Abstract: An electronic device includes at least one compression-decompression functional block and a hierarchy of cache memories with a first cache memory and a second cache memory. The at least one compression-decompression functional block receives data in an uncompressed state, compresses the data using one of a first compression or a second compression, and, after compressing the data, provides the data to the first cache memory for storage therein. When the data is retrieved from the first cache memory to be stored in the second cache memory, when the data is compressed using the first compression, the compression-decompression functional block decompresses the data to reverse effects of the first compression on the data, thereby restoring the data to the uncompressed state and provides the data compressed using the second compression or in the uncompressed state to the second cache memory for storage therein.
Type: Application
Filed: December 26, 2018
Publication date: July 2, 2020
Inventors: Matthew J. Tomei, Philip B. Bedoukian, Shomit N. Das
-
Patent number: 10698692
Abstract: An asynchronous pipeline includes a first stage and one or more second stages. A controller provides control signals to the first stage to indicate a modification to an operating speed of the first stage. The modification is determined based on a comparison of a completion status of the first stage to one or more completion statuses of the one or more second stages. In some cases, the controller provides control signals indicating modifications to an operating voltage applied to the first stage and a drive strength of a buffer in the first stage. Modules can be used to determine the completion statuses of the first stage and the one or more second stages based on the monitored output signals generated by the stages, output signals from replica critical paths associated with the stages, or a lookup table that indicates estimated completion times.
Type: Grant
Filed: July 21, 2016
Date of Patent: June 30, 2020
Assignee: ADVANCED MICRO DEVICES, INC.
Inventors: Greg Sadowski, John Kalamatianos, Shomit N. Das
-
Publication number: 20200192705
Abstract: In some examples, thermal aware optimization logic determines a characteristic (e.g., a workload or type) of a wavefront (e.g., multiple threads). For example, the characteristic indicates whether the wavefront is compute intensive, memory intensive, mixed, and/or another type of wavefront. The thermal aware optimization logic determines temperature information for one or more compute units (CUs) in one or more processing cores. The temperature information includes predictive thermal information indicating expected temperatures corresponding to the one or more CUs and historical thermal information indicating current or past thermal temperatures of at least a portion of a graphics processing unit (GPU). The logic selects the one or more compute units to process the plurality of threads based on the determined characteristic and the temperature information. The logic provides instructions to the selected subset of the plurality of CUs to execute the wavefront.
Type: Application
Filed: December 14, 2018
Publication date: June 18, 2020
Inventors: KARTHIK RAO, SHOMIT N. DAS, XUDONG AN, WEI HUANG
-
Publication number: 20200192671
Abstract: A processing device is provided which includes memory and at least one processor. The memory includes main memory and cache memory in communication with the main memory via a link. The at least one processor is configured to receive a request for a cache line and read the cache line from main memory. The at least one processor is also configured to compress the cache line according to a compression algorithm and, when the compressed cache line includes at least one byte predicted not to be accessed, drop the at least one byte from the compressed cache line based on whether the compression algorithm is determined to successfully compress the cache line according to a compression parameter.
Type: Application
Filed: December 14, 2018
Publication date: June 18, 2020
Applicant: Advanced Micro Devices, Inc.
Inventors: Shomit N. Das, Kishore Punniyamurthy, Matthew Tomei, Bradford M. Beckmann
-
Publication number: 20200183597
Abstract: A processing system scales power to memory and memory channels based on identifying causes of stalls of threads of a wavefront. If the cause is other than an outstanding memory request, the processing system throttles power to the memory to save power. If the stall is due to memory stalls for a subset of the memory channels servicing memory access requests for threads of a wavefront, the processing system adjusts power of the memory channels servicing memory access request for the wavefront based on the subset. By boosting power to the subset of channels, the processing system enables the wavefront to complete processing more quickly, resulting in increased processing speed. Conversely, by throttling power to the remainder of channels, the processing system saves power without affecting processing speed.
Type: Application
Filed: December 6, 2018
Publication date: June 11, 2020
Inventors: Shomit N. DAS, Kishore PUNNIYAMURTHY
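The decision rule in this abstract can be written down as a small sketch. The function name `channel_power_plan`, the `"memory"` cause label, and the `"boost"` and `"throttle"` action labels are placeholders chosen here. The abstract does not specify how stall causes are encoded or what power levels are used.

```python
def channel_power_plan(stall_cause, channels, stalled_channels):
    """Map each memory channel to a power action for a stalled wavefront.

    stall_cause:      "memory" when the stall is due to outstanding memory requests
                      (label assumed for illustration)
    channels:         iterable of all channel ids
    stalled_channels: set of channel ids servicing the stalled wavefront's requests
    """
    if stall_cause != "memory":
        # The stall is not caused by memory, so all channels can be throttled.
        return {ch: "throttle" for ch in channels}
    # Boost only the channels the wavefront is waiting on; throttle the rest.
    return {ch: "boost" if ch in stalled_channels else "throttle" for ch in channels}
```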
-
Publication number: 20200183485
Abstract: A processing system dynamically scales at least one of voltage and frequency at a subset of a plurality of compute units of a graphics processing unit (GPU) based on characteristics of a kernel or workload to be executed at the subset. A system management unit for the processing system receives a compute unit mask, designating the subset of a plurality of compute units of a GPU to execute the kernel or workload, and workload characteristics indicating the compute-boundedness or memory bandwidth-boundedness of the kernel or workload from a central processing unit of the processing system. The system management unit determines a dynamic voltage and frequency scaling policy for the subset of the plurality of compute units of the GPU based on the compute unit mask and the workload characteristics.
Type: Application
Filed: December 7, 2018
Publication date: June 11, 2020
Inventors: Shomit N. DAS, Joseph L. GREATHOUSE
-
Publication number: 20200153757
Abstract: A system is described that includes an integrated circuit chip having a network-on-chip. The network-on-chip includes multiple routers arranged in a topology and a separate communication link coupled between each router and each of one or more neighboring routers of that router among the multiple routers in the topology. The integrated circuit chip also includes multiple nodes, each node coupled to a router of the multiple routers. When operating, a given router of the multiple routers keeps a record of operating states of some or all of the multiple routers and corresponding communication links. The given router then routes flits to destination nodes via one or more other routers of the multiple routers based at least in part on the operating states of the some or all of the multiple routers and the corresponding communication links.
Type: Application
Filed: November 13, 2018
Publication date: May 14, 2020
Inventors: Srikant Bharadwaj, Shomit N. Das
-
Publication number: 20200151573
Abstract: A processor determines losses of samples within an input volume that is provided to a neural network during a first epoch, groups the samples into subsets based on losses, and assigns the subsets to operands in the neural network that represent the samples at different precisions. Each subset is associated with a different precision. The processor then processes the subsets in the neural network at the different precisions during the first epoch. In some cases, the samples in the subsets are used in a forward pass and a backward pass through the neural network. A memory configured to store information representing the samples in the subsets at the different precisions. In some cases, the processor stores information representing model parameters of the neural network in the memory at the different precisions of the subsets of the corresponding samples.
Type: Application
Filed: May 29, 2019
Publication date: May 14, 2020
Inventors: Shomit N. DAS, Abhinav VISHNU
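The grouping step in this abstract might look like the sketch below, which ranks samples by loss and splits them into equal-sized subsets, one per precision. The abstract does not state how the subsets are formed or which losses map to which precision. The equal-sized split, the choice to give higher-loss samples the higher precision, the precision labels, and the function name `assign_precisions` are all assumptions.

```python
def assign_precisions(losses, precisions=("float16", "float32")):
    """Group sample indices by loss rank and map each group to a precision label.

    losses:     per-sample loss values from the current epoch
    precisions: labels ordered from the lowest-loss group to the highest-loss group
                (this ordering is an illustrative assumption)
    """
    groups = {p: [] for p in precisions}
    if not losses:
        return groups
    # Sample indices sorted from lowest to highest loss.
    order = sorted(range(len(losses)), key=lambda i: losses[i])
    n_groups = len(precisions)
    for rank, idx in enumerate(order):
        group = min(rank * n_groups // len(order), n_groups - 1)
        groups[precisions[group]].append(idx)
    return groups
```

Each returned list holds the indices of the samples that would be represented at that precision during the forward and backward passes.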
-
Publication number: 20200133866
Abstract: The disclosure herein provides techniques for designing cache compression algorithms that control how data in caches are compressed. The techniques generate a custom “byte select algorithm” by applying repeated transforms applied to an initial compression algorithm until a set of suitability criteria is met. The suitability criteria include that the “cost” is below a threshold and that a metadata constraint is met. The “cost” is the number of blocks that can be compressed by an algorithm as compared with the “ideal” algorithm. The metadata constraint is the number of bits required for metadata.
Type: Application
Filed: October 31, 2018
Publication date: April 30, 2020
Applicant: Advanced Micro Devices, Inc.
Inventors: Shomit N. Das, Matthew Tomei, David A. Wood
-
Publication number: 20200104262
Abstract: A processing device is provided which includes memory comprising data cache memory configured to store compressed data and metadata cache memory configured to store metadata, each portion of metadata comprising an encoding used to compress a portion of data. The processing device also includes at least one processor configured to compress portions of data and select, based on one or more utility level metrics, portions of metadata to be stored in the metadata cache memory. The at least one processor is also configured to store, in the metadata cache memory, the portions of metadata selected to be stored in the metadata cache memory, store, in the data cache memory, each portion of compressed data having a selected portion of corresponding metadata stored in the metadata cache memory. Each portion of compressed data, having the selected portion of corresponding metadata stored in the metadata cache memory, is decompressed.
Type: Application
Filed: September 28, 2018
Publication date: April 2, 2020
Applicant: Advanced Micro Devices, Inc.
Inventors: Shomit N. Das, Matthew Tomei, David A. Wood
-
Publication number: 20200073845
Abstract: Systems, apparatuses, and methods for reliably transmitting data over voltage scaled links are disclosed. A computing system includes at least first and second devices connected via a link. In one implementation, if a data block can be compressed to less than or equal to half the original size of the data block, then the data block is compressed and sent on the link in a single clock cycle rather than two clock cycles. If the data block cannot be compressed to half the original size, but if the data block can be compressed enough to include error correction code (ECC) bits without exceeding the original size, then ECC bits are added to the compressed block which is sent on the link at a reduced voltage. The ECC bits help to correct for any errors that are generated as a result of operating the link at the reduced voltage.
Type: Application
Filed: August 30, 2018
Publication date: March 5, 2020
Inventors: Shomit N. Das, Matthew Tomei, Shrikanth Ganapathy, John Kalamatianos
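The two size checks described in this abstract can be expressed as a short decision function. The function name, the mode labels, and the bit-count parameters are illustrative. The final fallback (sending the block uncompressed at nominal voltage) is an assumption, since the abstract does not say what happens when neither condition holds.

```python
def choose_link_mode(original_bits, compressed_bits, ecc_bits):
    """Pick a transmission mode for one data block from its sizes in bits."""
    if compressed_bits * 2 <= original_bits:
        # Compressed to half size or less: send in one clock cycle instead of two.
        return "compressed_single_cycle"
    if compressed_bits + ecc_bits <= original_bits:
        # Enough room for ECC within the original size: send at reduced voltage.
        return "compressed_with_ecc_reduced_voltage"
    # Assumed fallback, not stated in the abstract.
    return "uncompressed_nominal_voltage"
```

For a 512-bit block with 64 ECC bits, a 256-bit compressed result takes the single-cycle path, a 400-bit result takes the ECC path, and a 480-bit result falls through to the assumed fallback.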
-
Patent number: 10558606
Abstract: Systems, apparatuses, and methods for reliably transmitting data over voltage scaled links are disclosed. A computing system includes at least first and second devices connected via a link. In one implementation, if a data block can be compressed to less than or equal to half the original size of the data block, then the data block is compressed and sent on the link in a single clock cycle rather than two clock cycles. If the data block cannot be compressed to half the original size, but if the data block can be compressed enough to include error correction code (ECC) bits without exceeding the original size, then ECC bits are added to the compressed block which is sent on the link at a reduced voltage. The ECC bits help to correct for any errors that are generated as a result of operating the link at the reduced voltage.
Type: Grant
Filed: August 30, 2018
Date of Patent: February 11, 2020
Assignee: Advanced Micro Devices, Inc.
Inventors: Shomit N. Das, Matthew Tomei, Shrikanth Ganapathy, John Kalamatianos
-
Patent number: 10411731
Abstract: A processing device is provided which includes a plurality of encoders each configured to compress a portion of data using a different compression algorithm. The processing device also includes one or more processors configured to cause an encoder, of the plurality of encoders, to compress the portion of data when it is determined that the portion of data, which is compressed by another encoder configured to compress the portion of data prior to the encoder in an encoder hierarchy, is not successfully compressed according to a compression metric by the other encoder in the encoder hierarchy. The one or more processors are also configured to prevent the encoder from compressing the portion of data when it is determined that the portion of data is successfully compressed according to the compression metric by the other encoder in the encoder hierarchy.
Type: Grant
Filed: September 24, 2018
Date of Patent: September 10, 2019
Assignee: ADVANCED MICRO DEVICES, INC.
Inventors: Shomit N. Das, Matthew Tomei