Patents by Inventor Shomit N. Das

Shomit N. Das has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20210091787
    Abstract: Entropy agnostic data encoding includes: receiving, by an encoder, input data including a bit string; generating a plurality of candidate codewords, including encoding the input data bit string with a plurality of binary vectors, wherein the plurality of binary vectors includes a set of deterministic biased binary vectors and a set of random binary vectors; selecting, in dependence upon a predefined criteria, one of the plurality of candidate codewords; and transmitting the selected candidate codeword to a decoder.
    Type: Application
    Filed: September 23, 2019
    Publication date: March 25, 2021
    Inventors: SEYEDMOHAMMAD SEYEDZADEHDELCHEH, SHOMIT N. DAS
  • Publication number: 20210089324
    Abstract: An asynchronous pipeline includes a first stage and one or more second stages. A controller provides control signals to the first stage to indicate a modification to an operating speed of the first stage. The modification is determined based on a comparison of a completion status of the first stage to one or more completion statuses of the one or more second stages. In some cases, the controller provides control signals indicating modifications to an operating voltage applied to the first stage and a drive strength of a buffer in the first stage. Modules can be used to determine the completion statuses of the first stage and the one or more second stages based on the monitored output signals generated by the stages, output signals from replica critical paths associated with the stages, or a lookup table that indicates estimated completion times.
    Type: Application
    Filed: June 26, 2020
    Publication date: March 25, 2021
    Inventors: Greg Sadowski, John Kalamatianos, Shomit N. Das
  • Patent number: 10944422
    Abstract: Entropy agnostic data encoding includes: receiving, by an encoder, input data including a bit string; generating a plurality of candidate codewords, including encoding the input data bit string with a plurality of binary vectors, wherein the plurality of binary vectors includes a set of deterministic biased binary vectors and a set of random binary vectors; selecting, in dependence upon a predefined criteria, one of the plurality of candidate codewords; and transmitting the selected candidate codeword to a decoder.
    Type: Grant
    Filed: September 23, 2019
    Date of Patent: March 9, 2021
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Seyedmohammad Seyedzadehdelcheh, Shomit N. Das
  • Patent number: 10944693
    Abstract: A system is described that includes an integrated circuit chip having a network-on-chip. The network-on-chip includes multiple routers arranged in a topology and a separate communication link coupled between each router and each of one or more neighboring routers of that router among the multiple routers in the topology. The integrated circuit chip also includes multiple nodes, each node coupled to a router of the multiple routers. When operating, a given router of the multiple routers keeps a record of operating states of some or all of the multiple routers and corresponding communication links. The given router then routes flits to destination nodes via one or more other routers of the multiple routers based at least in part on the operating states of the some or all of the multiple routers and the corresponding communication links.
    Type: Grant
    Filed: November 13, 2018
    Date of Patent: March 9, 2021
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Srikant Bharadwaj, Shomit N. Das
  • Patent number: 10860489
    Abstract: Techniques are disclosed for designing cache compression algorithms that control how data in caches are compressed. The techniques generate a custom “byte select algorithm” by applying repeated transforms applied to an initial compression algorithm until a set of suitability criteria is met. The suitability criteria include that the “cost” is below a threshold and that a metadata constraint is met. The “cost” is the number of blocks that can be compressed by an algorithm as compared with the “ideal” algorithm. The metadata constraint is the number of bits required for metadata.
    Type: Grant
    Filed: October 31, 2018
    Date of Patent: December 8, 2020
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Shomit N. Das, Matthew Tomei, David A. Wood
  • Patent number: 10838727
    Abstract: A processing device is provided which includes memory and at least one processor. The memory includes main memory and cache memory in communication with the main memory via a link. The at least one processor is configured to receive a request for a cache line and read the cache line from main memory. The at least one processor is also configured to compress the cache line according to a compression algorithm and, when the compressed cache line includes at least one byte predicted not to be accessed, drop the at least one byte from the compressed cache line based on whether the compression algorithm is determined to successfully compress the cache line according to a compression parameter.
    Type: Grant
    Filed: December 14, 2018
    Date of Patent: November 17, 2020
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Shomit N. Das, Kishore Punniyamurthy, Matthew Tomei, Bradford M. Beckmann
  • Patent number: 10795825
    Abstract: An electronic device includes at least one compression-decompression functional block and a hierarchy of cache memories with a first cache memory and a second cache memory. The at least one compression-decompression functional block receives data in an uncompressed state, compresses the data using one of a first compression or a second compression, and, after compressing the data, provides the data to the first cache memory for storage therein. When the data is retrieved from the first cache memory to be stored in the second cache memory, when the data is compressed using the first compression, the compression-decompression functional block decompresses the data to reverse effects of the first compression on the data, thereby restoring the data to the uncompressed state and provides the data compressed using the second compression or in the uncompressed state to the second cache memory for storage therein.
    Type: Grant
    Filed: December 26, 2018
    Date of Patent: October 6, 2020
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Matthew J. Tomei, Philip B. Bedoukian, Shomit N. Das
  • Publication number: 20200210343
    Abstract: An electronic device includes at least one compression-decompression functional block and a hierarchy of cache memories with a first cache memory and a second cache memory. The at least one compression-decompression functional block receives data in an uncompressed state, compresses the data using one of a first compression or a second compression, and, after compressing the data, provides the data to the first cache memory for storage therein. When the data is retrieved from the first cache memory to be stored in the second cache memory, when the data is compressed using the first compression, the compression-decompression functional block decompresses the data to reverse effects of the first compression on the data, thereby restoring the data to the uncompressed state and provides the data compressed using the second compression or in the uncompressed state to the second cache memory for storage therein.
    Type: Application
    Filed: December 26, 2018
    Publication date: July 2, 2020
    Inventors: Matthew J. Tomei, Philip B. Bedoukian, Shomit N. Das
  • Patent number: 10698692
    Abstract: An asynchronous pipeline includes a first stage and one or more second stages. A controller provides control signals to the first stage to indicate a modification to an operating speed of the first stage. The modification is determined based on a comparison of a completion status of the first stage to one or more completion statuses of the one or more second stages. In some cases, the controller provides control signals indicating modifications to an operating voltage applied to the first stage and a drive strength of a buffer in the first stage. Modules can be used to determine the completion statuses of the first stage and the one or more second stages based on the monitored output signals generated by the stages, output signals from replica critical paths associated with the stages, or a lookup table that indicates estimated completion times.
    Type: Grant
    Filed: July 21, 2016
    Date of Patent: June 30, 2020
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Greg Sadowski, John Kalamatianos, Shomit N. Das
  • Publication number: 20200192705
    Abstract: In some examples, thermal aware optimization logic determines a characteristic (e.g., a workload or type) of a wavefront (e.g., multiple threads). For example, the characteristic indicates whether the wavefront is compute intensive, memory intensive, mixed, and/or another type of wavefront. The thermal aware optimization logic determines temperature information for one or more compute units (CUs) in one or more processing cores. The temperature information includes predictive thermal information indicating expected temperatures corresponding to the one or more CUs and historical thermal information indicating current or past thermal temperatures of at least a portion of a graphics processing unit (GPU). The logic selects the one or more compute units to process the plurality of threads based on the determined characteristic and the temperature information. The logic provides instructions to the selected subset of the plurality of CUs to execute the wavefront.
    Type: Application
    Filed: December 14, 2018
    Publication date: June 18, 2020
    Inventors: KARTHIK RAO, SHOMIT N. DAS, XUDONG AN, WEI HUANG
  • Publication number: 20200192671
    Abstract: A processing device is provided which includes memory and at least one processor. The memory includes main memory and cache memory in communication with the main memory via a link. The at least one processor is configured to receive a request for a cache line and read the cache line from main memory. The at least one processor is also configured to compress the cache line according to a compression algorithm and, when the compressed cache line includes at least one byte predicted not to be accessed, drop the at least one byte from the compressed cache line based on whether the compression algorithm is determined to successfully compress the cache line according to a compression parameter.
    Type: Application
    Filed: December 14, 2018
    Publication date: June 18, 2020
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Shomit N. Das, Kishore Punniyamurthy, Matthew Tomei, Bradford M. Beckmann
  • Publication number: 20200183597
    Abstract: A processing system scales power to memory and memory channels based on identifying causes of stalls of threads of a wavefront. If the cause is other than an outstanding memory request, the processing system throttles power to the memory to save power. If the stall is due to memory stalls for a subset of the memory channels servicing memory access requests for threads of a wavefront, the processing system adjusts power of the memory channels servicing memory access request for the wavefront based on the subset. By boosting power to the subset of channels, the processing system enables the wavefront to complete processing more quickly, resulting in increased processing speed. Conversely, by throttling power to the remainder of channels, the processing system saves power without affecting processing speed.
    Type: Application
    Filed: December 6, 2018
    Publication date: June 11, 2020
    Inventors: Shomit N. DAS, Kishore PUNNIYAMURTHY
  • Publication number: 20200183485
    Abstract: A processing system dynamically scales at least one of voltage and frequency at a subset of a plurality of compute units of a graphics processing unit (GPU) based on characteristics of a kernel or workload to be executed at the subset. A system management unit for the processing system receives a compute unit mask, designating the subset of a plurality of compute units of a GPU to execute the kernel or workload, and workload characteristics indicating the compute-boundedness or memory bandwidth-boundedness of the kernel or workload from a central processing unit of the processing system. The system management unit determines a dynamic voltage and frequency scaling policy for the subset of the plurality of compute units of the GPU based on the compute unit mask and the workload characteristics.
    Type: Application
    Filed: December 7, 2018
    Publication date: June 11, 2020
    Inventors: Shomit N. DAS, Joseph L. GREATHOUSE
  • Publication number: 20200153757
    Abstract: A system is described that includes an integrated circuit chip having a network-on-chip. The network-on-chip includes multiple routers arranged in a topology and a separate communication link coupled between each router and each of one or more neighboring routers of that router among the multiple routers in the topology. The integrated circuit chip also includes multiple nodes, each node coupled to a router of the multiple routers. When operating, a given router of the multiple routers keeps a record of operating states of some or all of the multiple routers and corresponding communication links. The given router then routes flits to destination nodes via one or more other routers of the multiple routers based at least in part on the operating states of the some or all of the multiple routers and the corresponding communication links.
    Type: Application
    Filed: November 13, 2018
    Publication date: May 14, 2020
    Inventors: Srikant Bharadwaj, Shomit N. Das
  • Publication number: 20200151573
    Abstract: A processor determines losses of samples within an input volume that is provided to a neural network during a first epoch, groups the samples into subsets based on losses, and assigns the subsets to operands in the neural network that represent the samples at different precisions. Each subset is associated with a different precision. The processor then processes the subsets in the neural network at the different precisions during the first epoch. In some cases, the samples in the subsets are used in a forward pass and a backward pass through the neural network. A memory configured to store information representing the samples in the subsets at the different precisions. In some cases, the processor stores information representing model parameters of the neural network in the memory at the different precisions of the subsets of the corresponding samples.
    Type: Application
    Filed: May 29, 2019
    Publication date: May 14, 2020
    Inventors: Shomit N. DAS, Abhinav VISHNU
  • Publication number: 20200133866
    Abstract: The disclosure herein provides techniques for designing cache compression algorithms that control how data in caches are compressed. The techniques generate a custom “byte select algorithm” by applying repeated transforms applied to an initial compression algorithm until a set of suitability criteria is met. The suitability criteria include that the “cost” is below a threshold and that a metadata constraint is met. The “cost” is the number of blocks that can be compressed by an algorithm as compared with the “ideal” algorithm. The metadata constraint is the number of bits required for metadata.
    Type: Application
    Filed: October 31, 2018
    Publication date: April 30, 2020
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Shomit N. Das, Matthew Tomei, David A. Wood
  • Publication number: 20200104262
    Abstract: A processing device is provided which includes memory comprising data cache memory configured to store compressed data and metadata cache memory configured to store metadata, each portion of metadata comprising an encoding used to compress a portion of data. The processing device also includes at least one processor configured to compress portions of data and select, based on one or more utility level metrics, portions of metadata to be stored in the metadata cache memory. The at least one processor is also configured to store, in the metadata cache memory, the portions of metadata selected to be stored in the metadata cache memory, store, in the data cache memory, each portion of compressed data having a selected portion of corresponding metadata stored in the metadata cache memory. Each portion of compressed data, having the selected portion of corresponding metadata stored in the metadata cache memory, is decompressed.
    Type: Application
    Filed: September 28, 2018
    Publication date: April 2, 2020
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Shomit N. Das, Matthew Tomei, David A. Wood
  • Publication number: 20200073845
    Abstract: Systems, apparatuses, and methods for reliably transmitting data over voltage scaled links are disclosed. A computing system includes at least first and second devices connected via a link. In one implementation, if a data block can be compressed to less than or equal to half the original size of the data block, then the data block is compressed and sent on the link in a single clock cycle rather than two clock cycles. If the data block cannot be compressed to half the original size, but if the data block can be compressed enough to include error correction code (ECC) bits without exceeding the original size, then ECC bits are added to the compressed block which is sent on the link at a reduced voltage. The ECC bits help to correct for any errors that are generated as a result of operating the link at the reduced voltage.
    Type: Application
    Filed: August 30, 2018
    Publication date: March 5, 2020
    Inventors: Shomit N. Das, Matthew Tomei, Shrikanth Ganapathy, John Kalamatianos
  • Patent number: 10558606
    Abstract: Systems, apparatuses, and methods for reliably transmitting data over voltage scaled links are disclosed. A computing system includes at least first and second devices connected via a link. In one implementation, if a data block can be compressed to less than or equal to half the original size of the data block, then the data block is compressed and sent on the link in a single clock cycle rather than two clock cycles. If the data block cannot be compressed to half the original size, but if the data block can be compressed enough to include error correction code (ECC) bits without exceeding the original size, then ECC bits are added to the compressed block which is sent on the link at a reduced voltage. The ECC bits help to correct for any errors that are generated as a result of operating the link at the reduced voltage.
    Type: Grant
    Filed: August 30, 2018
    Date of Patent: February 11, 2020
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Shomit N. Das, Matthew Tomei, Shrikanth Ganapathy, John Kalamatianos
  • Patent number: 10411731
    Abstract: A processing device is provided which includes a plurality of encoders each configured to compress a portion of data using a different compression algorithm. The processing device also includes one or more processors configured to cause an encoder, of the plurality of encoders, to compress the portion of data when it is determined that the portion of data, which is compressed by another encoder configured to compress the portion of data prior to the encoder in an encoder hierarchy, is not successfully compressed according to a compression metric by the other encoder in the encoder hierarchy. The one or more processors are also configured to prevent the encoder from compressing the portion of data when it is determined that the portion of data is successfully compressed according to the compression metric by the other encoder in the encoder hierarchy.
    Type: Grant
    Filed: September 24, 2018
    Date of Patent: September 10, 2019
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Shomit N. Das, Matthew Tomei