Patents Assigned to Advanced Micro Devices
-
Patent number: 10628124Abstract: Techniques and circuits are provided for stochastic rounding. In an embodiment, a circuit includes carry-save adder (CSA) logic having three or more CSA inputs, a CSA sum output, and a CSA carry output. One of the three or more CSA inputs is presented with a random number value, while other CSA inputs are presented with input values to be summed. The circuit further includes adder logic having adder inputs and a sum output. The CSA carry output of the CSA logic is coupled with one of the adder inputs of the adder logic, and the CSA sum output of the CSA logic is coupled with another input of the adder inputs of the adder logic. A particular number of most significant bits of the sum output of the adder logic represent a stochastically rounded sum of the input values.Type: GrantFiled: March 22, 2018Date of Patent: April 21, 2020Assignee: ADVANCED MICRO DEVICES, INC.Inventor: Gabriel H. Loh
-
Patent number: 10627883Abstract: A processor includes a plurality of voltage droop detectors positioned at multiple points of a processor. The detectors monitor voltage levels and alert the processor if a droop event has been detected in real time. Multiple droops can be detected simultaneously, with each detected droop event generating an alert that is sent to a processor module, such as a clock control module, to act based on the detected droop. Each detector employs a ring oscillator that generates a periodic signal and a corresponding count based on that signal, where the frequency of the signal varies based on a voltage at the corresponding point being monitored.Type: GrantFiled: February 28, 2018Date of Patent: April 21, 2020Assignee: ADVANCED MICRO DEVICES, INC.Inventors: Amitabh Mehra, Dana G. Lewis
-
Patent number: 10628063Abstract: A method and device generates a slab identifier and a hash function identifier in response to a memory allocation request with a request identifier and allocation size from a memory allocation requestor. The slab identifier indicates a memory region associated with a base data size and the hash function identifier indicates a hash function. The method and device provides a bit string including the slab identifier and the hash function identifier to the memory allocation requestor.Type: GrantFiled: August 24, 2018Date of Patent: April 21, 2020Assignee: Advanced Micro Devices, Inc.Inventor: Alexander Dodd Breslow
-
Publication number: 20200117617Abstract: Described is a method and apparatus for application migration between a dockable device and a docking station in a seamless manner. The dockable device includes a processor and the docking station includes a high-performance processor. The method includes determining a docking state of a dockable device while at least an application is running. Application migration from the dockable device to a docking station is initiated when the dockable device is moving to a docked state. Application migration from the docking station to the dockable device is initiated when the dockable device is moving to an undocked state. The application continues to run during the application migration from the dockable device to the docking station or during the application migration from the docking station to the dockable device.Type: ApplicationFiled: December 6, 2019Publication date: April 16, 2020Applicant: Advanced Micro Devices, Inc.Inventors: Jonathan Lawrence Campbell, Yuping Shen
-
Patent number: 10620958Abstract: Systems, apparatuses, and methods for efficiently reducing power consumption in a crossbar of a computing system are disclosed. A data transfer crossbar uses a first interface for receiving data fetched from a data storage device that is partitioned into multiple banks. The crossbar uses a second interface for sending data fetched from the multiple banks to multiple compute units. Logic in the crossbar selects data from a most recent fetch operation for a given compute unit when the logic determines the given compute unit is an inactive compute unit for which no data is being fetched. The logic sends via the second interface the selected data for the given compute unit. Therefore, when the given compute unit is inactive, the data lines for the fetched data do not transition for each inactive clock cycle after the most recent active clock cycle.Type: GrantFiled: December 3, 2018Date of Patent: April 14, 2020Assignee: Advanced Micro Devices, Inc.Inventor: Xianwen Cheng
-
Patent number: 10620994Abstract: Systems, apparatuses, and methods for implementing continuation analysis tasks (CATs) are disclosed. In one embodiment, a system implements hardware acceleration of CATs to manage the dependencies and scheduling of an application composed of multiple tasks. In one embodiment, a continuation packet is referenced directly by a first task. When the first task completes, the first task enqueues a continuation packet on a first queue. The first task can specify on which queue to place the continuation packet. The agent responsible for the first queue dequeues and executes the continuation packet which invokes an analysis phase which is performed prior to determining which dependent tasks to enqueue. If it is determined during the analysis phase that a second task is now ready to be launched, the second task is enqueued on one of the queues. Then, an agent responsible for this queue dequeues and executes the second task.Type: GrantFiled: May 30, 2017Date of Patent: April 14, 2020Assignee: Advanced Micro Devices, Inc.Inventors: Steven Tony Tye, Brian L. Sumner, Bradford Michael Beckmann, Sooraj Puthoor
-
Publication number: 20200111248Abstract: A technique for compressing an original image is disclosed. According to the technique, an original image is obtained and a delta-encoded image is generated based on the original image. Next, a segregated image is generated based on the delta-encoded image and then the segregated image is compressed to produce a compressed image. The segregated image is generated because the segregated image may be compressed more efficiently than the original image and the delta image.Type: ApplicationFiled: December 6, 2019Publication date: April 9, 2020Applicants: Advanced Micro Devices, Inc., ATI Technologies ULCInventors: Ruijin Wu, Skyler Jonathon Saleh, Christopher J. Brennan, Kei Ming Kwong, Anthony Hung-Cheong Chan
-
Publication number: 20200112731Abstract: A system and method for scalable video coding that includes base layer having lower resolution encoding, enhanced layer having higher resolution encoding and the data transferring between two layers. The system and method provides several methods to reduce bandwidth of inter-layer transfers while at the same time reducing memory requirements. Due to less memory access, the system clock frequency can be lowered so that system power consumption is lowered as well. The system avoids having prediction data from base layer to enhanced layer to be up-sampled for matching resolution in the enhanced layer as transferring up-sampled data can impose a big burden on memory bandwidth.Type: ApplicationFiled: December 6, 2019Publication date: April 9, 2020Applicants: Advanced Micro Devices, Inc., ATI Technologies ULCInventors: Lei Zhang, Ji Zhou, Zhen Chen, Min Yu
-
Patent number: 10613764Abstract: Systems, apparatuses, and methods for performing efficient memory accesses for a computing system are disclosed. In various embodiments, a computing system includes a computing resource and a memory controller coupled to a memory device. The computing resource selectively generates a hint that includes a target address of a memory request generated by the processor. The hint is sent outside the primary communication fabric to the memory controller. The hint conditionally triggers a data access in the memory device. When no page in a bank targeted by the hint is open, the memory controller processes the hint by opening a target page of the hint without retrieving data. The memory controller drops the hint if there are other pending requests that target the same page or the target page is already open.Type: GrantFiled: November 20, 2017Date of Patent: April 7, 2020Assignee: Advanced Micro Devices, Inc.Inventors: Ravindra N. Bhargava, Philip S. Park, Vydhyanathan Kalyanasundharam, James Raymond Magro
-
Patent number: 10613983Abstract: A method includes monitoring a request rate of speculative memory read requests from a penultimate-level cache to a main memory. The speculative memory read requests correspond to data read requests that missed in the penultimate-level cache. A hit rate of searches of a last-level cache for data requested by the data read requests is monitored. Core demand speculative memory read requests to the main memory are selectively enabled in parallel with searching of the last-level cache for data of a corresponding core demand data read request based on the request rate and the hit rate. Prefetch speculative memory read requests to the main memory are selectively enabled in parallel with searching of the last-level cache for data of a corresponding prefetch data read request based on the request rate and the hit rate.Type: GrantFiled: March 20, 2018Date of Patent: April 7, 2020Assignee: Advanced Micro Devices, Inc.Inventors: Tanuj Kumar Agarwal, Anasua Bhowmik, Douglas Benson Hunt
-
Patent number: 10613957Abstract: Systems, apparatuses, and methods for achieving balanced execution in a multi-node cluster through runtime detection of performance variation are described. During a training phase, performance counters and an amount of time spent waiting for synchronization is monitored for a plurality of tasks for each node of the multi-node cluster. These values are utilized to generate a model which correlates the values of the performance counters to the amount of time spent waiting for synchronization. Once the model is built, the values of the performance counters are monitored for a period of time at the start of each task, and these values are input into the model. The model generates a prediction of whether a given node is on the critical path. If the given node is predicted to be on the critical path, the power allocation of the given node is increased.Type: GrantFiled: June 24, 2016Date of Patent: April 7, 2020Assignee: Advanced Micro Devices, Inc.Inventors: Brian J. Kocoloski, Leonardo Piga, Wei Huang, Indrani Paul
-
Publication number: 20200104262Abstract: A processing device is provided which includes memory comprising data cache memory configured to store compressed data and metadata cache memory configured to store metadata, each portion of metadata comprising an encoding used to compress a portion of data. The processing device also includes at least one processor configured to compress portions of data and select, based on one or more utility level metrics, portions of metadata to be stored in the metadata cache memory. The at least one processor is also configured to store, in the metadata cache memory, the portions of metadata selected to be stored in the metadata cache memory, store, in the data cache memory, each portion of compressed data having a selected portion of corresponding metadata stored in the metadata cache memory. Each portion of compressed data, having the selected portion of corresponding metadata stored in the metadata cache memory, is decompressed.Type: ApplicationFiled: September 28, 2018Publication date: April 2, 2020Applicant: Advanced Micro Devices, Inc.Inventors: Shomit N. Das, Matthew Tomei, David A. Wood
-
Patent number: 10606599Abstract: A system and method for using an operation (op) cache is disclosed. The system and method include an op cache for caching previously decoded instructions. The op cache includes a plurality of physically indexed and tagged instructions allowing sharing of instructions between threads. The op cache is chained through multiple ways allowing service of a plurality of instructions in a cache line. The op cache is stored between a shared operation storage and immediate/displacement storage to maximize capacity.Type: GrantFiled: December 9, 2016Date of Patent: March 31, 2020Assignee: ADVANCED MICRO DEVICES, INC.Inventor: David N. Suggs
-
Patent number: 10608633Abstract: An electronic device includes a die stack having a plurality of die. The die stack includes a die parity path spanning the plurality of die and configured to alternatingly identify each die as a first type or a second type. The die stack further includes an inter-die signal path spanning the plurality of die and configured to propagate an inter-die signal through the plurality of die, wherein the inter-die signal path is configured to invert a logic state of the inter-die signal between each die. Each die of the plurality of die includes signal formatting logic configured to selectively invert a logic state of the inter-die signal before providing it to other circuitry of the die responsive to whether the die is designated as the first type or the second type.Type: GrantFiled: August 28, 2019Date of Patent: March 31, 2020Assignee: ADVANCED MICRO DEVICES, INC.Inventor: Russell Schreiber
-
Patent number: 10606740Abstract: Systems, apparatuses, and methods for generating flexibly addressed memory requests are disclosed. In one embodiment, a system includes a processor, control unit, and memory subsystem. The processor launches a plurality of threads on a plurality of compute units, wherein each thread generates memory requests without specifying target memory addresses. The threads executing on the plurality of compute units convey a plurality of memory requests to the control unit. The control unit generates target memory addresses for the plurality of received memory requests. In one embodiment, the memory requests are write requests, and the control unit interleaves write requests from the plurality of threads into a single output buffer stored in the memory subsystem. The control unit can be located in a cache, in a memory controller, or in another location within the system.Type: GrantFiled: May 26, 2017Date of Patent: March 31, 2020Assignees: Advanced Micro Devices, Inc., ATI Technologies ULCInventors: Yunpeng Zhu, Jimshed Mirza
-
Patent number: 10608943Abstract: Systems, apparatuses, and methods for dynamic buffer management in multi-client token flow control routers are disclosed. A system includes at least one or more processing units, a memory, and a communication fabric with a plurality of routers coupled to the processing unit(s) and the memory. A router servicing multiple active clients allocates a first number of tokens to each active client. The first number of tokens is less than a second number of tokens needed to saturate the bandwidth of each client to the router. The router also allocates a third number of tokens to a free pool, with tokens from the free pool being dynamically allocated to different clients. The third number of tokens is equal to the difference between the second number of tokens and the first number of tokens. An advantage of this approach is reducing the amount of buffer space needed at the router.Type: GrantFiled: October 27, 2017Date of Patent: March 31, 2020Assignee: Advanced Micro Devices, Inc.Inventors: Alan Dodson Smith, Chintan S. Patel, Eric Christopher Morton, Vydhyanathan Kalyanasundharam, Narendra Kamat
-
Patent number: 10608076Abstract: A system and method for fabricating metal insulator metal capacitors while managing semiconductor processing yield and increasing capacitance per area are described. A semiconductor device fabrication process places a polysilicon layer on top of an oxide layer which is on top of a metal layer. The process etches trenches into areas of the polysilicon layer where the repeated trenches determine a frequency of an oscillating wave structure to be formed later. The top and bottom corners of the trenches are rounded. The process deposits a bottom metal, a dielectric, and a top metal on the polysilicon layer both on areas with the trenches and on areas without the trenches. A series of a barrier metal and a second polysilicon layer is deposited on the oscillating structure. The process completes the MIM capacitor with metal nodes contacting each of the top metal and the bottom metal of the oscillating structure.Type: GrantFiled: March 22, 2017Date of Patent: March 31, 2020Assignee: Advanced Micro Devices, Inc.Inventor: Richard T. Schultz
-
Publication number: 20200098169Abstract: Described herein are techniques for improving the effectiveness of depth culling. In a first technique, a binner is used to sort primitives into depth bins. Each depth bin covers a range of depths. The binner transmits the depth bins to the screen space pipeline for processing in near-to-far order. Processing the near bins first results in the depth buffer being updated, allowing fragments for the primitives in the farther bins to be culled more aggressively than if the depth binning did not occur. In a second technique, a buffer is used to initiate two-pass processing through the screen space pipeline. In the first pass, primitives are sent down to update the depth block and are then culled. The fragments are processed normally in the second pass, with the benefit of the updated depth values.Type: ApplicationFiled: September 21, 2018Publication date: March 26, 2020Applicant: Advanced Micro Devices, Inc.Inventors: Ruijin Wu, Young In Yeo, Sagar S. Bhandare, Vineet Goel, Martin G. Sarov, Christopher J. Brennan
-
Patent number: 10599578Abstract: A processing system fills a memory access request for data from a processor core by bypassing a cache when a write congestion condition is detected, and when transferring the data to the cache would cause eviction of a dirty cache line. The cache is bypassed by transferring the requested data to the processor core or to a different cache. Accordingly, the processing system can temporarily bypass the cache storing the dirty cache line when filling a memory access request, thereby avoiding the eviction and write back to main memory of a dirty cache line when a write congestion condition exists.Type: GrantFiled: December 13, 2016Date of Patent: March 24, 2020Assignee: Advanced Micro Devices, Inc.Inventors: Amin Farmahini Farahani, David A. Roberts
-
Patent number: 10601723Abstract: A computing system uses a memory for storing data, one or more clients for generating network traffic and a communication fabric with network switches. The network switches include centralized storage structures, rather than separate input and output storage structures. The network switches store particular metadata corresponding to received packets in a single, centralized collapsing queue where the age of the packets corresponds to a queue entry position. The payload data of the packets are stored in a separate memory, so the relatively large amount of data is not shifted during the lifetime of the packet in the network switch. The network switches select sparse queue entries in the collapsible queue, deallocate the selected queue entries, and shift remaining allocated queue entries toward a first end of the queue with a delay proportional to the radix of the network switches.Type: GrantFiled: April 12, 2018Date of Patent: March 24, 2020Assignee: Advanced Micro Devices, Inc.Inventors: Alan Dodson Smith, Vydhyanathan Kalyanasundharam, Bryan P. Broussard, Greggory D. Donley, Chintan S. Patel