Patents Assigned to Advanced Micro Devices, Inc.
-
Publication number: 20200065113
Abstract: Techniques for improving performance of accelerated processing devices (“APDs”) when exceptions occur are provided. In APDs, the very large number of parallel processing execution units, and the complexity of the hardware used to execute a large number of work-items in parallel, means that APDs typically stall when an exception occurs (unlike in central processing units (“CPUs”), which are able to execute speculatively and out-of-order). However, the techniques provided herein allow at least some execution to occur past exceptions. Execution past an exception generating instruction occurs by executing instructions that would not lead to a corruption while skipping those that would lead to a corruption. After the exception has been satisfied, execution occurs in a replay mode in which the potentially exception-generating instruction is executed and in which instructions that did not execute in the exception-wait mode are executed. A mask and counter are used to control execution in replay mode.
Type: Application
Filed: August 22, 2018
Publication date: February 27, 2020
Applicant: Advanced Micro Devices, Inc.
Inventor: Anthony T. Gutierrez
-
Publication number: 20200066677
Abstract: A data processor is implemented as an integrated circuit. The data processor includes a processor die. The processor die is connected to an integrated voltage regulator die using die-to-die bonding. The integrated voltage regulator die provides a regulated voltage to the processor die, and the processor die operates in response to the regulated voltage.
Type: Application
Filed: August 23, 2018
Publication date: February 27, 2020
Applicant: Advanced Micro Devices, Inc.
Inventors: Milind Bhagavat, David Hugh McIntyre, Rahul Agarwal
-
Patent number: 10572389
Abstract: Systems, apparatuses, and methods for performing efficient memory accesses for a computing system are disclosed. External system memory is used as a last-level cache and includes one of a variety of types of dynamic random access memory (DRAM). A memory controller generates a tag request and a separate data request based on a same, single received memory request. The sending of the tag request is prioritized over sending the data request. A partial tag comparison is performed during processing of the tag request. If a tag miss is detected for the partial tag comparison, then the data request is cancelled, and the memory request is sent to main memory. If one or more tag hits are detected for the partial tag comparison, then processing of the data request is dependent upon the result of the full tag comparison.
Type: Grant
Filed: December 12, 2017
Date of Patent: February 25, 2020
Assignee: Advanced Micro Devices, Inc.
Inventors: Ravindra N. Bhargava, Ganesh Balakrishnan
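The two-stage tag check in this abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the function name `access`, the 4-bit `PARTIAL_MASK`, and the representation of a cache set as a list of `(tag, data)` pairs are all assumptions made for illustration.

```python
PARTIAL_MASK = 0xF  # assumption: the partial comparison looks at the low 4 tag bits


def access(ways, tag):
    """Look up `tag` in one set of the DRAM cache.

    `ways` is a list of (stored_tag, data) pairs (a hypothetical layout).
    Returns (outcome, data), where outcome is "partial_miss", "full_miss" or "hit".
    """
    # Stage 1: partial tag comparison, done while the tag request is processed.
    partial_hits = [(t, d) for t, d in ways
                    if (t & PARTIAL_MASK) == (tag & PARTIAL_MASK)]
    if not partial_hits:
        # No way can match: the data request is cancelled and the
        # memory request would go to main memory.
        return ("partial_miss", None)
    # Stage 2: the full tag comparison decides whether the data request completes.
    for stored_tag, data in partial_hits:
        if stored_tag == tag:
            return ("hit", data)
    return ("full_miss", None)
```

A partial miss is the cheap early exit; a partial hit only means the full comparison still has to decide.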
-
Patent number: 10573630
Abstract: A three-dimensional integrated circuit includes a first die having a first geometry. The first die includes a first region that operates with a first power density and a second region that operates with a second power density. The first power density is less than the second power density. The first die includes first electrical contacts disposed in the first region on a first side of the first die along a periphery of the first die. The three-dimensional integrated circuit includes a second die having a second geometry. The second die includes second electrical contacts disposed on a first side of the second die. A stacked portion of the second die is stacked within the periphery of the first die and an overhang portion of the second die extends beyond the periphery of the first die. The second electrical contacts are aligned with and coupled to the first electrical contacts.
Type: Grant
Filed: April 20, 2018
Date of Patent: February 25, 2020
Assignee: Advanced Micro Devices, Inc.
Inventors: Brett P. Wilkerson, Milind Bhagavat, Rahul Agarwal, Dmitri Yudanov
-
Patent number: 10572183
Abstract: A data processing system includes a memory and a data processor. The data processor is connected to the memory and adapted to access the memory in response to scheduled memory access requests. The data processor has power management logic that, in response to detecting a memory power state change, determines whether to retrain or suppress retraining of at least one parameter related to accessing the memory based on an operating state of the memory. The power management logic further determines a retraining interval for retraining the at least one parameter related to accessing the memory, and initiates a retraining operation in response to the memory power state change based on the operating state of the memory being outside of a predetermined threshold.
Type: Grant
Filed: October 18, 2017
Date of Patent: February 25, 2020
Assignee: Advanced Micro Devices, Inc.
Inventors: Sonu Arora, Guhan Krishnan, Kevin Brandl
-
Publication number: 20200057717
Abstract: In one form, a data processing system includes a host integrated circuit having a memory controller, a memory bus coupled to the memory controller, and a memory module. The memory module includes a bulk memory and a memory module scratchpad coupled to the bulk memory, wherein the memory module scratchpad has a lower access overhead than the bulk memory. The memory controller selectively provides predetermined commands over the memory bus to cause the memory module to copy data between the bulk memory and the memory module scratchpad without conducting data on the memory bus in response to a data movement decision.
Type: Application
Filed: August 17, 2018
Publication date: February 20, 2020
Applicant: Advanced Micro Devices, Inc.
Inventors: Nuwan Jayasena, Amin Farmahini Farahani, Michael Ignatowski
-
Patent number: 10568025
Abstract: An approach is provided for data processing methods wherein a PHY layer transmit operation is constructed by aggregating MAC layer data units (MPDUs) by means of a linked list. A link list of TX descriptors can be formed separately from the payloads. In order to minimize processor speed rates, the next transmission buffer is pre-fetched whenever appropriate. Interlocks are used to prevent conflict on descriptor fetches and payload fetches.
Type: Grant
Filed: July 13, 2015
Date of Patent: February 18, 2020
Assignees: ADVANCED MICRO DEVICES, INC., AMD FAR EAST LTD.
Inventors: Sebastian Ahmed, Douglas A. Mammoser
-
Patent number: 10558591
Abstract: Systems, apparatuses, and methods for implementing priority adjustment forwarding are disclosed. A system includes at least one or more processing units, a memory, and a communication fabric coupled to the processing unit(s) and the memory. The communication fabric includes a plurality of arbitration points. When a client determines that its bandwidth requirements are not being met, the client generates and sends an in-band priority adjustment request to the nearest arbitration point. This arbitration point receives the in-band priority adjustment request and then identifies any pending requests which are buffered at the arbitration point which meet the criteria specified by the in-band priority adjustment request. The arbitration point adjusts the priority of any identified requests, and then the arbitration point forwards the in-band priority adjustment request on the fabric to the next upstream arbitration point which processes the in-band priority adjustment request in the same manner.
Type: Grant
Filed: October 9, 2017
Date of Patent: February 11, 2020
Assignee: Advanced Micro Devices, Inc.
Inventors: Alan Dodson Smith, Eric Christopher Morton, Vydhyanathan Kalyanasundharam, Joe G. Cruz
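The hop-by-hop forwarding described in this abstract can be sketched as follows. The `Request` and `ArbitrationPoint` classes are hypothetical, and the sketch assumes the matching criterion is simply the client ID; the abstract does not say what criteria a real request carries.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Request:
    client_id: int
    priority: int


@dataclass
class ArbitrationPoint:
    pending: List[Request] = field(default_factory=list)
    upstream: Optional["ArbitrationPoint"] = None

    def adjust(self, client_id: int, new_priority: int) -> None:
        """Apply a priority adjustment locally, then forward it upstream."""
        # Re-prioritize every buffered request that matches the criteria
        # (assumed here to be "issued by this client").
        for req in self.pending:
            if req.client_id == client_id:
                req.priority = new_priority
        # Forward the same adjustment to the next arbitration point, if any.
        if self.upstream is not None:
            self.upstream.adjust(client_id, new_priority)
```

Calling `adjust` on the arbitration point nearest the client walks the whole chain, so requests already buffered further upstream are raised as well.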
-
Patent number: 10558499
Abstract: Footprints, or resource allocations, of waves within resources that are shared by processor cores in a multithreaded processor are measured concurrently with the waves executing on the processor cores. The footprints are averaged over a time interval. A number of waves are spawned and dispatched for execution in the multithreaded processor based on the average footprint. In some cases, the waves are spawned at a rate that is determined based on the average value of the footprints of waves within the resources. The rate of spawning waves is modified in response to a change in the average value of the footprints of the waves within the resources.
Type: Grant
Filed: October 26, 2017
Date of Patent: February 11, 2020
Assignee: Advanced Micro Devices, Inc.
Inventors: Maxim V. Kazakov, Michael Mantor
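One way to read "spawned based on the average footprint" is sketched below. The sizing rule (shared capacity divided by the average footprint) and the name `waves_to_spawn` are assumptions; the abstract does not state the actual policy.

```python
def waves_to_spawn(capacity, footprints, in_flight):
    """Return how many additional waves to spawn.

    capacity   -- size of the shared resource (hypothetical units)
    footprints -- per-wave footprints sampled over the last interval
    in_flight  -- waves currently executing
    """
    # capacity / average footprint, kept in integer arithmetic:
    # capacity / (sum / n) == capacity * n / sum
    max_resident = capacity * len(footprints) // sum(footprints)
    return max(0, max_resident - in_flight)
```

When the measured footprints grow, `max_resident` shrinks and fewer waves are spawned in the next interval, which mirrors the rate adjustment the abstract describes.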
-
Patent number: 10558466
Abstract: Systems, apparatuses, and methods for adjusting group sizes to match a processor lane width are described. In early iterations of an algorithm, a processor partitions a dataset into groups of data points which are integer multiples of the processing lane width of the processor. For example, when performing a K-means clustering algorithm, the processor determines that a first plurality of data points belong to a first group during a given iteration. If the first plurality of data points is not an integer multiple of the number of processing lanes, then the processor reassigns a first number of data points from the first plurality of data points to one or more other groups. The processor then performs the next iteration with these first number of data points assigned to other groups even though the first number of data points actually meets the algorithmic criteria for belonging to the first group.
Type: Grant
Filed: June 23, 2016
Date of Patent: February 11, 2020
Assignee: Advanced Micro Devices, Inc.
Inventors: Mauricio Breternitz, Mayank Daga
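The reassignment step can be sketched in a few lines of Python. The abstract does not say which points are moved or which group receives them; this sketch assumes the surplus is taken from the end of each group and handed to the following group, and the function name is hypothetical.

```python
def pad_groups_to_lane_width(groups, lane_width):
    """Trim each group to a multiple of `lane_width`.

    `groups` is a list of lists of points as assigned by a clustering step.
    The surplus of each group is moved into the next group, so every group
    except possibly the last ends up with a size divisible by `lane_width`.
    """
    adjusted = [list(g) for g in groups]  # leave the caller's lists untouched
    for i in range(len(adjusted) - 1):
        surplus = len(adjusted[i]) % lane_width
        if surplus:
            moved = adjusted[i][-surplus:]
            del adjusted[i][-surplus:]
            adjusted[i + 1].extend(moved)
    return adjusted
```

For example, with a lane width of 4, groups of sizes 5, 3 and 2 become groups of sizes 4, 4 and 2: the one surplus point of the first group is processed with the second group in the next iteration even though it met the criteria for the first.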
-
Patent number: 10560022
Abstract: An apparatus includes an integrated circuit chip with a set of circuits having two or more subsets of circuits; an external voltage regulator separate from the integrated circuit chip; two or more integrated voltage regulators on the integrated circuit chip that each provide an input voltage to a respective subset of the circuits; and a controller. The controller determines, using an integrated voltage regulator power loss model, an electrical power loss for the integrated voltage regulators for a first combination of operating points for the subsets of the circuits. The controller then determines, based on the electrical power loss, a second combination of operating points for the subsets of the circuits that includes an adjustment to an operating point for at least one of the subsets of the circuits that compensates for an electrical power loss of the corresponding integrated voltage regulator.
Type: Grant
Filed: June 13, 2019
Date of Patent: February 11, 2020
Assignee: ADVANCED MICRO DEVICES, INC.
Inventors: Wei Huang, Miguel Rodriguez, Karthik Rao
-
Patent number: 10558606
Abstract: Systems, apparatuses, and methods for reliably transmitting data over voltage scaled links are disclosed. A computing system includes at least first and second devices connected via a link. In one implementation, if a data block can be compressed to less than or equal to half the original size of the data block, then the data block is compressed and sent on the link in a single clock cycle rather than two clock cycles. If the data block cannot be compressed to half the original size, but if the data block can be compressed enough to include error correction code (ECC) bits without exceeding the original size, then ECC bits are added to the compressed block which is sent on the link at a reduced voltage. The ECC bits help to correct for any errors that are generated as a result of operating the link at the reduced voltage.
Type: Grant
Filed: August 30, 2018
Date of Patent: February 11, 2020
Assignee: Advanced Micro Devices, Inc.
Inventors: Shomit N. Das, Matthew Tomei, Shrikanth Ganapathy, John Kalamatianos
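The decision this abstract describes reduces to two size comparisons, sketched below. The function name, the mode strings, and the fallback for blocks that fit neither case are assumptions; the abstract covers only the first two branches.

```python
def plan_transfer(original_size, compressed_size, ecc_size):
    """Pick how to send one data block over the link (sizes in bits)."""
    if 2 * compressed_size <= original_size:
        # Compresses to half or less: one clock cycle instead of two.
        return "compressed_single_cycle"
    if compressed_size + ecc_size <= original_size:
        # Room for ECC within the original size: send at reduced voltage
        # and rely on ECC to correct the resulting errors.
        return "compressed_with_ecc_reduced_voltage"
    # Assumption: anything else is sent uncompressed.
    return "uncompressed"
```

With a 64-bit block and 8 ECC bits, a 30-bit compressed form takes the single-cycle path, a 50-bit form takes the ECC path, and a 60-bit form falls through to the assumed uncompressed case.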
-
Patent number: 10558489
Abstract: Systems, apparatuses, and methods for suspending and restoring operations on a processor are disclosed. In one embodiment, a processor includes at least a control unit, multiple execution units, and multiple work creation units. In response to detecting a request to suspend a software application executing on the processor, the control unit sends requests to the plurality of work creation units to stop creating new work. The control unit waits until receiving acknowledgements from the work creation units prior to initiating a suspend operation. Once all work creation units have acknowledged that they have stopped creating new work, the control unit initiates the suspend operation. Also, when a restore operation is initiated, the control unit prevents any work creation units from launching new work-items until all previously in-flight work-items have been restored to the same work creation units and execution units to which they were previously allocated.
Type: Grant
Filed: February 21, 2017
Date of Patent: February 11, 2020
Assignee: Advanced Micro Devices, Inc.
Inventors: Alexander Fuad Ashkar, Michael J. Mantor, Randy Wayne Ramsey, Rex Eldon McCrary, Harry J. Wise
-
Patent number: 10558418
Abstract: A technique for implementing synchronization monitors on an accelerated processing device (“APD”) is provided. Work on an APD includes workgroups that include one or more wavefronts. All wavefronts of a workgroup execute on a single compute unit. A monitor is a synchronization construct that allows workgroups to stall until a particular condition is met. Responsive to all wavefronts of a workgroup executing a wait instruction, the monitor coordinator records the workgroup in an “entry queue.” The workgroup begins saving its state to a general APD memory and, when such saving is complete, the monitor coordinator moves the workgroup to a “condition queue.” When the condition specified by the wait instruction is met, the monitor coordinator moves the workgroup to a “ready queue,” and, when sufficient resources are available on a compute unit, the APD schedules the ready workgroup for execution on a compute unit.
Type: Grant
Filed: July 27, 2017
Date of Patent: February 11, 2020
Assignee: Advanced Micro Devices, Inc.
Inventors: Alexandru Dutu, Bradford M. Beckmann
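The three queues named in this abstract form a small state machine, sketched here in Python. The class and method names are hypothetical, and "sufficient resources" is reduced to a count of free slots for illustration.

```python
from collections import deque


class MonitorCoordinator:
    """Tracks workgroups through entry -> condition -> ready."""

    def __init__(self):
        self.entry_queue = deque()      # waiting, state save in progress
        self.condition_queue = deque()  # state saved, condition not yet met
        self.ready_queue = deque()      # condition met, awaiting resources

    def on_wait(self, workgroup):
        # All wavefronts of the workgroup executed the wait instruction.
        self.entry_queue.append(workgroup)

    def on_state_saved(self, workgroup):
        self.entry_queue.remove(workgroup)
        self.condition_queue.append(workgroup)

    def on_condition_met(self, workgroup):
        self.condition_queue.remove(workgroup)
        self.ready_queue.append(workgroup)

    def schedule(self, free_slots):
        """Return up to `free_slots` ready workgroups to resume."""
        resumed = []
        while free_slots > 0 and self.ready_queue:
            resumed.append(self.ready_queue.popleft())
            free_slots -= 1
        return resumed
```

A workgroup only becomes schedulable after passing through all three queues in order, so a workgroup whose state is still being saved can never be resumed early.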
-
Publication number: 20200041577
Abstract: In one form, a power supply monitor including a current controlled oscillator circuit, a time-to-digital converter, and an output divider. The current controlled oscillator circuit has an input for receiving a power supply voltage to be measured, and an output for providing a frequency signal having a frequency linearly proportional to the power supply voltage. The time-to-digital converter has an input coupled to the output of the current controlled oscillator circuit, and an output for providing a count signal representative of a number of cycles of a reference clock signal per cycle of the frequency signal. The output divider has an input coupled to the output of the time-to-digital converter, and an output for providing a divided count signal representative of a value of the power supply voltage, and provides the divided count signal by dividing a fixed number by the count signal.
Type: Application
Filed: August 3, 2018
Publication date: February 6, 2020
Applicant: Advanced Micro Devices, Inc.
Inventors: Ravinder Reddy Rachala, Stephen Victor Kosonocky, Miguel Rodriguez
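The arithmetic behind this monitor can be checked with a short numeric sketch. The count is inversely proportional to the supply voltage, so dividing a fixed number by it gives a value that is again proportional to the voltage. The oscillator gain, reference frequency, and fixed number below are invented values chosen so the result reads directly in millivolts; none of them come from the abstract.

```python
def measure_mv(supply_mv, gain_hz_per_mv=1000,
               ref_hz=1_000_000_000, fixed=1_000_000):
    """Model the oscillator -> time-to-digital converter -> divider chain."""
    # Oscillator: frequency linearly proportional to the supply voltage.
    osc_hz = gain_hz_per_mv * supply_mv
    # Time-to-digital converter: reference clock cycles per oscillator cycle.
    count = ref_hz // osc_hz
    # Output divider: fixed number divided by the count.
    return fixed // count
```

With these values, `measure_mv(1000)` yields a count of 1000 and a reading of 1000, and `measure_mv(500)` yields a count of 2000 and a reading of 500. Integer division stands in for the quantization a real counter would introduce.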
-
Patent number: 10553258
Abstract: Managing temperature of a semiconductor device having a temperature inverted processor core and stacked memory by operation of an integrated thermoelectric cooler. The thermoelectric cooler is operated to pump heat from a stacked memory device that requires a cool operating temperature to a temperature inverted processor core that maintains a higher operating temperature until threshold operating temperatures are achieved.
Type: Grant
Filed: December 21, 2018
Date of Patent: February 4, 2020
Assignee: Advanced Micro Devices, Inc.
Inventor: Wei Huang
-
Patent number: 10552339
Abstract: An operating system (OS) of a processing system having a plurality of processor cores determines a cost associated with different mechanisms for performing a translation lookaside buffer (TLB) shootdown in response to, for example, a virtual address being remapped to a new physical address, and selects a TLB shootdown mechanism to purge outdated or invalid address translations from the TLB based on the determined cost. In some embodiments, the OS selects an inter-processor interrupt (IPI) as the TLB shootdown mechanism if the cost associated with sending an IPI is less than a threshold cost. In some embodiments, the OS compares the cost of using an IPI as the TLB shootdown mechanism versus the cost of sending a hardware broadcast to all processor cores of the processing system as the shootdown mechanism and selects the shootdown mechanism having the lower cost.
Type: Grant
Filed: June 12, 2018
Date of Patent: February 4, 2020
Assignee: Advanced Micro Devices, Inc.
Inventors: Arkaprava Basu, Joseph L. Greathouse
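The second embodiment (pick whichever mechanism costs less) can be sketched as a single comparison. The cost model is an assumption: IPI cost is taken to grow linearly with the number of cores that must be interrupted, while the hardware broadcast has a flat cost. The abstract does not specify how either cost is computed.

```python
def choose_shootdown(ipi_cost_per_core, target_cores, broadcast_cost):
    """Return "ipi" or "broadcast", whichever is estimated to be cheaper."""
    # Assumed model: one IPI per core that may hold the stale translation.
    ipi_cost = ipi_cost_per_core * len(target_cores)
    return "ipi" if ipi_cost < broadcast_cost else "broadcast"
```

Under this model, a remap that affects two cores favors targeted IPIs, while one that affects many cores tips over to the broadcast.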
-
Publication number: 20200034195
Abstract: Techniques for improved networking performance in systems where a graphics processing unit or other highly parallel non-central-processing-unit (referred to as an accelerated processing device or “APD” herein) has the ability to directly issue commands to a networking device such as a network interface controller (“NIC”) are disclosed. According to a first technique, the latency associated with loading certain metadata into NIC hardware memory is reduced or eliminated by pre-fetching network command queue metadata into hardware network command queue metadata slots of the NIC, thereby reducing the latency associated with fetching that metadata at a later time. A second technique involves reducing latency by prioritizing work on an APD when it is known that certain network traffic is soon to arrive over the network via a NIC.
Type: Application
Filed: July 30, 2018
Publication date: January 30, 2020
Applicant: Advanced Micro Devices, Inc.
Inventors: Michael W. LeBeane, Khaled Hamidouche, Bradford M. Beckmann
-
Publication number: 20200035017
Abstract: Improvements to graphics processing pipelines are disclosed. More specifically, the vertex shader stage, which performs vertex transformations, and the hull or geometry shader stages, are combined. If tessellation is disabled and geometry shading is enabled, then the graphics processing pipeline includes a combined vertex and graphics shader stage. If tessellation is enabled, then the graphics processing pipeline includes a combined vertex and hull shader stage. If tessellation and geometry shading are both disabled, then the graphics processing pipeline does not use a combined shader stage. The combined shader stages improve efficiency by reducing the number of executing instances of shader programs and associated resources reserved.
Type: Application
Filed: October 2, 2019
Publication date: January 30, 2020
Applicant: Advanced Micro Devices, Inc.
Inventors: Mangesh P. NIJASURE, Randy W. RAMSEY, Todd MARTIN
-
Publication number: 20200034144
Abstract: Overhead associated with verifying function return addresses to protect against security exploits is reduced by taking advantage of branch prediction mechanisms for predicting return addresses. More specifically, returning from a function includes popping a return address from a data stack. Well-known security exploits overwrite the return address on the data stack to hijack control flow. In some processors, a separate data structure referred to as a control stack is used to verify the data stack. When a return instruction is executed, the processor issues an exception if the return addresses on the control stack and the data stack are not identical. This overhead can be avoided by taking advantage of the return address stack, which is a data structure used by the branch predictor to predict return addresses. In most situations, if this prediction is correct, the above check does not need to occur, thus reducing the associated overhead.
Type: Application
Filed: July 26, 2018
Publication date: January 30, 2020
Applicant: Advanced Micro Devices, Inc.
Inventors: Marius Evers, David A. Kaplan, Debjit Das Sarma
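The shortcut in this abstract can be modeled in software as follows. This is a toy model, not the hardware design: the `Core` class, the `full_checks` counter, and the assumption that the return address stack cannot be overwritten by the program are all illustrative.

```python
class ControlFlowViolation(Exception):
    """Raised when the data stack and control stack disagree."""


class Core:
    def __init__(self):
        self.data_stack = []     # program-visible, can be overwritten
        self.control_stack = []  # protected copy used for verification
        self.ras = []            # return address stack of the branch predictor
        self.full_checks = 0     # how often the slow comparison ran

    def call(self, return_address):
        self.data_stack.append(return_address)
        self.control_stack.append(return_address)
        self.ras.append(return_address)

    def ret(self):
        predicted = self.ras.pop() if self.ras else None
        actual = self.data_stack.pop()
        expected = self.control_stack.pop()
        if predicted == actual:
            # Prediction was correct: skip the control stack comparison.
            return actual
        # Misprediction: fall back to the full verification.
        self.full_checks += 1
        if actual != expected:
            raise ControlFlowViolation(hex(actual))
        return actual
```

An ordinary call and return pair never touches the slow path, while an overwritten return address mispredicts, triggers the full comparison, and raises.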