Patents Assigned to Advanced Micro Devices
  • Patent number: 9928191
    Abstract: A communication device includes a data source that generates data for transmission over a bus, and that further includes a data encoder coupled to receive and encode outgoing data. The encoder further includes a coupling toggle rate (CTR) calculator configured to calculate a CTR for the outgoing data, a threshold calculator configured to determine an expected value of the CTR as a threshold value, and a comparator configured to compare the calculated CTR to the threshold value, wherein the comparison is used to determine whether to perform an encoding step by an encoding block configured to selectively encode said data. A method according to one embodiment includes determining and comparing a CTR and an expected CTR to determine whether to encode the outgoing data. Any one of a plurality of different coding techniques may be used, including bus inversion.
    Type: Grant
    Filed: July 30, 2015
    Date of Patent: March 27, 2018
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Greg Sadowski, John Kalamatianos
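The abstract above names bus inversion as one usable coding technique. The following is a minimal sketch of classic bus-invert coding only; it does not reproduce the patent's coupling toggle rate calculation or its expected-value threshold. The plain Hamming-distance toggle count, the fixed half-width threshold, the all-zero initial bus state, and the function names are assumptions made for illustration.

```python
def toggles(prev: int, cur: int) -> int:
    """Count the bit positions that differ between two bus words."""
    return bin(prev ^ cur).count("1")


def bus_invert_encode(words, width=8):
    """Return (bus_value, invert_flag) pairs for a sequence of words.

    A word is sent inverted when sending it as-is would toggle more than
    half of the data lines relative to the previous bus value.
    """
    mask = (1 << width) - 1
    prev = 0  # assumed initial bus state (hypothetical)
    encoded = []
    for word in words:
        word &= mask
        if toggles(prev, word) > width // 2:
            bus, flag = (~word) & mask, 1  # invert to reduce toggling
        else:
            bus, flag = word, 0
        encoded.append((bus, flag))
        prev = bus
    return encoded


def bus_invert_decode(encoded, width=8):
    """Undo the inversion using the per-word invert flag."""
    mask = (1 << width) - 1
    return [(~bus) & mask if flag else bus for bus, flag in encoded]
```

For example, `bus_invert_decode(bus_invert_encode([0x00, 0xFF, 0x0F, 0xF0]))` returns the original four words, and no transfer toggles more than four of the eight data lines.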
  • Patent number: 9928176
    Abstract: A processor applies a transfer policy to a portion of a cache based on access metrics for different test regions of the cache, wherein each test region applies a different transfer policy for data in cache entries that were stored in response to prefetch requests but were not the subject of demand requests. One test region applies a transfer policy under which unused prefetches are transferred to a higher level cache in a cache hierarchy upon eviction from the test region of the cache. The other test region applies a transfer policy under which unused prefetches are replaced without being transferred to a higher level cache (or are transferred to the higher level cache but stored as invalid data) upon eviction from the test region of the cache.
    Type: Grant
    Filed: July 20, 2016
    Date of Patent: March 27, 2018
    Assignee: Advanced Micro Devices, Inc.
    Inventor: Paul James Moyer
  • Publication number: 20180084270
    Abstract: A processing apparatus is provided that includes an encoder configured to encode current frames of video data using previously encoded reference frames and perform motion searches within a search window about each of a plurality of co-located portions of a reference frame. The processing apparatus also includes a processor configured to determine, prior to performing the motion searches, which locations of the reference frame to reload the search window according to a threshold number of search window reloads using predicted motions of portions of the reference frame corresponding to each of the locations. The processor is also configured to cause the encoder to reload the search window at the determined locations of the reference frame and, for each of the remaining locations of the reference frame, slide the search window in a first direction indicated by the location of the next co-located portion of the reference frame.
    Type: Application
    Filed: September 20, 2016
    Publication date: March 22, 2018
    Applicants: Advanced Micro Devices, Inc., ATI Technologies ULC
    Inventors: Ihab Amer, Gabor Sines, Edward Harold, Jinbo Qiu, Lei Zhang, Yang Liu, Zhen Chen, Ying Luo, Shu-Hsien Wu, Zhong Cai
  • Publication number: 20180081544
    Abstract: Techniques for selectively executing a lock instruction speculatively or non-speculatively based on lock address prediction and/or temporal lock prediction, including methods and devices for locking an entry in a memory device. In some techniques, a lock instruction executed by a thread for a particular memory entry of a memory device is detected. Whether contention occurred for the particular memory entry during an earlier speculative lock is detected on a condition that the lock instruction comprises a speculative lock instruction. The lock is executed non-speculatively if contention occurred for the particular memory entry during an earlier speculative lock. The lock is executed speculatively if contention did not occur for the particular memory entry during an earlier speculative lock.
    Type: Application
    Filed: September 22, 2016
    Publication date: March 22, 2018
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Gregory W. Smaus, John M. King, Matthew A. Rafacz, Matthew M. Crum
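The decision rule in the abstract above can be modeled in software roughly as follows. This is a sketch under stated assumptions, not the hardware mechanism: the `LockPredictor` class, its method names, and the unbounded set of contended addresses (with no aging or temporal prediction) are all hypothetical.

```python
class LockPredictor:
    """Toy model of address-based lock prediction (hypothetical names)."""

    def __init__(self):
        # Addresses for which an earlier speculative lock saw contention.
        self._contended = set()

    def choose_mode(self, address):
        """Pick how to execute a lock instruction for this address."""
        if address in self._contended:
            return "non-speculative"
        return "speculative"

    def record_outcome(self, address, contention_occurred):
        """Remember whether a speculative lock on this address was contended."""
        if contention_occurred:
            self._contended.add(address)
```

A lock on an address with no recorded contention runs speculatively; once `record_outcome(addr, True)` has been called, later locks on `addr` are reported as non-speculative.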
  • Publication number: 20180081810
    Abstract: The techniques described herein improve cache traffic performance in the context of contended lock instructions. More specifically, each core maintains a lock address contention table that stores addresses corresponding to contended lock instructions. The lock address contention table also includes a state value that indicates progress through a series of states meant to track whether a load by the core in a spin-loop associated with semaphore acquisition has obtained the semaphore in an exclusive state. Upon detecting that a load in a spin-loop has obtained the semaphore in an exclusive state, the core responds to incoming requests for access to the semaphore with negative acknowledgments. This allows the core to maintain the semaphore cache line in an exclusive state, which allows it to acquire the semaphore faster and to avoid transmitting that cache line to other cores unnecessarily.
    Type: Application
    Filed: September 19, 2016
    Publication date: March 22, 2018
    Applicant: Advanced Micro Devices, Inc.
    Inventors: John M. King, Gregory W. Smaus
  • Publication number: 20180082399
    Abstract: Improvements in the graphics processing pipeline are disclosed. More specifically, a new primitive shader stage performs tasks of the vertex shader stage or a domain shader stage if tessellation is enabled, a geometry shader if enabled, and a fixed function primitive assembler. The primitive shader stage is compiled by a driver from user-provided vertex or domain shader code, geometry shader code, and from code that performs functions of the primitive assembler. Moving tasks of the fixed function primitive assembler to a primitive shader that executes in programmable hardware provides many benefits, such as removal of a fixed function crossbar, removal of dedicated parameter and position buffers that are unusable in general compute mode, and other benefits.
    Type: Application
    Filed: January 25, 2017
    Publication date: March 22, 2018
    Applicants: Advanced Micro Devices, Inc., ATI Technologies ULC
    Inventors: Todd Martin, Mangesh P. Nijasure, Randy W. Ramsey, Michael Mantor, Laurent Lefebvre
  • Publication number: 20180082470
    Abstract: Improvements to graphics processing pipelines are disclosed. More specifically, the vertex shader stage, which performs vertex transformations, and the hull or geometry shader stages, are combined. If tessellation is disabled and geometry shading is enabled, then the graphics processing pipeline includes a combined vertex and geometry shader stage. If tessellation is enabled, then the graphics processing pipeline includes a combined vertex and hull shader stage. If tessellation and geometry shading are both disabled, then the graphics processing pipeline does not use a combined shader stage. The combined shader stages improve efficiency by reducing the number of executing instances of shader programs and associated resources reserved.
    Type: Application
    Filed: December 23, 2016
    Publication date: March 22, 2018
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Mangesh P. Nijasure, Randy W. Ramsey, Todd Martin
  • Publication number: 20180081625
    Abstract: A system and method for managing data in a ring buffer is disclosed. The system includes a legacy ring buffer functioning as an on-chip ring buffer, a supplemental buffer for storing data in the ring buffer, a preload ring buffer that is on-chip and capable of receiving preload data from the supplemental buffer, a write controller that determines where to write data that is write requested by a write client of the ring buffer, and a read controller that controls a return of data to a read client pursuant to a read request to the ring buffer.
    Type: Application
    Filed: September 20, 2016
    Publication date: March 22, 2018
    Applicant: Advanced Micro Devices, Inc.
    Inventors: XuHong Xiong, Pingping Shao, ZhongXiang Luo, ChenBin Wang
  • Publication number: 20180081818
    Abstract: A method and apparatus for transmitting data includes determining whether to apply a mask to a cache line that includes a first type of data and a second type of data for transmission based upon a first criterion. The second type of data is filtered from the cache line, and the first type of data along with an identifier of the applied mask is transmitted. The first type of data and the identifier are received, and the second type of data is combined with the first type of data to recreate the cache line based upon the received identifier.
    Type: Application
    Filed: September 19, 2016
    Publication date: March 22, 2018
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Shuai Che, Jieming Yin
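One way to picture the filter-and-recreate flow in the abstract above is the sketch below. It rests on several assumptions that the abstract does not state: the "second type of data" is taken to be a fixed fill value (`FILL`) that the receiver can regenerate, the sender and receiver share a small table of masks (`MASKS`), and the criterion for applying a mask is simply that every dropped word equals the fill value. All names are hypothetical.

```python
# Shared mask table (hypothetical): bit i set means word i is transmitted.
MASKS = {
    0: 0b11111111,  # no filtering: all eight words are sent
    1: 0b00001111,  # send words 0-3 only
    2: 0b01010101,  # send even-indexed words only
}
FILL = 0  # assumed value of the filtered (second-type) words


def compress(line):
    """Return (mask_id, payload) using the mask that sends the fewest words."""
    best_id = 0
    for mask_id, mask in MASKS.items():
        # A mask is usable only if every word it drops is the fill value.
        usable = all(line[i] == FILL
                     for i in range(len(line)) if not (mask >> i) & 1)
        fewer = bin(mask).count("1") < bin(MASKS[best_id]).count("1")
        if usable and fewer:
            best_id = mask_id
    mask = MASKS[best_id]
    payload = [line[i] for i in range(len(line)) if (mask >> i) & 1]
    return best_id, payload


def decompress(mask_id, payload, line_words=8):
    """Recreate the full line from the payload and the mask identifier."""
    mask = MASKS[mask_id]
    words = iter(payload)
    return [next(words) if (mask >> i) & 1 else FILL
            for i in range(line_words)]
```

With the line `[7, 0, 9, 0, 3, 0, 1, 0]`, `compress` selects mask 2 and sends four words instead of eight, and `decompress` rebuilds the original line from the payload and the identifier.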
  • Publication number: 20180081715
    Abstract: Techniques for scheduling processing tasks in a device having multiple computing elements are disclosed. A network interface controller of the device receives processing tasks, for execution on the computing elements, from a network that is external to the device. The network interface controller schedules the tasks for execution on the computing devices based on policy data available to the network interface controller. A scheduler within the network interface controller, which can be implemented as a standalone processing unit (such as a microcontroller, a programmable processing core, or an application specific integrated circuit), performs such scheduling, thereby freeing the central processing unit of the device from the burden of performing scheduling operations. The scheduler schedules the tasks according to any technically feasible scheduling technique.
    Type: Application
    Filed: September 16, 2016
    Publication date: March 22, 2018
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Michael W. LeBeane, Abhisek Pan, Steven K. Reinhardt
  • Patent number: 9921635
    Abstract: An approach is described herein that includes a method for power management of a device. In one example, the method includes sampling duration characteristics for a plurality of past idle events for a predetermined interval of time and determining whether to transition a device to a powered-down state based on the sampled duration characteristics. In another example, the method includes determining whether an average idle time for a plurality of past idle events exceeds an energy break-even point threshold. If the average idle time for the plurality of past idle events exceeds the energy break-even point threshold, a device is immediately transitioned to a powered-down state upon receipt of a next idle event. If the average idle time for the plurality of past idle events does not exceed the energy break-even point threshold, transition of the device to the powered-down state is delayed.
    Type: Grant
    Filed: October 31, 2013
    Date of Patent: March 20, 2018
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Yasuko Eckert, Manish Arora
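The second example in the abstract above (comparing the average past idle time with an energy break-even threshold) can be sketched as follows. The class name, the fixed-size history window, the unitless durations, and the choice to delay when no history exists are assumptions for illustration, not details taken from the patent.

```python
from collections import deque


class IdlePowerPolicy:
    """Toy power-down policy driven by recent idle durations."""

    def __init__(self, break_even, history=8):
        self.break_even = break_even              # energy break-even threshold
        self._durations = deque(maxlen=history)   # most recent idle durations

    def record_idle(self, duration):
        """Store the measured length of a completed idle event."""
        self._durations.append(duration)

    def on_idle_event(self):
        """Decide what to do when the next idle event arrives."""
        if not self._durations:
            return "delay"  # assumption: be conservative with no history
        average = sum(self._durations) / len(self._durations)
        if average > self.break_even:
            return "power_down_now"
        return "delay"
```

With a break-even threshold of 100 and recorded idle durations of 150, 200, and 90, the average exceeds the threshold, so the next idle event triggers an immediate power-down.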
  • Patent number: 9924176
    Abstract: A method and apparatus is provided for block based compression of a texture using hardware supported compression formats. The method comprises dividing a texture into a plurality of blocks, for each block, determining a transform for use with the block to minimize an error metric, encoding at least one characteristic of the transform into a plurality of bits otherwise available to represent reference component values, and compressing the block.
    Type: Grant
    Filed: October 9, 2015
    Date of Patent: March 20, 2018
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Andrew S. C. Pomianowski, Konstantine Iourcha
  • Publication number: 20180074977
    Abstract: Techniques for improving execution of a lock instruction are provided herein. A lock instruction and younger instructions are allowed to speculatively retire prior to the store portion of the lock instruction committing its value to memory. These instructions thus do not have to wait for the lock instruction to complete before retiring. In the event that the processor detects a violation of the atomic or fencing properties of the lock instruction prior to committing the value of the lock instruction, the processor rolls back state and executes the lock instruction in a slow mode in which younger instructions are not allowed to retire until the stored value of the lock instruction is committed. Speculative retirement of these instructions results in increased processing speed, as instructions no longer need to wait to retire after execution of a lock instruction.
    Type: Application
    Filed: September 15, 2016
    Publication date: March 15, 2018
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Gregory W. Smaus, John M. King, Michael D. Achenbach, Kevin M. Lepak, Matthew A. Rafacz, Noah Bamford
  • Publication number: 20180075574
    Abstract: A method and apparatus for real time compressing randomly accessed data includes extracting a block of randomly accessed data from a memory hierarchy. One or more individual portions of the randomly accessed data are independently compressed in real time to create a lossless compressed image surface. The compressed image surface includes data of independently compressed image blocks for reading and decompressing in a random order. The method further includes storing structured information relating to the dynamically compressed randomly accessed data.
    Type: Application
    Filed: September 12, 2016
    Publication date: March 15, 2018
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Chris Brennan, Timour T. Paltashev
  • Publication number: 20180074958
    Abstract: A data processing system includes a plurality of processors, local memories associated with a corresponding processor, and at least one inter-processor link. In response to a first processor performing a load or store operation on an address of a corresponding local memory that is not currently in the local cache, a local cache allocates a first cache line and encodes a local state with the first cache line. In response to a load operation from an address of a remote memory that is not currently in the local cache, the local cache allocates a second cache line and encodes a remote state with the second cache line. The first processor performs subsequent loads and stores on the first cache line in the local cache in response to the local state, and subsequent loads from the second cache line in the local cache in response to the remote state.
    Type: Application
    Filed: September 14, 2016
    Publication date: March 15, 2018
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Nuwan Jayasena, Michael Boyer
  • Publication number: 20180074965
    Abstract: Described is a system and method for efficient pointer chasing in systems having a single memory node or a network of memory nodes. In particular, a pointer chasing command is sent along with a memory request by an issuing node to a memory node. The pointer chasing command indicates the number of interdependent memory accesses and information needed for the identified interdependent memory accesses. An address computing unit associated with the memory node determines the relevant memory address for an interdependent memory access absent further interaction with the issuing node or without having to return to the issuing node.
    Type: Application
    Filed: September 15, 2016
    Publication date: March 15, 2018
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Paula Aguilera Diez, Amin Farmahini-Farahani, Nuwan Jayasena
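The idea in the abstract above, that the memory node resolves a chain of dependent accesses itself rather than returning to the issuing node after each one, can be illustrated with the sketch below. The flat list standing in for a memory node, the `chase` function, and the command fields (`start_addr`, `num_accesses`, `next_offset`) are hypothetical simplifications.

```python
def chase(memory, start_addr, num_accesses, next_offset):
    """Follow `num_accesses` dependent pointers entirely at the memory node.

    Each record stores the address of the next record at `next_offset`
    words from its base. Only the final address is returned to the issuer.
    """
    addr = start_addr
    for _ in range(num_accesses):
        addr = memory[addr + next_offset]  # address computed locally
    return addr


# A three-record linked list laid out as [value, next_pointer] pairs.
memory = [0] * 12
memory[0], memory[1] = 10, 4   # record at address 0 points to address 4
memory[4], memory[5] = 20, 8   # record at address 4 points to address 8
memory[8], memory[9] = 30, 0   # record at address 8 points back to 0

final_addr = chase(memory, start_addr=0, num_accesses=2, next_offset=1)
```

Here one command replaces two round trips: `final_addr` is 8, and the value stored there is 30.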
  • Patent number: 9916243
    Abstract: A method and apparatus for performing a bus lock and a translation lookaside buffer invalidate transaction includes receiving, by a lock master, a lock request from a first processor in a system. The lock master sends a quiesce request to all processors in the system, and upon receipt of the quiesce request from the lock master, all processors cease issuing any new transactions and issue a quiesce granted transaction. Upon receipt of the quiesce granted transactions from all processors, the lock master issues a lock granted message that includes an identifier of the first processor. The first processor performs an atomic transaction sequence and sends a first lock release message to the lock master upon completion of the atomic transaction sequence. The lock master sends a second lock release message to all processors upon receiving the first lock release message from the first processor.
    Type: Grant
    Filed: October 23, 2014
    Date of Patent: March 13, 2018
    Assignee: Advanced Micro Devices, Inc.
    Inventors: William L. Walker, Paul J. Moyer, Richard M. Born, Eric Morton, David Christie, Marius Evers, Scott T. Bingham
  • Patent number: 9916246
    Abstract: A processing system includes a shadow tag memory, which stores a plurality of entries containing coherency information for the cachelines residing at the various levels of private caches. If a cache miss occurs at a private cache, or if coherency information for a cacheline requires updating, a probe is sent to the shadow tag memory maintained at the shared cache to determine whether the requested (or affected) cacheline is stored at another private cache. The probe includes a tag which can be divided into two or more portions. To more efficiently compare the probe tag to the shadow tag entries, the comparison is performed in multiple stages based on the portions of the probe tag.
    Type: Grant
    Filed: August 16, 2016
    Date of Patent: March 13, 2018
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Carson Donahue Henrion, Michael K. Ciraula, Gregg Donley, Alok Garg, Eric Busta
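The multi-stage comparison described in the abstract above can be sketched as a two-stage filter. Splitting the tag into a low portion and a high portion, the 8-bit width of the first portion, and the function name are assumptions; the abstract says only that the probe tag can be divided into two or more portions.

```python
def staged_tag_match(probe_tag, shadow_tags, low_bits=8):
    """Return indices of shadow tag entries that fully match the probe tag."""
    low_mask = (1 << low_bits) - 1

    # Stage 1: compare only the low portion of every entry.
    candidates = [i for i, tag in enumerate(shadow_tags)
                  if (tag & low_mask) == (probe_tag & low_mask)]

    # Stage 2: compare the remaining high portion, but only for the
    # entries that survived stage 1.
    return [i for i in candidates
            if (shadow_tags[i] >> low_bits) == (probe_tag >> low_bits)]
```

For shadow tags `[0x1234, 0x5634, 0x1235, 0x1234]` and probe tag `0x1234`, stage 1 keeps entries 0, 1, and 3, and stage 2 narrows the result to entries 0 and 3.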
  • Patent number: 9916189
    Abstract: In the described embodiments, entities in a computing device selectively write specified values to a lock variable in a local cache and one or more lower levels of a memory hierarchy to enable multiple entities to concurrently execute corresponding critical sections of program code that are protected by a same lock.
    Type: Grant
    Filed: September 6, 2014
    Date of Patent: March 13, 2018
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Martin T. Pohlack, Stephan Diestelhorst
  • Publication number: 20180069767
    Abstract: Techniques described herein improve processor performance in situations where a large number of system service requests are being received from other devices. More specifically, upon detecting that certain operating conditions that indicate a processor slowdown are present, the processor performs one or more system service adjustment techniques. These techniques include throttling (reducing the rate of handling) of such requests, coalescing (grouping multiple requests into a single group) the requests, disabling microarchitectural structures (such as caches or branch prediction units) or updates to those structures, and prefetching data for or pre-performing these requests. Each of these adjustment techniques helps to reduce the number of and/or workload associated with servicing requests for system services.
    Type: Application
    Filed: September 6, 2016
    Publication date: March 8, 2018
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Arkaprava Basu, Joseph L. Greathouse, Guru Prasadh V. Venkataramani, Jan Vesely