Patents Assigned to Advanced Micros Devices, Inc.
  • Publication number: 20190228574
    Abstract: Techniques for removing reset indices from, and identifying primitives in, an index stream that defines a set of primitives to be rendered, are disclosed. The index stream may be specified by an application program executing on the central processing unit. The technique involves classifying the primitive topology for the index stream as either requiring an offset-based technique or requiring a non-offset-based technique. This classification is done by determining whether, according to the primitive topology, each subsequent index can form a primitive with prior indices (e.g., line strip, triangle strip). If each subsequent index can form a primitive with prior indices, then the technique used is the non-offset-based technique. If each subsequent index does not form a primitive with prior indices, but instead at least two indices are required to form a new primitive (e.g., line list, triangle list), then the technique used is the offset-based technique.
    Type: Application
    Filed: January 22, 2019
    Publication date: July 25, 2019
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Saad Arrabi, Mangesh P. Nijasure, Todd Martin
  • Publication number: 20190229736
    Abstract: An oscillator circuit is provided that adapts to voltage supply variations. The circuit first and second delays lines connected inputs of an edge detector, one delay line supplied by a reference voltage and the other with a drooping supply voltage. The edge detector generates an output clock based on a relationship between the inputs. The output clock applied to the signal inputs of the first and second delay lines. The output clock has a voltage dependent frequency performance curve with a slope dependent at least on the second delay line delay and a delay of the edge detector. At least one of the first delay line, the second delay line, and the edge detector delay are adjusted to change the slope of the performance curve.
    Type: Application
    Filed: March 29, 2019
    Publication date: July 25, 2019
    Applicants: Advanced Micro Devices, Inc., ATI Technologies ULC
    Inventors: Stephen Victor Kosonocky, Mikhail Rodionov, Joyce Cheuk Wai Wong
  • Patent number: 10360652
    Abstract: A processor comprising hardware logic configured to execute of a first wavefront in a hardware resource and stop execution of the first wavefront before the first wavefront completes. The processor schedules a second wavefront for execution in the hardware resource.
    Type: Grant
    Filed: June 13, 2014
    Date of Patent: July 23, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Marc S. Orr, Bradford M. Beckmann, Benedict R. Gaster, Steven K. Reinhardt, David A. Wood
  • Patent number: 10360177
    Abstract: Described is a method and processing apparatus to improve power efficiency by gating redundant threads processing. In particular, the method for gating redundant threads in a graphics processor includes determining if data for a thread and data for at least another thread are within a predetermined similarity threshold, gating execution of the at least another thread if the data for the thread and the data for the at least another thread are within the predetermined similarity threshold, and using an output data from the thread as an output data for the at least another thread.
    Type: Grant
    Filed: June 22, 2016
    Date of Patent: July 23, 2019
    Assignees: ATI Technologies ULC, Advanced Micro Devices, Inc.
    Inventors: Syed Zohaib M. Gilani, Jiasheng Chen, QingCheng Wang, YunXiao Zou, Michael Mantor, Bin He, Timour T. Paltashev
  • Patent number: 10361175
    Abstract: The present invention relates to a multichip system and a method for scheduling threads in 3D stacked chip. The multichip system comprises a plurality of dies stacked vertically and electrically coupled together; each of the plurality of dies comprising one or more cores, each of the plurality of dies further comprising: at least one voltage violation sensing unit, the at least one voltage violation sensing unit being connected with the one or more cores of each die, the at least one voltage sensing unit being configured to independently sense voltage violation in each core of each die; and at least one frequency tuning unit, the at least one frequency tuning unit being configured to tune the frequency of each core of each die, the at least one frequency tuning unit being connected with the at least one voltage violation sensing unit. The multichip system and method described in present invention have many advantages, such as reducing voltage violation, mitigating voltage droop and saving power.
    Type: Grant
    Filed: February 9, 2017
    Date of Patent: July 23, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Yi Xu, Xing Hu, Yuan Xie
  • Publication number: 20190220426
    Abstract: A configurable computing system which uses near-memory and in-memory hardened logic blocks is described herein. The hardened logic blocks are incorporated into memory modules. The memory modules include an interface or communication logic to communicate between the configurable computing substrate and the memory module. In an implementation, the memory modules can include an on-die memory or other forms of non-configurable logic to enable more efficient processing for a variety of operations. In another implementation, the memory modules can include a portion of configurable computing substrate logic fabric to enable more efficient processing for a variety of operations. In another implementation, the memory modules can include an on-die memory and a portion of configurable computing substrate logic fabric to enable more efficient processing for a variety of operations.
    Type: Application
    Filed: January 16, 2018
    Publication date: July 18, 2019
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Nuwan Jayasena, Michael Ignatowski
  • Patent number: 10353859
    Abstract: A method for allocating registers in a compute unit of a vector processor includes determining a maximum number of registers that are to be used concurrently by a plurality of threads of a kernel at the compute unit. The method further includes setting a mode of register allocation at the compute unit based on a comparison of the determined maximum number of registers and a total number of physical registers implemented at the compute unit.
    Type: Grant
    Filed: February 14, 2017
    Date of Patent: July 16, 2019
    Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC
    Inventors: YunPeng Zhu, Jimshed Mirza
  • Patent number: 10355966
    Abstract: Systems, apparatuses, and methods for managing variations among nodes in parallel system frameworks. Sensor and performance data associated with the nodes of a multi-node cluster may be monitored to detect variations among the nodes. A variability metric may be calculated for each node of the cluster based on the sensor and performance data associated with the node. The variability metrics may then be used by a mapper to efficiently map tasks of a parallel application to the nodes of the cluster. In one embodiment, the mapper may assign the critical tasks of the parallel application to the nodes with the lowest variability metrics. In another embodiment, the hardware of the nodes may be reconfigured so as to reduce the node-to-node variability.
    Type: Grant
    Filed: March 25, 2016
    Date of Patent: July 16, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Samuel Lawrence Wasmundt, Leonardo Piga, Indrani Paul, Wei Huang, Manish Arora
  • Patent number: 10353708
    Abstract: Systems, apparatuses, and methods for utilizing efficient vectorization techniques for operands in non-sequential memory locations are disclosed. A system includes a vector processing unit (VPU) and one or more memory devices. In response to determining that a plurality of vector operands are stored in non-sequential memory locations, the VPU performs a plurality of vector load operations to load the plurality of vector operands into a plurality of vector registers. Next, the VPU performs a shuffle operation to consolidate the plurality of vector operands from the plurality of vector registers into a single vector register. Then, the VPU performs a vector operation on the vector operands stored in the single vector register. The VPU can also perform a vector store operation by permuting and storing a plurality of vector operands in appropriate locations within multiple vector registers and then storing the vector registers to locations in memory using a mask.
    Type: Grant
    Filed: September 23, 2016
    Date of Patent: July 16, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Anupama Rajesh Rasale, Dibyendu Das, Ashutosh Nema, Md Asghar Ahmad Shahid, Prathiba Kumar
  • Patent number: 10353591
    Abstract: Improvements in compute shader programs executed on parallel processing hardware are disclosed. An application or other entity defines a sequence of shader programs to execute. Each shader program defines inputs and outputs which would, if unmodified, execute as loads and stores to a general purpose memory, incurring high latency. A compiler combines the shader programs into groups that can operate in a lower-latency, but lower-capacity local data store memory. The boundaries of these combined shader programs are defined by several aspects including where memory barrier operations are to execute, whether combinations of shader programs can execute using only the local data store and not the global memory (except for initial reads and writes) and other aspects.
    Type: Grant
    Filed: February 24, 2017
    Date of Patent: July 16, 2019
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Michael L. Schmitt, Radhakrishna Giduthuri
  • Patent number: 10355661
    Abstract: A power delivery network, circuit, and method reduce die package resonance of an integrated circuit (IC) die. Decoupling capacitors interact with equivalent series inductances (ESLs) of power conductors within a package carrier substrate create the die package resonance characteristic. In one form, an anti-resonance tuning circuit has a first node conductively coupled to one of the IC die's positive or negative power supply conductors, and a second node conductively coupled directly to a selected conductive structure on the carrier substrate. The anti-resonance tuning circuit includes a tuning capacitor, a tuning inductor, and optionally a dampening resistor coupled in series and having values sufficient to mitigate the die package resonance. In another form, impedance adjustment techniques are provided to connect and tune the anti-resonance tuning circuit to lower an impedance peak.
    Type: Grant
    Filed: August 28, 2018
    Date of Patent: July 16, 2019
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventor: Fei Guo
  • Patent number: 10346055
    Abstract: Systems, apparatuses, and methods for performing run-time checking of access uniformity of vector memory access instructions are disclosed. A system includes a vector unit, a scalar unit, and a memory. The system performs a run-time check to determine if two or more threads of a wave have access uniformity to the memory prior to executing a vector memory access instruction for the wave on the vector unit. The system replaces the vector memory access instruction with a group of instructions responsive to determining that two or more threads of the wave have access uniformity to the memory. The group of instructions includes a scalar access instruction to memory followed by a cross-thread data sharing instruction. The scalar access instruction is executed on the scalar unit. Alternatively, the group of instructions can include a vector memory access instruction by only a single thread in each group having access uniformity.
    Type: Grant
    Filed: July 28, 2017
    Date of Patent: July 9, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventor: Guohua Jin
  • Patent number: 10341650
    Abstract: Systems, methods and apparatuses of processing data of a VR system are disclosed that comprise receiving tracking information which includes at least one of user position information and eye gaze point information. One or more processors may be used to predict, based on the user tracking information, a user viewpoint of a next frame of a sequence of frames of video data to be displayed. Using the prediction, a portion of the next frame of video data to be displayed is rendered at an estimated location in the next frame. A corresponding matching portion in a previously encoded frame is determined based on the estimated location of the portion in the next frame and the portion of the next frame of video data is encoded.
    Type: Grant
    Filed: April 15, 2016
    Date of Patent: July 2, 2019
    Assignees: ATI TECHNOLOGIES ULC, ADVANCED MICRO DEVICES, INC.
    Inventors: Khaled Mammou, Ihab Amer, Gabor Sines, Lei Zhang, Layla A. Mah, Guennadi Riguer, David Glen
  • Patent number: 10340916
    Abstract: An electronic device includes a plurality of hardware functional blocks, the hardware functional blocks being logically grouped into two or more islands, with each island including a different one or more of the hardware functional blocks. A hardware controller in the electronic device is configured to determine a present activity being performed by at least one of the hardware functional blocks. The hardware controller then, based on the present activity, configures supply voltages for the hardware functional blocks in some or all of the islands.
    Type: Grant
    Filed: December 29, 2017
    Date of Patent: July 2, 2019
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Thomas J. Gibney, Sridhar V. Gada, Alexander J. Branover, Benjamin Tsien
  • Patent number: 10339063
    Abstract: A processor includes an operations scheduler to schedule execution of operations at, for example, a set of execution units or a cache of the processor. The operations scheduler periodically adds sets of operations to a tracking array, and further identifies when an operation in the tracked set is blocked from execution scheduling in response to, for example, identifying that the operation is dependent on another operation that has not completed execution. The processor further includes a counter that is adjusted each time an operation in the tracking array is blocked from execution, and is reset each time an operation in the tracking array is executed. When the value of the counter exceeds a threshold, the operations scheduler prioritizes the remaining tracked operations for execution scheduling.
    Type: Grant
    Filed: July 19, 2016
    Date of Patent: July 2, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Paul James Moyer, Richard Martin Born
  • Patent number: 10339068
    Abstract: Systems, apparatuses, and methods for implementing a virtualized translation lookaside buffer (TLB) are disclosed herein. In one embodiment, a system includes at least an execution unit and a first TLB. The system supports the execution of a plurality of virtual machines in a virtualization environment. The system detects a translation request generated by a first virtual machine with a first virtual memory identifier (VMID). The translation request is conveyed from the execution unit to the first TLB. The first TLB performs a lookup of its cache using at least a portion of a first virtual address and the first VMID. If the lookup misses in the cache, the first TLB allocates an entry which is addressable by the first virtual address and the first VMID, and the first TLB sends the translation request with the first VMID to a second TLB.
    Type: Grant
    Filed: April 24, 2017
    Date of Patent: July 2, 2019
    Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC
    Inventors: Wade K. Smith, Anthony Asaro
  • Patent number: 10339067
    Abstract: A technique for use in a memory system includes swapping a first plurality of pages of a first memory of the memory system with a second plurality of pages of a second memory of the memory system. The first memory has a first latency and the second memory has a second latency. The first latency is less than the second latency. The technique includes updating a page table and triggering a translation lookaside buffer shootdown to associate a virtual address of each of the first plurality of pages with a corresponding physical address in the second memory and to associate a virtual address for each of the second plurality of pages with a corresponding physical address in the first memory.
    Type: Grant
    Filed: June 19, 2017
    Date of Patent: July 2, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Yasuko Eckert, Thiruvengadam Vijayaraghavan, Gabriel H. Loh
  • Publication number: 20190196839
    Abstract: A system and method for increasing address generation operations per cycle is described. In particular, a unified address generation scheduler queue (AGSQ) is a single queue structure which is accessed by first and second pickers in a picking cycle. Picking collisions are avoided by assigning a first set of entries to the first picker and a second set of entries to the second picker. The unified AGSQ uses a shifting, collapsing queue structure to shift other micro-operations into issued entries, which in turn collapses the queue and re-balances the unified AGSQ. A second level and delayed picker picks a third micro-operation that is ready for issue in the picking cycle. The third micro-operation is picked from the remaining entries across the first set of entries and the second set of entries. The third micro-operation issues in a next picking cycle.
    Type: Application
    Filed: December 22, 2017
    Publication date: June 27, 2019
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Christopher Spence Oliver, Hanbing Liu, Christopher James Burke, Michael D. Achenbach
  • Publication number: 20190196742
    Abstract: A technique for accessing memory in an accelerated processing device coupled to stacked memory dies is provided herein. The technique includes receiving a memory access request from an execution unit and identifying whether the memory access request corresponds to memory cells of the stacked dies that are considered local to the execution unit or non-local. For local accesses, the access is made “directly”, that is, without using a bus. A control die coordinates operations for such local accesses, activating particular through-silicon-vias associated with the memory cells that include the data for the access. Non-local accesses are made via a distributed cache fabric and an interconnect bus in the control die. Various other features and details are provided below.
    Type: Application
    Filed: December 21, 2017
    Publication date: June 27, 2019
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Dmitri Yudanov, Jiasheng Chen
  • Publication number: 20190196978
    Abstract: A data processing system includes a memory and an input output memory management unit that is connected to the memory. The input output memory management unit is adapted to receive batches of address translation requests. The input output memory management unit has instructions that identify, from among the batches of address translation requests, a later batch having a lower number of memory access requests than an earlier batch, and selectively schedules access to a page table walker for each address translation request of a batch.
    Type: Application
    Filed: December 22, 2017
    Publication date: June 27, 2019
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Arkaprava Basu, Eric Van Tassell, Mark Oskin, Guilherme Cox, Gabriel Loh