Patents Assigned to Advanced Micro Devices
-
Publication number: 20180211435Abstract: Improvements in the graphics processing pipeline that allow multiple pipelines to cooperate to render a single frame are disclosed. Two approaches are provided. In a first approach, world-space pipelines for the different graphics processing pipelines process all work for draw calls received from a central processing unit (CPU). In a second approach, the world-space pipelines divide up the work. Work that is divided is synchronized and redistributed at various points in the world-space pipeline. In either approach, the triangles output by the world-space pipelines are distributed to the screen-space pipelines based on the portions of the render surface overlapped by the triangles. Triangles are rendered by screen-space pipelines associated with the render surface portions overlapped by those triangles.Type: ApplicationFiled: January 26, 2017Publication date: July 26, 2018Applicant: Advanced Micro Devices, Inc.Inventors: Mangesh P. Nijasure, Todd Martin, Michael Mantor
-
Publication number: 20180211434Abstract: Techniques for generating a stereo image from a single set of input geometry in a three-dimensional rendering pipeline are disclosed. Vertices are processed through the end of the world-space pipeline. In the primitive assembler, at the end of the world-space pipeline, before perspective division, each clip-space vertex is duplicated. The primitive assembler generates this duplicated clip-space vertex using the y, z, and w coordinates of the original vertex and based on an x coordinate that is offset in the x-direction in clip-space as compared with the x coordinate of the original vertex. Both the original vertex clip-space vertex and the modified clip-space vertex are then sent through the rest of the pipeline for processing, including perspective division, viewport transform, rasterization, pixel shading, and other operations. The result is that a single set of input vertices is rendered into a stereo image.Type: ApplicationFiled: January 25, 2017Publication date: July 26, 2018Applicant: Advanced Micro Devices, Inc.Inventors: Mangesh P. Nijasure, Michael Mantor, Jeffrey M. Smith
-
Patent number: 10032308Abstract: A shader in a graphics pipeline accesses an object that represents a portion of a model of a scene in object space and one or more far-z values that indicate a furthest distance of a previously rendered portion of one or more tiles from a viewpoint used to render the scene on a screen. The one or more tiles overlap a bounding box of the object in a plane of the screen. The shader culls the object from the graphics pipeline in response to the one or more far-z values being smaller than a near-z value that represents a closest distance of a portion of the object to the viewpoint.Type: GrantFiled: June 22, 2016Date of Patent: July 24, 2018Assignee: Advanced Micro Devices, Inc.Inventors: Timour T. Paltashev, Chris Brennan
-
Patent number: 10031947Abstract: A method and apparatus for performing a top-down Breadth-First Search (BFS) includes performing a first determination whether to convert to a bottom-up BFS. A second determination is performed whether to convert to the bottom-up BFS, based upon the first determination being positive. The bottom-up BFS is performed, based upon the first determination and the second determination being positive. A third determination is made whether to convert from the bottom-up BFS to the top-down BFS, based upon the third determination being positive.Type: GrantFiled: June 24, 2015Date of Patent: July 24, 2018Assignee: ADVANCED MICRO DEVICES, INC.Inventor: Mayank Daga
-
Patent number: 10025721Abstract: The present invention provides for page table access and dirty bit management in hardware via a new atomic test[0] and OR and Mask. The present invention also provides for a gasket that enables ACE to CCI translations. This gasket further provides request translation between ACE and CCI, deadlock avoidance for victim and probe collision, ARM barrier handling, and power management interactions. The present invention also provides a solution for ARM victim/probe collision handling which deadlocks the unified northbridge. These solutions includes a dedicated writeback virtual channel, probes for IO requests using 4-hop protocol, and a WrBack Reorder Ability in MCT where victims update older requests with data as they pass the requests.Type: GrantFiled: October 24, 2014Date of Patent: July 17, 2018Assignees: Advanced Micro Devices, Inc., ATI Technologies ULCInventors: Vydhyanathan Kalyanasundharam, Philip Ng, Maggie Chan, Vincent Cueva, Anthony Asaro, Jimshed Mirza, Greggory D. Donley, Bryan Broussard, Benjamin Tsien, Yaniv Adiri
-
Patent number: 10025605Abstract: A receiving node in a computer system that includes a plurality of types of execution units receives an active message from a sending node. The receiving node compiles an intermediate language message handler corresponding to the active message into a machine instruction set architecture (ISA) message handler and the receiver executes the ISA message handler on a selected one of the execution units. If the active message handler is not available at the receiver, the sender sends an intermediate language version of the message handler to the receiving node. The execution unit selected to execute the message handler is chosen based on a field in the active message or on runtime criteria in the receiving system.Type: GrantFiled: April 8, 2016Date of Patent: July 17, 2018Assignee: Advanced Micro Devices, Inc.Inventors: Shuai Che, Marc S. Orr
-
Patent number: 10025361Abstract: A method includes controlling active frequency states of a plurality of heterogeneous processing units based on frequency sensitivity metrics indicating performance coupling between different types of processing units in the plurality of heterogeneous processing units. A processor includes a plurality of heterogeneous processing units and a performance controller to control active frequency states of the plurality of heterogeneous processing units based on frequency sensitivity metrics indicating performance coupling between different types of processing units in the plurality of heterogeneous processing units. The active frequency state of a first type of processing unit in the plurality of heterogeneous processing units is controlled based on a first activity metric associated with a first type of processing unit and a second activity metric associated with a second type of processing unit.Type: GrantFiled: June 5, 2014Date of Patent: July 17, 2018Assignee: Advanced Micro Devices, Inc.Inventors: Indrani Paul, Vignesh Trichy Ravi, Manish Arora, Srilatha Manne
-
Patent number: 10019365Abstract: Enhanced adaptive profiling of ranges of values in a stream of events includes identifying a set of contiguous ranges of the values and corresponding access frequencies in the stream of events. The enhanced adaptive profiling uses a merge threshold value and a split threshold value. The set of contiguous ranges spans an entire range space of the values. Periodic traversal of the set of contiguous ranges of values and corresponding access frequencies identifies a target set of ranges of the values having corresponding access frequencies above a predetermined threshold access frequency. The target set of ranges of values has a total number of ranges less than or equal to a predetermined number of ranges. The target ranges of values span at least some of the entire range space of values. A first operation uses the target set of ranges of values.Type: GrantFiled: April 15, 2016Date of Patent: July 10, 2018Assignee: Advanced Micro Devices, Inc.Inventor: Mauricio Breternitz
-
Patent number: 10019283Abstract: A processing device includes a first memory that includes a context buffer. The processing device also includes a processor core to execute threads based on context information stored in registers of the processor core and a memory controller to selectively move a subset of the context information between the context buffer and the registers based on one or more latencies of the threads.Type: GrantFiled: June 22, 2015Date of Patent: July 10, 2018Assignee: Advanced Micro Devices, Inc.Inventors: Dmitri Yudanov, Sergey Blagodurov, Arkaprava Basu, Sooraj Puthoor, Joseph L. Greathouse
-
Patent number: 10019829Abstract: Methods for enabling graphics features in processors are described herein. Methods are provided to enable trinary built-in functions in the shader, allow separation of the graphics processor's address space from the requirement that all textures must be physically backed, enable use of a sparse buffer allocated in virtual memory, allow a reference value used for stencil test to be generated and exported from a fragment shader, provide support for use specific operations in the stencil buffers, allow capture of multiple transform feedback streams, allow any combination of streams for rasterization, allow a same set of primitives to be used with multiple transform feedback streams as with a single stream, allow rendering to be directed to layered framebuffer attachments with only a vertex and fragment shader present, and allow geometry to be directed to one of an array of several independent viewport rectangles without a geometry shader.Type: GrantFiled: June 7, 2013Date of Patent: July 10, 2018Assignee: Advanced Micro Devices, Inc.Inventors: Graham Sellers, Pierre Boudier, Juraj Obert
-
Patent number: 10019377Abstract: The described embodiments include a computing device with two or more types of processors and a memory that is shared between the two or more types of processors. The computing device performs operations for handling cache coherency between the two or more types of processors. During operation, the computing device sets a cache coherency indicator in metadata in a page table entry in a page table, the page table entry information about a page of data that is stored in the memory. The computing device then uses the cache coherency indicator to determine operations to be performed when accessing data in the page of data in the memory. For example, the computing device can use the coherency indicator to determine whether a coherency operation is to be performed when a processor of a given type accesses data in the page of data in the memory.Type: GrantFiled: May 23, 2016Date of Patent: July 10, 2018Assignee: ADVANCED MICRO DEVICES, INC.Inventors: Arkaprava Basu, Bradford M. Beckmann, Shuai Che, Sooraj Puthoor
-
Patent number: 10013240Abstract: A first processing element is configured to execute a first thread and one or more second processing elements are configured to execute one or more second threads that are redundant to the first thread. The first thread and the one or more second threads are to selectively bypass one or more comparisons of results of operations performed by the first thread and the one or more second threads depending on whether an event trigger for the comparison has occurred a configurable number of times since a previous comparison of previously encoded values of the results. In some cases the comparison can be performed based on hashed (or encoded) values of the results of a current operation and one or more previous operations.Type: GrantFiled: June 21, 2016Date of Patent: July 3, 2018Assignee: Advanced Micro Devices, Inc.Inventor: Daniel I. Lowell
-
Publication number: 20180181306Abstract: A method and apparatus of copying data from a first memory location to a second memory location includes performing a copy operation selected out of one or more copy operations. The copy operations include performing interleaved data copying, performing a full wavefront copy operation, copying all data to a local data store (LDS) prior to copying to the second memory location, or pipelining the data for copying. The copy operation is applied to copy the data from the first location to the second memory location.Type: ApplicationFiled: December 22, 2016Publication date: June 28, 2018Applicant: Advanced Micro Devices, Inc.Inventors: Guohua Jin, Todd Martin
-
Publication number: 20180181341Abstract: Described herein is a method and system for directly accessing and transferring data between a first memory architecture and a second memory architecture associated with a graphics processing unit (GPU) by treating the first memory architecture, the second memory architecture and system memory as a single physical memory, where the first memory architecture is a non-volatile memory (NVM) and the second memory architecture is a local memory. Upon accessing a virtual address (VA) range by a processor, the requested content is paged in from the single physical memory and is then redirected by a virtual storage driver to the second memory architecture or the system memory, depending on which of the GPU or CPU triggered the access request. The memory transfer occurs without awareness of the application and the operating system.Type: ApplicationFiled: December 23, 2016Publication date: June 28, 2018Applicant: Advanced Micro Devices, Inc.Inventor: Paul Blinzer
-
Publication number: 20180181492Abstract: Described herein are waterfall counters and an application to architectural vulnerability factor (AVF) estimation. Waterfall counters count events that are generated at event generation logic. The waterfall counters are a combination of small, fast counters local to the event generation logic, and larger, global counters in fast memory. The local counters can be saturation or oscillation counters. When a local counter is saturated or evicted, the value from the local counter is added to the global counter. This addition can be done using logic local to the local or global counter. The waterfall counters provide a full-accuracy event count without the high bandwidth that is needed to maintain the global counters. An AVF estimation can be determined based on ratios from counts of read events, write events, and total events using the waterfall counters.Type: ApplicationFiled: December 23, 2016Publication date: June 28, 2018Applicant: Advanced Micro Devices, Inc.Inventors: Manish Gupta, Vilas Sridharan, David A. Roberts
-
Publication number: 20180181488Abstract: Techniques for performing cache invalidates and write-backs in an accelerated processing device (e.g., a graphics processing device that renders three-dimensional graphics) are disclosed. The techniques involve receiving requests from a “master” (e.g., the central processing unit). The techniques involve invalidating virtual-to-physical address translations in an address translation request. The techniques include splitting up the requests based on whether the requests target virtually or physically tagged caches. Addresses for the portions of a request that target physically tagged caches are translated using invalidated virtual-to-physical address translations for speed. The split up request is processed to generate micro-transactions for individual caches targeted by the request. Micro-transactions for physically and virtually tagged caches are processed in parallel. Once all micro-transactions for a request have been processed, the unit that made the request is notified.Type: ApplicationFiled: December 23, 2016Publication date: June 28, 2018Applicants: Advanced Micro Devices, Inc., ATI Technologies ULCInventors: Mark Fowler, Jimshed Mirza, Anthony Asaro
-
Publication number: 20180181496Abstract: Methods, devices, and systems for determining an address in a physical memory which corresponds to a virtual address using a skewed-associative translation lookaside buffer (TLB) are described. A virtual address and a configuration indication are received using receiver circuitry. A physical address corresponding to the virtual address is output if a TLB hit occurs. A first subset of a plurality of ways of the TLB is configured to hold a first page size. The first subset includes a number of the ways based on the configuration indication. A physical address corresponding to the virtual address is retrieved from a page table if a TLB miss occurs, and at least a portion of the physical address is installed in a least recently used way of a subset of a plurality of ways the TLB, determined according to a replacement policy based on the configuration indication.Type: ApplicationFiled: December 23, 2016Publication date: June 28, 2018Applicant: Advanced Micro Devices, Inc.Inventors: John M. King, Michael T. Clark
-
Patent number: 10007464Abstract: Described herein is a method and system for directly accessing and transferring data between a first memory architecture and a second memory architecture associated with a graphics processing unit (GPU) by treating the first memory architecture, the second memory architecture and system memory as a single physical memory, where the first memory architecture is a non-volatile memory (NVM) and the second memory architecture is a local memory. Upon accessing a virtual address (VA) range by a processor, the requested content is paged in from the single physical memory and is then redirected by a virtual storage driver to the second memory architecture or the system memory, depending on which of the GPU or CPU triggered the access request. The memory transfer occurs without awareness of the application and the operating system.Type: GrantFiled: December 23, 2016Date of Patent: June 26, 2018Assignee: Advanced Micro Devices, Inc.Inventor: Paul Blinzer
-
Patent number: 10008259Abstract: A system and method for efficient power, performance and stability tradeoffs of memory accesses are described. A memory includes an array of cells for storing data and a sense amplifier for controlling access to the array. The cells receive word line inputs for data access driven by a first voltage supply. The sense amplifier includes first precharge logic, which receives a first precharge input driven by the first power supply used by the array. Therefore, the first precharge input has similar timing characteristics as the word line input used in the array. The sense amplifier includes second precharge logic, which receives a second precharge input driven by a second power supply not used by the array and provides precharged values on bit lines driven by the second power supply.Type: GrantFiled: December 7, 2016Date of Patent: June 26, 2018Assignee: Advanced Micro Devices, Inc.Inventor: Ryan Thomas Freese
-
Publication number: 20180165096Abstract: A system and method for using an operation (op) cache is disclosed. The system and method include an op cache for caching previously decoded instructions. The op cache includes a plurality of physically indexed and tagged instructions allowing sharing of instructions between threads. The op cache is chained through multiple ways allowing service of a plurality of instructions in a cache line. The op cache is stored between a shared operation storage and immediate/displacement storage to maximize capacity.Type: ApplicationFiled: December 9, 2016Publication date: June 14, 2018Applicant: Advanced Micro Devices, Inc.Inventor: David N. Suggs