Patents Assigned to Advanced Micros Devices, Inc.
-
Publication number: 20200210341Abstract: Embodiments include methods, systems and non-transitory computer-readable computer readable media including instructions for executing a prefetch kernel with reduced intermediate state storage resource requirements. These include executing a prefetch kernel on a graphics processing unit (GPU), such that the prefetch kernel begins executing before a processing kernel. The prefetch kernel performs memory operations that are based upon at least a subset of memory operations in the processing kernel.Type: ApplicationFiled: March 9, 2020Publication date: July 2, 2020Applicant: Advanced Micro Devices, Inc.Inventors: Nuwan S. Jayasena, James Michael O'Connor, Michael Mantor
-
Patent number: 10698691Abstract: Disclosed are a method and a processing device directed to determining global branch history for branch prediction. The method includes shifting first bits of a branch signature into a current global branch history and performing a bitwise exclusive-or (XOR) function on second bits of the branch signature and shifted bits of the current global branch history. In this way, the current global branch history is updated. The processing device implements the method using a shift logic configured to store and shift bits representing a current global branch history, a register configured to store the current global branch history, decision circuitry configured to determine whether or not a branch is taken, and XOR gates.Type: GrantFiled: August 30, 2016Date of Patent: June 30, 2020Assignee: Advanced Micro Devices, Inc.Inventor: Steven R. Havlir
-
Patent number: 10698692Abstract: An asynchronous pipeline includes a first stage and one or more second stages. A controller provides control signals to the first stage to indicate a modification to an operating speed of the first stage. The modification is determined based on a comparison of a completion status of the first stage to one or more completion statuses of the one or more second stages. In some cases, the controller provides control signals indicating modifications to an operating voltage applied to the first stage and a drive strength of a buffer in the first stage. Modules can be used to determine the completion statuses of the first stage and the one or more second stages based on the monitored output signals generated by the stages, output signals from replica critical paths associated with the stages, or a lookup table that indicates estimated completion times.Type: GrantFiled: July 21, 2016Date of Patent: June 30, 2020Assignee: ADVANCED MICRO DEVICES, INC.Inventors: Greg Sadowski, John Kalamatianos, Shomit N. Das
-
Patent number: 10700954Abstract: A system includes a multi-core processor that includes a scheduler. The multi-core processor communicates with a system memory and an operating system. The multi-core processor executes a first process and a second process. The system uses the scheduler to control a use of a memory bandwidth by the second process until a current use in a control cycle by the first process meets a first setpoint of use for the first process when the first setpoint is at or below a latency sensitive (LS) floor or a current use in the control cycle by the first process exceeds the LS floor when the first setpoint exceeds the LS floor.Type: GrantFiled: December 20, 2017Date of Patent: June 30, 2020Assignee: ADVANCED MICRO DEVICES, INC.Inventors: Douglas Benson Hunt, Jay Fleischman
-
Patent number: 10698856Abstract: A link controller, method, and data processing platform are provided with dual-protocol capability. The link controller includes a physical layer circuit for providing a data lane over a communication link, a first data link layer controller which operates according to a first protocol, and a second data link layer controller which operates according to a second protocol. A multiplexer/demultiplexer selectively connects both data link layer controllers to the physical layer circuit. A link training and status state machine (LTSSM) selectively controls the physical layer circuit to transmit and receive first training ordered sets over the data lane, and inside the training ordered sets, transmit and receive alternative protocol negotiation information over the data lane. In response to receiving the alternative protocol negotiation information, the LTSSM causes the multiplexer/demultiplexer to selectively connect the physical layer circuit to the second data link layer controller.Type: GrantFiled: December 18, 2018Date of Patent: June 30, 2020Assignees: ATI Technologies ULC, Advanced Micro Devices, Inc.Inventors: Gordon Caruk, Gerald R. Talbot
-
Patent number: 10699464Abstract: Methods for enabling graphics features in processors are described herein. Methods are provided to enable trinary built-in functions in the shader, allow separation of the graphics processor's address space from the requirement that all textures must be physically backed, enable use of a sparse buffer allocated in virtual memory, allow a reference value used for stencil test to be generated and exported from a fragment shader, provide support for use specific operations in the stencil buffers, allow capture of multiple transform feedback streams, allow any combination of streams for rasterization, allow a same set of primitives to be used with multiple transform feedback streams as with a single stream, allow rendering to be directed to layered framebuffer attachments with only a vertex and fragment shader present, and allow geometry to be directed to one of an array of several independent viewport rectangles without a geometry shader.Type: GrantFiled: October 31, 2016Date of Patent: June 30, 2020Assignee: Advanced Micro Devices, Inc.Inventors: Graham Sellers, Eric Zolnowski, Pierre Boudier, Juraj Obert
-
Patent number: 10699033Abstract: Systems, apparatuses, and methods for secure enablement of platform features without user intervention are disclosed. In one embodiment, a system includes at least a motherboard and a processor. The motherboard includes at least a socket and an authentication component. The authentication component can be a chipset, expansion I/O device, or other component. The processor is installed in the socket on the motherboard. During a boot sequence, the processor retrieves a key value from the authentication component and then authenticates the key value. Next, the processor determines which one or more features to enable based on the key value. Then, the processor programs one or more feature control registers to enable the one or more features specified by the key value. Accordingly, during normal operation of the system, the one or more features will be enabled.Type: GrantFiled: June 28, 2017Date of Patent: June 30, 2020Assignee: Advanced Micro Devices, Inc.Inventors: Mahesh Subramony, Daniel L. Bouvier
-
Patent number: 10698472Abstract: A heterogeneous processor system includes a first processor implementing an instruction set architecture (ISA) including a set of ISA features and configured to support a first subset of the set of ISA features. The heterogeneous processor system also includes a second processor implementing the ISA including the set of ISA features and configured to support a second subset of the set of ISA features, wherein the first subset and the second subset of the set of ISA features are different from each other. When the first subset includes an entirety of the set of ISA features, the lower-feature second processor is configured to execute an instruction thread by consuming less power and with lower performance than the first processor.Type: GrantFiled: October 27, 2017Date of Patent: June 30, 2020Assignee: ADVANCED MICRO DEVICES, INC.Inventors: Elliot H. Mednick, Edward McLellan
-
Publication number: 20200202605Abstract: A technique for determining the centroid for fragments generated using variable rate shading is provided. Because the barycentric interpolation used to determine texture coordinates for pixels is based on the premise that the point being interpolated is within the triangle, centroids that are outside of the triangle can produce undesirable visual artifacts. Another concern, however, is that the further the centroid is from the center of a pixel, the less accurate quad-based pixel derivatives become for attributes of that pixel. To address these concerns, the position of the sample that is both covered by the triangle and the closest to the center of the pixel, out of all covered samples of the pixel, is used as the centroid for a partially covered pixel. For a fully covered pixel (all samples in a pixel are covered by a triangle), the center of that pixel is used as the centroid.Type: ApplicationFiled: December 19, 2018Publication date: June 25, 2020Applicant: Advanced Micro Devices, Inc.Inventors: Skyler Jonathon Saleh, Pazhani Pillai
-
Publication number: 20200201763Abstract: Improvements to traditional schemes for storing data for processing tasks and for executing those processing tasks are disclosed. A set of data for which processing tasks are to be executed is processed through a hierarchy to distribute the data through various elements of a computer system. Levels of the hierarchy represent different types of memory or storage elements. Higher levels represent coarser portions of memory or storage elements and lower levels represent finer portions of memory or storage elements. Data proceeds through the hierarchy as “tasks” at different levels. Tasks at non-leaf nodes comprise tasks to subdivide data for storage in the finer granularity memories or storage units associated with a lower hierarchy level. Tasks at leaf nodes comprise processing work, such as a portion of a calculation. Two techniques for organizing the tasks in the hierarchy presented herein include a queue-based technique and a graph-based technique.Type: ApplicationFiled: March 2, 2020Publication date: June 25, 2020Applicant: Advanced Micro Devices, Inc.Inventor: Shuai Che
-
Publication number: 20200202594Abstract: A technique for performing rasterization and pixel shading with decoupled resolution is provided herein. The technique involves performing rasterization as normal to generate quads. The quads are accumulated into a tile buffer. A shading rate is determined for the contents of the tile buffer. If the shading rate is a sub-sampling shading rate, then the quads in the tile buffer are down-sampled, which reduces the amount of work to be performed by a pixel shader. The shaded down-sampled quads are then restored to the resolution of the render target. If the shading rate is a super-sampling shading rate, then the quads in the tile buffer are up-sampled. The results of the shaded down-sampled or up-sampled quads are written to the render target.Type: ApplicationFiled: December 20, 2018Publication date: June 25, 2020Applicant: Advanced Micro Devices, Inc.Inventors: Skyler Jonathon Saleh, Andrew S. Pomianowski
-
Patent number: 10691772Abstract: A method includes storing a sparse triangular matrix as a compressed sparse row (CSR) dataset. For each factor of a plurality of factors in a first vector, a value of the factor is calculated by identifying for the factor a set of one or more antecedent factors in the first vector, where the value of the factor is dependent on each of the one or more antecedent factors. In response to a completion array indicating that all of the one or more antecedent factor values are solved, the value of the factor is calculated based on one or more elements in a row of the matrix and a product value corresponding to the row. In the completion array, a first completion flag for the factor is asserted, indicating that the factor is solved.Type: GrantFiled: April 20, 2018Date of Patent: June 23, 2020Assignee: Advanced Micro Devices, Inc.Inventor: Joseph Lee Greathouse
-
Patent number: 10692545Abstract: Systems, apparatuses, and methods for performing efficient data transfer in a computing system are disclosed. A termination voltage generator includes an inverter-based chopper circuit, which uses a first group of an even number of serially connected inverters coupled between the output node of the chopper circuit and the gate terminal of an output pmos transistor. Additionally, a second group of an even number of serially connected inverters is coupled between the output node and the gate terminal of an output nmos transistor. A replica inverter includes two serially connected pmos transistors and two serially connected nmos transistors. Each of one pmos transistor and one nmos transistor receives a generated voltage set as the expected value of the termination voltage. Each of the other pmos transistor and nmos transistor receives an output based on a comparison between the expected value to the output of the replica inverter.Type: GrantFiled: September 24, 2018Date of Patent: June 23, 2020Assignee: Advanced Micro Devices, Inc.Inventors: Milam Paraschou, Balwinder Singh, Gerald R. Talbot, Alushulla Jack Ambundo, Edoardo Prete, Thomas H. Likens, III, Michael A. Margules
-
Patent number: 10692271Abstract: A technique for classifying a ray tracing intersection with a triangle edge or vertex avoids either rendering holes or multiple hits of the same ray for different triangles. The technique employs a tie-breaking scheme in which certain types of edges are classified as hits and certain types of edges are classified as misses. The test is performed in a coordinate space that comprises a projection into the viewspace of the ray, and thus where the ray direction has a non-zero magnitude in one axis (e.g., z) but a zero magnitude in the two other axes. In this coordinate space, edges are classified as one of top, bottom, left, and right, and an intersection on an edge counts as a hit if the intersection hits a top or left edge, but a miss if the intersection hits a bottom or right edge. Vertices are processed in a related manner.Type: GrantFiled: December 13, 2018Date of Patent: June 23, 2020Assignee: Advanced Micro Devices, Inc.Inventor: Skyler Jonathon Saleh
-
Publication number: 20200193681Abstract: Described herein is a technique for performing ray tracing. According to this technique, instead of executing intersection and/or any hit shaders during traversal of an acceleration structure to determine the closest hit for a ray, an acceleration structure is fully traversed in an invocation of a shader program, and the closest intersection with a triangle is recorded in a data structure associated with the material of the triangle. Later, a scheduler launches waves by grouping together multiple data items associated with the same material. The rays processed by that wave are processed with a continuation ray, rather than the full original ray. A continuation ray starts from the previous point of intersection and extends in the direction of the original ray. These steps help counter divergence that would occur if a single shader program that inlined the intersection and any hit shaders were executed.Type: ApplicationFiled: December 13, 2018Publication date: June 18, 2020Applicant: Advanced Micro Devices, Inc.Inventor: Skyler Jonathon Saleh
-
Publication number: 20200193684Abstract: Described herein is a technique for performing ray-triangle intersection without a floating point division unit. A division unit would be useful for a straightforward implementation of a certain type of ray-triangle intersection test that is useful in ray tracing operations. This certain type of ray-triangle intersection test includes a step that transforms the coordinate system into the viewspace of the ray, thereby reducing the problem of intersection to one of 2D triangle rasterization. However, a straightforward implementation of this transformation requires floating point division, as the transformation utilizes a shear operation to set the coordinate system such that the magnitudes of the ray direction on two of the axes are zero. Instead of using the most straightforward implementation of this transform, the technique described herein scales the entire coordinate system by the magnitude of the ray direction in the axis that is the denominator of the shear ratio, removing division.Type: ApplicationFiled: December 13, 2018Publication date: June 18, 2020Applicant: Advanced Micro Devices, Inc.Inventors: Skyler Jonathon Saleh, Ruijin Wu
-
Publication number: 20200193683Abstract: A technique for classifying a ray tracing intersection with a triangle edge or vertex avoids either rendering holes or multiple hits of the same ray for different triangles. The technique employs a tie-breaking scheme in which certain types of edges are classified as hits and certain types of edges are classified as misses. The test is performed in a coordinate space that comprises a projection into the viewspace of the ray, and thus where the ray direction has a non-zero magnitude in one axis (e.g., z) but a zero magnitude in the two other axes. In this coordinate space, edges are classified as one of top, bottom, left, and right, and an intersection on an edge counts as a hit if the intersection hits a top or left edge, but a miss if the intersection hits a bottom or right edge. Vertices are processed in a related manner.Type: ApplicationFiled: December 13, 2018Publication date: June 18, 2020Applicant: Advanced Micro Devices, Inc.Inventor: Skyler Jonathon Saleh
-
Publication number: 20200192850Abstract: A link controller, method, and data processing platform are provided with dual-protocol capability. The link controller includes a physical layer circuit for providing a data lane over a communication link, a first data link layer controller which operates according to a first protocol, and a second data link layer controller which operates according to a second protocol. A multiplexer/demultiplexer selectively connects both data link layer controllers to the physical layer circuit. A link training and status state machine (LTSSM) selectively controls the physical layer circuit to transmit and receive first training ordered sets over the data lane, and inside the training ordered sets, transmit and receive alternative protocol negotiation information over the data lane. In response to receiving the alternative protocol negotiation information, the LTSSM causes the multiplexer/demultiplexer to selectively connect the physical layer circuit to the second data link layer controller.Type: ApplicationFiled: December 18, 2018Publication date: June 18, 2020Applicants: ATI Technologies ULC, Advanced Micro Devices, Inc.Inventors: Gordon Caruk, Gerald R. Talbot
-
Publication number: 20200192852Abstract: An interconnect controller includes a data link layer controller coupled to a transaction layer, wherein the data link layer controller selectively receives data packets from and sends data packets to the transaction layer, and a physical layer controller coupled to the data link layer controller and to a communication link. The physical layer controller selectively operates at a first predetermined link speed. The physical layer controller has an enhanced speed mode, wherein in response to performing a link initialization, the interconnect controller queries a data processing platform to determine whether the enhanced speed mode is permitted, performs at least one setup operation to select an enhanced speed, wherein the enhanced speed is greater than the first predetermined link speed, and subsequently operates the communication link using the enhanced speed.Type: ApplicationFiled: December 14, 2018Publication date: June 18, 2020Applicants: ATI Technologies ULC, Advanced Micro Devices, Inc.Inventors: Gordon Caruk, Gerald R. Talbot
-
Publication number: 20200192671Abstract: A processing device is provided which includes memory and at least one processor. The memory includes main memory and cache memory in communication with the main memory via a link. The at least one processor is configured to receive a request for a cache line and read the cache line from main memory. The at least one processor is also configured to compress the cache line according to a compression algorithm and, when the compressed cache line includes at least one byte predicted not to be accessed, drop the at least one byte from the compressed cache line based on whether the compression algorithm is determined to successfully compress the cache line according to a compression parameter.Type: ApplicationFiled: December 14, 2018Publication date: June 18, 2020Applicant: Advanced Micro Devices, Inc.Inventors: Shomit N. Das, Kishore Punniyamurthy, Matthew Tomei, Bradford M. Beckmann