Patents Assigned to Advanced Micro Devices
-
Patent number: 9639488
Abstract: Embodiments are described for a method of reducing power consumption in source synchronous bus systems by reducing signal transitions in the system. Instead of sending clock and data valid signals, only the start and end of valid data packets are marked by clock signal transitions, or only a number of clock pulses that corresponds to the number of data words is sent, or only a number of transitions on clock signals are sent. The clock signal transitions may comprise either clock pulses or exclusively rising edge or falling edge transitions of the clock signal.
Type: Grant
Filed: June 20, 2014
Date of Patent: May 2, 2017
Assignee: Advanced Micro Devices, Inc.
Inventors: Gregory Sadowski, Sudha Thiruvengadam, Arun Iyer
-
Patent number: 9639280
Abstract: The disclosed embodiments provide a system for processing a memory command on a computer system. During operation, a command scheduler executing on a memory controller of the computer system obtains a predicted latency of the memory command based on a memory address to be accessed by the memory command. Next, the command scheduler orders the memory command with other memory commands in a command queue for subsequent processing by a memory resource on the computer system based on the predicted latency of the memory command.
Type: Grant
Filed: June 18, 2015
Date of Patent: May 2, 2017
Assignee: ADVANCED MICRO DEVICES, INC.
Inventor: David A. Roberts
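
The ordering step in the abstract above can be illustrated with a minimal Python sketch. It is not the patented scheduler: the predictor (`predict_latency`), the row-buffer model, the cycle counts, and the shortest-latency-first policy are all assumptions made for illustration.

```python
import heapq
from itertools import count

ROW_BITS = 10  # assumption: the low 10 address bits select a column within a row


def predict_latency(address, open_rows, hit_cycles=10, miss_cycles=30):
    """Hypothetical predictor: an access to an already-open row is assumed fast."""
    row = address >> ROW_BITS
    return hit_cycles if row in open_rows else miss_cycles


class CommandQueue:
    """Orders memory commands by predicted latency (shortest first, by assumption)."""

    def __init__(self, open_rows):
        self._open_rows = set(open_rows)
        self._heap = []
        self._seq = count()  # tie-breaker that preserves arrival order

    def push(self, address):
        latency = predict_latency(address, self._open_rows)
        heapq.heappush(self._heap, (latency, next(self._seq), address))

    def pop(self):
        latency, _, address = heapq.heappop(self._heap)
        return address, latency

    def __len__(self):
        return len(self._heap)


queue = CommandQueue(open_rows={1})
for addr in (0x0000, 0x0400, 0x0800):
    queue.push(addr)
order = [queue.pop() for _ in range(len(queue))]
```

Here `0x0400` falls in the open row, so it is issued ahead of the two commands that arrived around it; commands with equal predicted latency keep their arrival order.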
-
Patent number: 9639495
Abstract: A controller integrated in a memory physical layer interface (PHY) can be used to control training used to configure the memory PHY for communication with an associated external memory such as a dynamic random access memory (DRAM), thereby removing the need to provide training sequences over a data pipeline between a BIOS and the memory PHY. For example, a controller integrated in the memory PHY can control read training and write training of the memory PHY for communication with the external memory based on a training algorithm. The training algorithm may be a seedless training algorithm that converges on a solution for a timing delay and a voltage offset between the memory PHY and the external memory without receiving, from a basic input/output system (BIOS), seed information that characterizes a signal path traversed by training sequences or commands generated by the training algorithm.
Type: Grant
Filed: June 27, 2014
Date of Patent: May 2, 2017
Assignee: Advanced Micro Devices, Inc.
Inventors: Glenn A. Dearth, Gerry Talbot, Anwar Kashem, Edoardo Prete, Brian Amick
-
Patent number: 9639359
Abstract: Embodiments are described for a method for compiling instruction code for execution in a processor having a number of functional units by determining a thermal constraint of the processor, and defining instruction words comprising both real instructions and one or more no operation (NOP) instructions to be executed by the functional units within a single clock cycle, wherein a number of NOP instructions executed over a number of consecutive clock cycles is configured to prevent exceeding the thermal constraint during execution of the instruction code.
Type: Grant
Filed: May 21, 2013
Date of Patent: May 2, 2017
Assignee: Advanced Micro Devices, Inc.
Inventors: Yuan Xie, Junli Gu
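
As a rough illustration of padding instruction words with NOPs, the sketch below packs instructions greedily and caps the number of real instructions in any run of consecutive words. The thermal model is an assumption: each real instruction is treated as one unit of heat, and `max_real_per_window` stands in for the thermal constraint. The function name and parameters are hypothetical and not taken from the patent.

```python
from collections import deque

NOP = "NOP"


def pack_instruction_words(instructions, units=4, window=2, max_real_per_window=6):
    """Pack instructions into words of `units` slots, padding with NOPs so that
    any `window` consecutive words hold at most `max_real_per_window` real ops."""
    if window < 1 or max_real_per_window < 1:
        raise ValueError("window and max_real_per_window must be at least 1")
    pending = deque(instructions)
    words = []
    while pending:
        # Real instructions already issued in the previous (window - 1) words.
        recent_words = words[max(0, len(words) - (window - 1)):]
        recent_real = sum(op != NOP for word in recent_words for op in word)
        allowed = min(units, max_real_per_window - recent_real, len(pending))
        real = [pending.popleft() for _ in range(allowed)]
        words.append(real + [NOP] * (units - allowed))
    return words


words = pack_instruction_words([f"i{n}" for n in range(10)])
```

With the default settings, the second word carries only two real instructions and two NOPs, because the first word already used four of the six allowed in the two-cycle window.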
-
Patent number: 9639140
Abstract: A method of managing power state transitions for an interactive workload includes storing one or more parameters, each representing an electrical operating characteristic that controls power consumption of a processing unit, receiving a first user input requesting execution of a task by the processing unit, in response to receiving a second user input, modifying at least one of the one or more parameters, and executing the task in the processing unit while operating the processing unit according to the at least one modified parameter.
Type: Grant
Filed: September 17, 2015
Date of Patent: May 2, 2017
Assignee: Advanced Micro Devices, Inc.
Inventors: Leonardo de Paula Rosa Piga, Mauricio Breternitz
-
Patent number: 9632848
Abstract: A system and method for allocating commands in processing is disclosed. The system and method include an application running on a computer system that provides commands to be executed on one of a plurality of processors capable of executing the commands, the commands provided through an application programming interface, a device driver that buffers the streamed commands and converts the streamed commands into a format used by a GPU, and an operating system that builds a command buffer by grouping a plurality of converted commands based on an allocation for an available processor, wherein the available processor is determined in the interface between the device driver and the operating system. The available processor is one of the plurality of processors capable of executing the commands that receives the command buffer from the operating system, queues the command buffer and performs an asynchronous submission of the command buffer to the GPU, and the GPU executes the command buffer.
Type: Grant
Filed: December 29, 2015
Date of Patent: April 25, 2017
Assignee: ADVANCED MICRO DEVICES, INC.
Inventors: David Oldcorn, Timour T. Paltashev
-
Patent number: 9626190
Abstract: The present invention provides a method and apparatus for floating-point register caching. One embodiment of the method includes mapping a first set of architected registers defined by a first instruction set to a memory outside of a plurality of physical registers. The plurality of physical registers are configured to map to the first set, a second set of architected registers defined by a second instruction set, and a set of rename registers. This embodiment of the method also includes adding the physical registers corresponding to the first set of architected registers to the set of rename registers.
Type: Grant
Filed: October 7, 2010
Date of Patent: April 18, 2017
Assignee: Advanced Micro Devices, Inc.
Inventor: Jeff Rupley
-
Patent number: 9626428
Abstract: A system and method for accessing a hash table are provided. A hash table includes buckets where each bucket includes multiple chains. When a single instruction multiple data (SIMD) processor receives a group of threads configured to execute a key look-up instruction that accesses an element in the hash table, the threads executing on the SIMD processor identify a bucket that stores a key in the key look-up instruction. Once identified, the threads in the group traverse the multiple chains in the bucket, such that the elements at a chain level in the multiple chains are traversed in parallel. The traversal continues until a key look-up succeeds or fails.
Type: Grant
Filed: September 11, 2013
Date of Patent: April 18, 2017
Assignee: Advanced Micro Devices, Inc.
Inventors: Mithuna Thottethodi, Steven Reinhardt
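
The bucket layout described in the abstract above can be sketched in plain Python. This is a sequential simulation, not SIMD code: the inner loop over chains stands in for the lanes that would compare one chain level in parallel. The class name, the bucket and chain counts, and the insert policy (append to the shortest chain, no duplicate-key handling) are assumptions for illustration.

```python
NUM_BUCKETS = 4        # assumption: small fixed table for illustration
CHAINS_PER_BUCKET = 4  # assumption: one chain per simulated SIMD lane


class MultiChainHashTable:
    def __init__(self):
        self.buckets = [
            [[] for _ in range(CHAINS_PER_BUCKET)] for _ in range(NUM_BUCKETS)
        ]

    def _chains(self, key):
        return self.buckets[hash(key) % NUM_BUCKETS]

    def insert(self, key, value):
        # Keep the chains balanced so a lookup visits few levels.
        shortest = min(self._chains(key), key=len)
        shortest.append((key, value))

    def lookup(self, key):
        chains = self._chains(key)
        depth = max(len(chain) for chain in chains)
        for level in range(depth):
            # On a SIMD processor, each lane would test one chain at this level.
            for chain in chains:
                if level < len(chain) and chain[level][0] == key:
                    return chain[level][1]
        return None  # every chain exhausted: the look-up fails


table = MultiChainHashTable()
for k in range(20):
    table.insert(k, k * k)
```

With twenty integer keys spread over four buckets of four chains, no chain grows beyond two elements, so a lookup inspects at most two levels.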
-
Patent number: 9625938
Abstract: A technique implements differential digital logic circuits with a differential clock distribution network using standard cell differential clock gater circuits to reduce area, delay, and power consumption in integrated circuits. An apparatus includes a first terminal configured to receive a clock signal, a second terminal configured to receive a complementary clock signal, and a third terminal configured to receive a clock control signal. The apparatus includes a latch circuit configured to generate a latched version of the clock control signal based on a version of the clock control signal, a version of the clock signal, and a version of the complementary clock signal. The apparatus includes a combinatorial circuit configured to generate a gated clock signal and a gated complementary clock signal based on the version of the clock control signal, the version of the clock signal, and the version of the complementary clock signal.
Type: Grant
Filed: March 25, 2015
Date of Patent: April 18, 2017
Assignee: Advanced Micro Devices, Inc.
Inventors: Hariprasad Thodukattil Thazhatheppattu, Prasant Vallur, Animesh Sharma
-
Patent number: 9626789
Abstract: Embodiments are described for a method for processing textures for a mesh comprising quadrilateral polygons for real-time rendering of an object or model in a graphics processing unit (GPU), comprising associating an independent texture map with each face of the mesh to produce a plurality of face textures, packing the plurality of face textures into a single texture atlas, wherein the atlas is divided into a plurality of blocks based on a resolution of the face textures, adding a border to the texture map for each face comprising additional texels including at least border texels from an adjacent face texture map, and performing linear interpolation of a trilinear filtering operation on the face textures to resolve resolution discrepancies caused when crossing an edge of a polygon.
Type: Grant
Filed: May 7, 2013
Date of Patent: April 18, 2017
Assignee: Advanced Micro Devices, Inc.
Inventors: Karl E. Hillesland, Justin Hensley
-
Patent number: 9626243
Abstract: A method and device for error detection includes performing error detection for each data word received in a burst access to a memory. When no error is detected, the data words are written to a cache and indicated as valid data. In response to detecting an error in a data word, the error is corrected and the corrected data written to the cache without indicating the data as valid. In addition, the location of the detected error, indicating the data symbol associated with the error, is recorded in an error vector. The error vectors associated with each data word in the burst access are compared to determine whether a detected error was properly corrected.
Type: Grant
Filed: December 11, 2009
Date of Patent: April 18, 2017
Assignee: Advanced Micro Devices, Inc.
Inventor: James O. Nicholson
-
Publication number: 20170102971
Abstract: The methods and apparatus can assign processing core workloads to processing cores from a heterogeneous instruction set architecture (ISA) pool of available processing cores based on processing core metric results. For example, the method and apparatus can obtain processing core metric results for one or more processing cores, such as processing cores within general purpose processors, from a heterogeneous ISA pool of available processing cores. The method and apparatus can also obtain one or more processing core workloads, such as software applications or software processes, from a pool of available processing core workloads to be assigned. The method and apparatus can then assign one or more processing core workloads that have higher priority than others from the pool of available processing core workloads to a processing core from the heterogeneous ISA pool of available processing cores based on its processing core metric result.
Type: Application
Filed: October 12, 2015
Publication date: April 13, 2017
Applicant: ADVANCED MICRO DEVICES, INC.
Inventor: Sergey Blagodurov
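
One simple reading of the assignment step above is a priority-ordered matching, sketched below in Python. The function name, the workload and core names, and the use of a single numeric score per core (higher is better) are assumptions; the abstract does not say how metric results are computed or compared.

```python
def assign_workloads(workloads, core_metrics):
    """Pair the highest-priority workloads with the best-scoring cores.

    workloads:    list of (name, priority) tuples, higher priority first served
    core_metrics: dict mapping a core name to a hypothetical metric score
    """
    cores = sorted(core_metrics, key=core_metrics.get, reverse=True)
    ranked = sorted(workloads, key=lambda w: w[1], reverse=True)
    # zip stops at the shorter list, so surplus workloads stay unassigned.
    return {name: core for (name, _), core in zip(ranked, cores)}


assignment = assign_workloads(
    workloads=[("batch", 1), ("render", 9), ("db", 5)],
    core_metrics={"x86-0": 0.7, "arm-0": 0.9, "arm-1": 0.4},
)
```

In this example the highest-priority workload, `render`, lands on the best-scoring core, `arm-0`, regardless of that core's ISA.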
-
Publication number: 20170102886
Abstract: Methods, systems, and computer program products are provided for minimizing latency in an implementation where a peripheral device is used as a capture device and a compute device such as a GPU processes the captured data in a computing environment. In embodiments, a peripheral device and GPU are tightly integrated and communicate at a hardware/firmware level. Peripheral device firmware can determine and store compute instructions specifically for the GPU, in a command queue. The compute instructions in the command queue are understood and consumed by firmware of the GPU. The compute instructions include but are not limited to generating low latency visual feedback for presentation to a display screen, and detecting the presence of gestures to be converted to OS messages that can be utilized by any application.
Type: Application
Filed: December 21, 2016
Publication date: April 13, 2017
Applicant: Advanced Micro Devices, Inc.
Inventor: Daniel W. WONG
-
Patent number: 9621143
Abstract: Techniques are disclosed relating to detecting and minimizing timing problems created by clock domain crossing (CDC) in integrated circuits. In various embodiments, one or more timing parameters are associated with a path that crosses between clock domains in an integrated circuit, where the one or more timing parameters specify a propagation delay for the path. In one embodiment, the timing parameters may be distributed to different design stages using a configuration file. In some embodiments, the one or more parameters may be used in conjunction with an RTL model to simulate propagation of a data signal along the path. In some embodiments, the one or more parameters may be used in conjunction with a netlist to create a physical design for the integrated circuit, where the physical design includes a representation of the path that has the specified propagation delay.
Type: Grant
Filed: November 8, 2013
Date of Patent: April 11, 2017
Assignee: Advanced Micro Devices, Inc.
Inventors: Michael J. Osborn, Michael J. Tresidder, Aaron J. Grenat, Joseph Kidd, Priyank Parakh, Steven J. Kommrusch
-
Patent number: 9619428
Abstract: A graphics processing unit is disclosed, the graphics processing unit having a processor having one or more SIMD processing units, and a local data share corresponding to one of the one or more SIMD processing units, the local data share comprising one or more low latency accessible memory regions for each group of threads assigned to one or more execution wavefronts, and a global data share comprising one or more low latency memory regions for each group of threads.
Type: Grant
Filed: June 1, 2009
Date of Patent: April 11, 2017
Assignee: Advanced Micro Devices, Inc.
Inventors: Michael J. Mantor, Brian Emberling
-
Patent number: 9619290
Abstract: A method of balancing execution rates for a plurality of parallel program loops being executed concurrently by a processor may include estimating a completion time for each program loop of the plurality of program loops, determining a difference between the estimated completion time of a first program loop of the plurality of program loops and the estimated completion time of a second program loop of the plurality of program loops, and decreasing the difference by adjusting an execution rate of the first program loop.
Type: Grant
Filed: March 6, 2015
Date of Patent: April 11, 2017
Assignee: Advanced Micro Devices, Inc.
Inventors: Peter Bailey, Indrani Paul, Manish Arora
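
The three steps in the abstract above (estimate, compare, adjust) can be sketched as follows. The completion-time model (remaining iterations divided by a rate in iterations per second) and the proportional `step` adjustment are assumptions for illustration; the abstract does not specify how the rate is changed.

```python
from dataclasses import dataclass


@dataclass
class Loop:
    remaining_iterations: int
    rate: float  # iterations per second (assumed unit)

    def estimated_completion(self):
        return self.remaining_iterations / self.rate


def balance(first, second, step=0.5):
    """Move `first`'s rate part of the way toward the rate at which it would
    finish together with `second`; return the remaining completion-time gap."""
    target_rate = first.remaining_iterations / second.estimated_completion()
    first.rate += step * (target_rate - first.rate)
    return abs(first.estimated_completion() - second.estimated_completion())


fast = Loop(remaining_iterations=100, rate=20.0)  # would finish in 5 s
slow = Loop(remaining_iterations=100, rate=10.0)  # would finish in 10 s
gap_before = abs(fast.estimated_completion() - slow.estimated_completion())
gap_after = balance(fast, slow)
```

Slowing the faster loop from 20 to 15 iterations per second narrows the gap; calling `balance` with `step=1.0` would close it in a single adjustment under this model.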
-
Patent number: 9612884
Abstract: Methods are provided for creating objects in a way that permits an API client to explicitly participate in memory management for an object created using the API. Methods for managing data object memory include requesting memory requirements for an object using an API and expressly allocating a memory location for the object based on the memory requirements. Methods are also provided for cloning objects such that a state of the object remains unchanged from the original object to the cloned object or can be explicitly specified.
Type: Grant
Filed: December 4, 2014
Date of Patent: April 4, 2017
Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC
Inventors: Guennadi Riguer, Brian K. Bennett
-
Patent number: 9606936
Abstract: Methods, systems, and computer readable media generalize control registers in the context of memory address translations for I/O devices. A method includes maintaining a table including a plurality of concurrently available control register base pointers each associated with a corresponding input/output (I/O) device, associating each control register base pointer with a first translation from a guest virtual address (GVA) to a guest physical address (GPA) and a second translation from the GPA to a system physical address (SPA), and operating the first and second translations concurrently for the plurality of I/O devices.
Type: Grant
Filed: December 2, 2011
Date of Patent: March 28, 2017
Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC
Inventors: Andy Kegel, Mark Hummel, Tony Asaro, Philip Ng
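
The two-stage translation named in the abstract above (GVA to GPA, then GPA to SPA, selected per I/O device) can be sketched with flat single-level page tables. Everything concrete here is an assumption: the 4 KiB page size, the dictionary-based tables, and the `DeviceTable` class, which only loosely stands in for the table of per-device base pointers.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def translate(page_table, address):
    """Single-level lookup: map the page number, keep the page offset."""
    page, offset = divmod(address, PAGE_SIZE)
    return page_table[page] * PAGE_SIZE + offset


class DeviceTable:
    """Hypothetical per-device table holding both translation stages."""

    def __init__(self):
        self._entries = {}  # device_id -> (guest_table, host_table)

    def register(self, device_id, guest_table, host_table):
        self._entries[device_id] = (guest_table, host_table)

    def gva_to_spa(self, device_id, gva):
        guest_table, host_table = self._entries[device_id]
        gpa = translate(guest_table, gva)  # first translation: GVA -> GPA
        return translate(host_table, gpa)  # second translation: GPA -> SPA


devices = DeviceTable()
devices.register(1, guest_table={0: 5}, host_table={5: 9})
devices.register(2, guest_table={0: 2}, host_table={2: 3})
spa_dev1 = devices.gva_to_spa(1, 0x10)
spa_dev2 = devices.gva_to_spa(2, 0x10)
```

The same guest virtual address resolves to different system physical addresses for the two devices because each device has its own pair of tables.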
-
Patent number: 9606806
Abstract: A method includes selecting for execution in a processor a load instruction having at least one dependent instruction. Responsive to selecting the load instruction, the at least one dependent instruction is selectively awakened based on a status of a store instruction associated with the load instruction to indicate that the at least one dependent instruction is eligible for execution. A processor includes an instruction pipeline having an execution unit to execute instructions, a scheduler, and a controller. The scheduler selects for execution in the execution unit a load instruction having at least one dependent instruction. The controller, responsive to the scheduler selecting the load instruction, selectively awakens the at least one dependent instruction based on a status of a store instruction associated with the load instruction to indicate that the at least one dependent instruction is eligible for execution by the execution unit.
Type: Grant
Filed: June 25, 2013
Date of Patent: March 28, 2017
Assignee: Advanced Micro Devices, Inc.
Inventors: Gregory W. Smaus, Michael Achenbach, Christopher J. Burke, Francesco Spadini
-
Patent number: 9606177
Abstract: In one form, a scan flip-flop includes a clock gating cell and a dedicated clock flip-flop. The clock gating cell provides an input clock signal as a scan clock signal when a scan shift enable signal is active, and provides the input clock signal as a data clock signal when the scan shift enable signal is inactive. The dedicated clock flip-flop stores a data input signal and provides the data input signal, so stored, as a data output signal in response to transitions of the data clock signal, and stores a scan data input signal and provides the scan data input signal, so stored, as the data output signal in response to transitions of the scan clock signal. The clock gating cell may further provide the input clock signal as the data clock signal when both the scan shift enable signal is inactive and a data enable signal is active.
Type: Grant
Filed: May 19, 2015
Date of Patent: March 28, 2017
Assignee: ADVANCED MICRO DEVICES, INC.
Inventors: Daniel W. Bailey, Abhishek Sharma, Michael Q. Co