Patents by Inventor Joseph L. Greathouse

Joseph L. Greathouse has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Method and apparatus for temperature-gradient aware data-placement for 3D stacked DRAMs

Patent number: 10725670

Abstract: A system including a stack of two or more layers of volatile memory, such as layers of a 3D stacked DRAM memory, places data in the stack based on a temperature or a refresh rate. When a threshold is exceeded, data are moved from a first region to a second region in the stack, the second region having one or both of a second temperature lower than a first temperature of the first region or a second refresh rate lower than a first refresh rate of the first region.

Type: Grant

Filed: August 1, 2018

Date of Patent: July 28, 2020

Assignee: ADVANCED MICRO DEVICES, INC.

Inventors: Jagadish B. Kotra, Karthik Rao, Joseph L. Greathouse
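
Illustrative sketch (not taken from the patent text): the abstract describes moving data to a cooler region once a threshold is exceeded. The minimal Python sketch below shows one way such a decision could be expressed. The region names, the temperature readings, and the function `pick_region` are hypothetical, and only the temperature criterion is modeled, not the refresh-rate criterion.

```python
def pick_region(temps_c, current, threshold_c):
    """Return the region that should hold the data.

    temps_c     -- hypothetical map of region name -> temperature in Celsius
    current     -- region currently holding the data
    threshold_c -- temperature above which migration is considered
    """
    # Below the threshold: leave the data where it is.
    if temps_c[current] <= threshold_c:
        return current
    # Above the threshold: look for the coolest region in the stack.
    coolest = min(temps_c, key=temps_c.get)
    # Migrate only if the candidate is strictly cooler than the current region.
    return coolest if temps_c[coolest] < temps_c[current] else current


# Example with made-up readings for a three-layer stack.
readings = {"layer0": 85.0, "layer1": 70.0, "layer2": 60.0}
target = pick_region(readings, "layer0", threshold_c=80.0)
```

Choosing the single coolest region is a simplification; a real policy would presumably also weigh capacity and migration cost.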
Heterogeneous graphics processing unit for scheduling thread groups for execution on variable width SIMD units

Patent number: 10713059

Abstract: A compute unit configured to execute multiple threads in parallel is presented. The compute unit includes one or more single instruction multiple data (SIMD) units and a fetch and decode logic. The SIMD units have differing numbers of arithmetic logic units (ALUs), such that each SIMD unit can execute a different number of threads. The fetch and decode logic is in communication with each of the SIMD units, and is configured to assign the threads to the SIMD units for execution based on such differing numbers of ALUs.

Type: Grant

Filed: September 18, 2014

Date of Patent: July 14, 2020

Assignee: ADVANCED MICRO DEVICES, INC.

Inventors: Joseph L. Greathouse, Mitesh R. Meswani, Sooraj Puthoor, Dmitri Yudanov, James M. O'Connor

RUNTIME LOCALIZED COOLING OF HIGH-PERFORMANCE PROCESSORS

Publication number: 20200201404

Abstract: A plurality of thermal electric cooler (TEC) elements are formed in a TEC grid structure. Control logic dynamically varies a supply current supplied to each TEC element (or group of TEC elements) in the TEC grid based on changes in power density respectively associated with areas cooled by each of the TEC elements or group of TEC elements.

Type: Application

Filed: December 20, 2018

Publication date: June 25, 2020

Inventors: Karthik Rao, Wei Huang, Xudong An, Manish Arora, Joseph L. Greathouse

HINT-BASED FINE-GRAINED DYNAMIC VOLTAGE AND FREQUENCY SCALING IN GPUS

Publication number: 20200183485

Abstract: A processing system dynamically scales at least one of voltage and frequency at a subset of a plurality of compute units of a graphics processing unit (GPU) based on characteristics of a kernel or workload to be executed at the subset. A system management unit for the processing system receives a compute unit mask, designating the subset of a plurality of compute units of a GPU to execute the kernel or workload, and workload characteristics indicating the compute-boundedness or memory bandwidth-boundedness of the kernel or workload from a central processing unit of the processing system. The system management unit determines a dynamic voltage and frequency scaling policy for the subset of the plurality of compute units of the GPU based on the compute unit mask and the workload characteristics.

Type: Application

Filed: December 7, 2018

Publication date: June 11, 2020

Inventors: Shomit N. Das, Joseph L. Greathouse

METHOD AND APPARATUS FOR TEMPERATURE-GRADIENT AWARE DATA-PLACEMENT FOR 3D STACKED DRAMS

Publication number: 20200042197

Abstract: A system including a stack of two or more layers of volatile memory, such as layers of a 3D stacked DRAM memory, places data in the stack based on a temperature or a refresh rate. When a threshold is exceeded, data are moved from a first region to a second region in the stack, the second region having one or both of a second temperature lower than a first temperature of the first region or a second refresh rate lower than a first refresh rate of the first region.

Type: Application

Filed: August 1, 2018

Publication date: February 6, 2020

Inventors: Jagadish B. Kotra, Karthik Rao, Joseph L. Greathouse

Dynamically adapting mechanism for translation lookaside buffer shootdowns

Patent number: 10552339

Abstract: An operating system (OS) of a processing system having a plurality of processor cores determines a cost associated with different mechanisms for performing a translation lookaside buffer (TLB) shootdown in response to, for example, a virtual address being remapped to a new physical address, and selects a TLB shootdown mechanism to purge outdated or invalid address translations from the TLB based on the determined cost. In some embodiments, the OS selects an inter-processor interrupt (IPI) as the TLB shootdown mechanism if the cost associated with sending an IPI is less than a threshold cost. In some embodiments, the OS compares the cost of using an IPI as the TLB shootdown mechanism versus the cost of sending a hardware broadcast to all processor cores of the processing system as the shootdown mechanism and selects the shootdown mechanism having the lower cost.

Type: Grant

Filed: June 12, 2018

Date of Patent: February 4, 2020

Assignee: Advanced Micro Devices, Inc.

Inventors: Arkaprava Basu, Joseph L. Greathouse
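
Illustrative sketch (not taken from the patent text): the abstract describes two selection rules, one comparing the IPI cost against a threshold and one picking the cheaper of IPI and hardware broadcast. The Python sketch below expresses both rules. The class `ShootdownCosts`, the cost units, and the choice of hardware broadcast as the fallback in the threshold rule are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShootdownCosts:
    """Hypothetical cost estimates in arbitrary units."""
    ipi: float
    hw_broadcast: float


def select_by_threshold(costs, threshold):
    """Use an IPI when its cost is below the threshold.

    Falling back to hardware broadcast otherwise is an assumption;
    the abstract does not name the alternative for this rule.
    """
    return "ipi" if costs.ipi < threshold else "hw_broadcast"


def select_cheapest(costs):
    """Pick whichever mechanism has the lower estimated cost."""
    return "ipi" if costs.ipi < costs.hw_broadcast else "hw_broadcast"


# Example with made-up costs: few target cores make the IPI cheap.
few_targets = ShootdownCosts(ipi=2.0, hw_broadcast=5.0)
choice = select_cheapest(few_targets)
```

How the costs themselves are estimated is left out here; the abstract only states that the operating system determines them.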
OPTIMIZED AND SCALABLE SPARSE TRIANGULAR LINEAR SYSTEMS ON NETWORKS OF ACCELERATORS

Publication number: 20200034405

Abstract: A method includes storing a first portion of a sparse triangular matrix in a local memory and launching a kernel for executing a set of workgroups. The first portion includes a plurality of row blocks, and each workgroup in the set of workgroups is associated with one of the plurality of row blocks. The method also includes, for each workgroup in the set of workgroups, solving the row block. The row block is solved by, for each row segment of a first subset of row segments in the row block, calculating a partial sum for the row segment based on one or more matrix elements in the row segment, and writing the partial sum to a remote memory of a first remote processing unit prior to terminating the kernel.

Type: Application

Filed: July 24, 2018

Publication date: January 30, 2020

Inventors: Khaled Hamidouche, Michael W. LeBeane, Nicholas P. Malaya, Joseph L. Greathouse

DYNAMICALLY ADAPTING MECHANISM FOR TRANSLATION LOOKASIDE BUFFER SHOOTDOWNS

Publication number: 20190377688

Abstract: An operating system (OS) of a processing system having a plurality of processor cores determines a cost associated with different mechanisms for performing a translation lookaside buffer (TLB) shootdown in response to, for example, a virtual address being remapped to a new physical address, and selects a TLB shootdown mechanism to purge outdated or invalid address translations from the TLB based on the determined cost. In some embodiments, the OS selects an inter-processor interrupt (IPI) as the TLB shootdown mechanism if the cost associated with sending an IPI is less than a threshold cost. In some embodiments, the OS compares the cost of using an IPI as the TLB shootdown mechanism versus the cost of sending a hardware broadcast to all processor cores of the processing system as the shootdown mechanism and selects the shootdown mechanism having the lower cost.

Type: Application

Filed: June 12, 2018

Publication date: December 12, 2019

Inventors: Arkaprava Basu, Joseph L. Greathouse

PER-INSTRUCTION ENERGY DEBUGGING USING INSTRUCTION SAMPLING HARDWARE

Publication number: 20190286209

Abstract: A processor utilizes instruction based sampling to generate sampling data sampled on a per instruction basis during execution of an instruction. The sampling data indicates what processor hardware was used due to the execution of the instruction. Software receives the sampling data and generates an estimate of energy used by the instruction based on the sampling data. The sampling data may include microarchitectural events and the energy estimate utilizes a base energy amount corresponding to the instruction executed along with energy amounts corresponding to the microarchitectural events in the sampling data. The sampling data may include switching events associated with hardware blocks that switched due to execution of the instruction and the energy estimate for the instruction is based on the switching events and capacitance estimates associated with the hardware blocks.

Type: Application

Filed: March 16, 2018

Publication date: September 19, 2019

Inventors: Shijia Wei, Joseph L. Greathouse, John Kalamatianos
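
Illustrative sketch (not taken from the patent text): the abstract describes an energy estimate built from a base amount for the instruction plus amounts for the microarchitectural events in its sample, and an alternative based on switching events and capacitance estimates. The Python sketch below shows both calculations. All table values, event names, and function names are hypothetical, and the switching variant assumes the common approximation of (1/2)·C·V² per transition.

```python
# Hypothetical per-instruction base energies, in nanojoules.
BASE_ENERGY_NJ = {"add": 0.10, "load": 0.25}

# Hypothetical extra energy per microarchitectural event, in nanojoules.
EVENT_ENERGY_NJ = {"l1_miss": 0.50, "tlb_miss": 0.80, "branch_mispredict": 1.20}


def estimate_energy_nj(opcode, events):
    """Base energy of the instruction plus the energy of each sampled event."""
    return BASE_ENERGY_NJ[opcode] + sum(EVENT_ENERGY_NJ[e] for e in events)


def switching_energy_j(switch_counts, capacitance_f, vdd_v):
    """Energy from switching events: count * 0.5 * C * V^2 per hardware block.

    switch_counts -- map of block name -> number of transitions in the sample
    capacitance_f -- map of block name -> estimated capacitance in farads
    vdd_v         -- supply voltage in volts (assumed uniform across blocks)
    """
    return sum(
        count * 0.5 * capacitance_f[block] * vdd_v ** 2
        for block, count in switch_counts.items()
    )


# Example: a load that missed in the L1 cache and the TLB.
sample_energy = estimate_energy_nj("load", ["l1_miss", "tlb_miss"])
```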
Detecting buffer overflows in general-purpose GPU applications

Patent number: 10067710

Abstract: A processing apparatus is provided that includes a plurality of memory regions each corresponding to a memory address and configured to store data associated with the corresponding memory address. The processing apparatus also includes an accelerated processing device in communication with the memory regions and configured to determine a request to allocate an initial memory buffer comprising a number of contiguous memory regions, create a new memory buffer comprising one or more additional memory regions adjacent to the contiguous memory regions of the initial memory buffer, assign one or more values to the one or more additional memory regions and detect a change to the one or more values at the one or more additional memory regions.

Type: Grant

Filed: November 23, 2016

Date of Patent: September 4, 2018

Assignee: ADVANCED MICRO DEVICES, INC.

Inventors: Joseph L. Greathouse, Christopher D. Erb, Michael G. Collins
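
Illustrative sketch (not taken from the patent text): the abstract describes padding an allocation with additional adjacent regions, assigning known values to them, and detecting later changes. The Python sketch below mimics that idea on the host with a `bytearray`. The constants `CANARY_BYTE` and `CANARY_LEN` and both function names are hypothetical, and nothing here runs on a GPU.

```python
CANARY_BYTE = 0xA5  # hypothetical known value written into the padding
CANARY_LEN = 8      # hypothetical number of padding bytes


def allocate_with_canary(size):
    """Return `size` usable bytes followed by CANARY_LEN canary bytes."""
    return bytearray(size) + bytearray([CANARY_BYTE] * CANARY_LEN)


def overflow_detected(buf, size):
    """True if any canary byte after the first `size` bytes has changed."""
    return any(b != CANARY_BYTE for b in buf[size:size + CANARY_LEN])


# Example: a write one element past the end of a 16-byte buffer.
buf = allocate_with_canary(16)
clean_before = overflow_detected(buf, 16)
buf[16] = 0x00  # simulated out-of-bounds write
dirty_after = overflow_detected(buf, 16)
```

A write that happens to store the canary value itself would go unnoticed, which is a known limitation of canary checks in general.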
Predicting a context portion to move between a context buffer and registers based on context portions previously used by at least one other thread

Patent number: 10019283

Abstract: A processing device includes a first memory that includes a context buffer. The processing device also includes a processor core to execute threads based on context information stored in registers of the processor core and a memory controller to selectively move a subset of the context information between the context buffer and the registers based on one or more latencies of the threads.

Type: Grant

Filed: June 22, 2015

Date of Patent: July 10, 2018

Assignee: Advanced Micro Devices, Inc.

Inventors: Dmitri Yudanov, Sergey Blagodurov, Arkaprava Basu, Sooraj Puthoor, Joseph L. Greathouse

Hardware accuracy counters for application precision and quality feedback

Patent number: 9990203

Abstract: Methods, devices, and systems for capturing an accuracy of an instruction executing on a processor. An instruction may be executed on the processor, and the accuracy of the instruction may be captured using a hardware counter circuit. The accuracy of the instruction may be captured by analyzing bits of at least one value of the instruction to determine a minimum or maximum precision datatype for representing the field, and determining whether to adjust a value of the hardware counter circuit accordingly. The representation may be output to a debugger or logfile for use by a developer, or may be output to a runtime or virtual machine to automatically adjust instruction precision or gating of portions of the processor datapath.

Type: Grant

Filed: December 28, 2015

Date of Patent: June 5, 2018

Assignee: ADVANCED MICRO DEVICES, INC.

Inventors: Leonardo de Paula Rosa Piga, Abhinandan Majumdar, Indrani Paul, Wei Huang, Manish Arora, Joseph L. Greathouse

DETECTING BUFFER OVERFLOWS IN GENERAL-PURPOSE GPU APPLICATIONS

Publication number: 20180143781

Abstract: A processing apparatus is provided that includes a plurality of memory regions each corresponding to a memory address and configured to store data associated with the corresponding memory address. The processing apparatus also includes an accelerated processing device in communication with the memory regions and configured to determine a request to allocate an initial memory buffer comprising a number of contiguous memory regions, create a new memory buffer comprising one or more additional memory regions adjacent to the contiguous memory regions of the initial memory buffer, assign one or more values to the one or more additional memory regions and detect a change to the one or more values at the one or more additional memory regions.

Type: Application

Filed: November 23, 2016

Publication date: May 24, 2018

Applicant: Advanced Micro Devices, Inc.

Inventors: Joseph L. Greathouse, Christopher D. Erb, Michael G. Collins

PRESERVING QUALITY OF SERVICE CONSTRAINTS IN HETEROGENEOUS PROCESSING SYSTEMS

Publication number: 20180069767

Abstract: Techniques described herein improve processor performance in situations where a large number of system service requests are being received from other devices. More specifically, upon detecting that certain operating conditions that indicate a processor slowdown are present, the processor performs one or more system service adjustment techniques. These techniques include throttling (reducing the rate of handling) of such requests, coalescing (grouping multiple requests into a single group) the requests, disabling microarchitectural structures (such as caches or branch prediction units) or updates to those structures, and prefetching data for or pre-performing these requests. Each of these adjustment techniques helps to reduce the number of and/or workload associated with servicing requests for system services.

Type: Application

Filed: September 6, 2016

Publication date: March 8, 2018

Applicant: Advanced Micro Devices, Inc.

Inventors: Arkaprava Basu, Joseph L. Greathouse, Guru Prasadh V. Venkataramani, Jan Vesely

Efficient sparse matrix-vector multiplication on parallel processors

Patent number: 9697176

Abstract: A method of multiplication of a sparse matrix and a vector to obtain a new vector and a system for implementing the method are claimed. Embodiments of the method are intended to optimize the performance of sparse matrix-vector multiplication in highly parallel processors, such as GPUs. The sparse matrix is stored in compressed sparse row (CSR) format.

Type: Grant

Filed: November 14, 2014

Date of Patent: July 4, 2017

Assignee: ADVANCED MICRO DEVICES, INC.

Inventors: Mayank Daga, Joseph L. Greathouse
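
Illustrative sketch (not taken from the patent text): the abstract states that the sparse matrix is stored in compressed sparse row (CSR) format. The Python sketch below shows the textbook serial CSR matrix-vector product, only to make the storage format concrete. It is not the parallel method claimed in the patent, and the names `row_ptr`, `col_idx`, and `values` are conventional labels chosen here.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR format.

    row_ptr -- len(rows) + 1 offsets; row i owns entries row_ptr[i]:row_ptr[i+1]
    col_idx -- column index of each stored nonzero
    values  -- value of each stored nonzero
    x       -- dense input vector
    """
    num_rows = len(row_ptr) - 1
    y = [0.0] * num_rows
    for row in range(num_rows):
        # Walk only the nonzeros that belong to this row.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            y[row] += values[k] * x[col_idx[k]]
    return y


# Example matrix:
#   [1 0 2]
#   [0 3 0]
#   [4 0 5]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
y = csr_spmv(row_ptr, col_idx, values, [1.0, 2.0, 3.0])
```

Because rows can hold very different numbers of nonzeros, assigning one row per thread in this loop structure tends to balance poorly on wide parallel hardware, which is the kind of problem the patent's title points at.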
HARDWARE ACCURACY COUNTERS FOR APPLICATION PRECISION AND QUALITY FEEDBACK

Publication number: 20170185409

Abstract: Methods, devices, and systems for capturing an accuracy of an instruction executing on a processor. An instruction may be executed on the processor, and the accuracy of the instruction may be captured using a hardware counter circuit. The accuracy of the instruction may be captured by analyzing bits of at least one value of the instruction to determine a minimum or maximum precision datatype for representing the field, and determining whether to adjust a value of the hardware counter circuit accordingly. The representation may be output to a debugger or logfile for use by a developer, or may be output to a runtime or virtual machine to automatically adjust instruction precision or gating of portions of the processor datapath.

Type: Application

Filed: December 28, 2015

Publication date: June 29, 2017

Applicant: Advanced Micro Devices, Inc.

Inventors: Leonardo de Paula Rosa Piga, Abhinandan Majumdar, Indrani Paul, Wei Huang, Manish Arora, Joseph L. Greathouse

INSTRUCTION CONTEXT SWITCHING

Publication number: 20160371082

Abstract: A processing device includes a first memory that includes a context buffer. The processing device also includes a processor core to execute threads based on context information stored in registers of the processor core and a memory controller to selectively move a subset of the context information between the context buffer and the registers based on one or more latencies of the threads.

Type: Application

Filed: June 22, 2015

Publication date: December 22, 2016

Inventors: Dmitri Yudanov, Sergey Blagodurov, Arkaprava Basu, Sooraj Puthoor, Joseph L. Greathouse

Randomly branching using hardware watchpoints

Patent number: 9483379

Abstract: A system and method for efficiently performing program instrumentation. A processor processes instructions stored in a memory. The processor allocates a memory region for the purpose of creating “random branches” in the computer code utilizing existing memory access instructions. When the processor processes a given instruction, the processor both accesses a first location in the memory region and may determine a condition is satisfied. In response, the processor generates an interrupt. The corresponding interrupt handler may transfer control flow from the computer program to instrumentation code. The condition may include a pointer storing an address pointing to locations within the memory region equals a given address after the pointer is updated. Alternatively, the condition may include an updated data value stored in a location pointed to by the given address equals a threshold value.

Type: Grant

Filed: October 15, 2013

Date of Patent: November 1, 2016

Assignee: Advanced Micro Devices, Inc.

Inventors: Joseph L. Greathouse, David S. Christie

Randomly branching using performance counters

Patent number: 9448909

Abstract: A system and method for efficiently performing program instrumentation. A processor processes instructions stored in a memory. When the processor processes a given instruction of a given instruction type, the processor updates a corresponding performance counter. When the performance counter reaches a threshold, the processor generates an interrupt and compares a location of the given instruction with stored locations in a given list. If a match is not found, then the processor processes an instruction following the given instruction in the computer program without processing intermediate instrumentation code. If a match is found, then the processor processes instrumentation code. Regardless of whether or not the instrumentation code is processed, when control flow returns to the computer program, the corresponding performance counter is initialized with a random value.

Type: Grant

Filed: October 15, 2013

Date of Patent: September 20, 2016

Assignee: Advanced Micro Devices, Inc.

Inventors: Joseph L. Greathouse, David S. Christie
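
Illustrative sketch (not taken from the patent text): the abstract describes a counter that triggers an interrupt at a threshold, a lookup of the instruction's location in a stored list, and a reset of the counter to a random value afterwards. The Python sketch below simulates that control flow in software. The function `run_sampled`, the trace of program-counter values, and the use of `random.Random` are assumptions; every entry in the trace is taken to be an instruction of the counted type.

```python
import random


def run_sampled(trace, instrumented_pcs, threshold, rng):
    """Return the locations at which instrumentation code would run.

    trace            -- sequence of instruction locations (hypothetical PCs)
    instrumented_pcs -- set of locations that have instrumentation attached
    threshold        -- counter value that triggers the simulated interrupt
    rng              -- random.Random instance used to reinitialize the counter
    """
    counter = rng.randrange(threshold)
    hits = []
    for pc in trace:
        counter += 1
        if counter >= threshold:
            # Simulated interrupt: run instrumentation only on a list match.
            if pc in instrumented_pcs:
                hits.append(pc)
            # Reinitialize with a random value whether or not it matched.
            counter = rng.randrange(threshold)
    return hits


# Example: sample a short made-up trace with a fixed seed.
hits = run_sampled([10, 20, 10, 30, 10, 20], {10}, threshold=3, rng=random.Random(0))
```

Starting the counter at a random value varies the distance to the next interrupt, so the sampled instructions differ from run to run unless the seed is fixed.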
EFFICIENT SPARSE MATRIX-VECTOR MULTIPLICATION ON PARALLEL PROCESSORS

Publication number: 20160140084

Abstract: A method of multiplication of a sparse matrix and a vector to obtain a new vector and a system for implementing the method are claimed. Embodiments of the method are intended to optimize the performance of sparse matrix-vector multiplication in highly parallel processors, such as GPUs. The sparse matrix is stored in compressed sparse row (CSR) format.

Type: Application

Filed: November 14, 2014

Publication date: May 19, 2016

Applicant: ADVANCED MICRO DEVICES, INC.

Inventors: Mayank Daga, Joseph L. Greathouse
