Patents by Inventor Joseph L. Greathouse

Joseph L. Greathouse has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10725670
    Abstract: A system including a stack of two or more layers of volatile memory, such as layers of a 3D stacked DRAM memory, places data in the stack based on a temperature or a refresh rate. When a threshold is exceeded, data are moved from a first region to a second region in the stack, the second region having one or both of a second temperature lower than a first temperature of the first region or a second refresh rate lower than a first refresh rate of the first region.
    Type: Grant
    Filed: August 1, 2018
    Date of Patent: July 28, 2020
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Jagadish B. Kotra, Karthik Rao, Joseph L. Greathouse
  • Patent number: 10713059
    Abstract: A compute unit configured to execute multiple threads in parallel is presented. The compute unit includes one or more single instruction multiple data (SIMD) units and a fetch and decode logic. The SIMD units have differing numbers of arithmetic logic units (ALUs), such that each SIMD unit can execute a different number of threads. The fetch and decode logic is in communication with each of the SIMD units, and is configured to assign the threads to the SIMD units for execution based on such differing numbers of ALUs.
    Type: Grant
    Filed: September 18, 2014
    Date of Patent: July 14, 2020
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Joseph L. Greathouse, Mitesh R. Meswani, Sooraj Puthoor, Dmitri Yudanov, James M. O'Connor
  • Publication number: 20200201404
    Abstract: A plurality of thermal electric cooler (TEC) elements are formed in a TEC grid structure. Control logic dynamically varies a supply current supplied to each TEC element (or group of TEC elements) in the TEC grid based on changes in power density respectively associated with areas cooled by each of the TEC elements or group of TEC elements.
    Type: Application
    Filed: December 20, 2018
    Publication date: June 25, 2020
    Inventors: Karthik Rao, Wei Huang, Xudong An, Manish Arora, Joseph L. Greathouse
  • Publication number: 20200183485
    Abstract: A processing system dynamically scales at least one of voltage and frequency at a subset of a plurality of compute units of a graphics processing unit (GPU) based on characteristics of a kernel or workload to be executed at the subset. A system management unit for the processing system receives a compute unit mask, designating the subset of a plurality of compute units of a GPU to execute the kernel or workload, and workload characteristics indicating the compute-boundedness or memory bandwidth-boundedness of the kernel or workload from a central processing unit of the processing system. The system management unit determines a dynamic voltage and frequency scaling policy for the subset of the plurality of compute units of the GPU based on the compute unit mask and the workload characteristics.
    Type: Application
    Filed: December 7, 2018
    Publication date: June 11, 2020
    Inventors: Shomit N. DAS, Joseph L. GREATHOUSE
  • Publication number: 20200042197
    Abstract: A system including a stack of two or more layers of volatile memory, such as layers of a 3D stacked DRAM memory, places data in the stack based on a temperature or a refresh rate. When a threshold is exceeded, data are moved from a first region to a second region in the stack, the second region having one or both of a second temperature lower than a first temperature of the first region or a second refresh rate lower than a first refresh rate of the first region.
    Type: Application
    Filed: August 1, 2018
    Publication date: February 6, 2020
    Inventors: Jagadish B. KOTRA, Karthik RAO, Joseph L. GREATHOUSE
  • Patent number: 10552339
    Abstract: An operating system (OS) of a processing system having a plurality of processor cores determines a cost associated with different mechanisms for performing a translation lookaside buffer (TLB) shootdown in response to, for example, a virtual address being remapped to a new physical address, and selects a TLB shootdown mechanism to purge outdated or invalid address translations from the TLB based on the determined cost. In some embodiments, the OS selects an inter-processor interrupt (IPI) as the TLB shootdown mechanism if the cost associated with sending an IPI is less than a threshold cost. In some embodiments, the OS compares the cost of using an IPI as the TLB shootdown mechanism versus the cost of sending a hardware broadcast to all processor cores of the processing system as the shootdown mechanism and selects the shootdown mechanism having the lower cost.
    Type: Grant
    Filed: June 12, 2018
    Date of Patent: February 4, 2020
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Arkaprava Basu, Joseph L. Greathouse
  • Publication number: 20200034405
    Abstract: A method includes storing a first portion of a sparse triangular matrix in a local memory and launching a kernel for executing a set of workgroups. The first portion includes a plurality of row blocks, and each workgroup in the set of workgroups is associated with one of the plurality of row blocks. The method also includes, for each workgroup in the set of workgroups, solving the row block. The row block is solved by, for each row segment of a first subset of row segments in the row block, calculating a partial sum for the row segment based on one or more matrix elements in the row segment, and writing the partial sum to a remote memory of a first remote processing unit prior to terminating the kernel.
    Type: Application
    Filed: July 24, 2018
    Publication date: January 30, 2020
    Inventors: Khaled Hamidouche, Michael W. LeBeane, Nicholas P. Malaya, Joseph L. Greathouse
  • Publication number: 20190377688
    Abstract: An operating system (OS) of a processing system having a plurality of processor cores determines a cost associated with different mechanisms for performing a translation lookaside buffer (TLB) shootdown in response to, for example, a virtual address being remapped to a new physical address, and selects a TLB shootdown mechanism to purge outdated or invalid address translations from the TLB based on the determined cost. In some embodiments, the OS selects an inter-processor interrupt (IPI) as the TLB shootdown mechanism if the cost associated with sending an IPI is less than a threshold cost. In some embodiments, the OS compares the cost of using an IPI as the TLB shootdown mechanism versus the cost of sending a hardware broadcast to all processor cores of the processing system as the shootdown mechanism and selects the shootdown mechanism having the lower cost.
    Type: Application
    Filed: June 12, 2018
    Publication date: December 12, 2019
    Inventors: Arkaprava BASU, Joseph L. GREATHOUSE
  • Publication number: 20190286209
    Abstract: A processor utilizes instruction based sampling to generate sampling data sampled on a per instruction basis during execution of an instruction. The sampling data indicates what processor hardware was used due to the execution of the instruction. Software receives the sampling data and generates an estimate of energy used by the instruction based on the sampling data. The sampling data may include microarchitectural events and the energy estimate utilizes a base energy amount corresponding to the instruction executed along with energy amounts corresponding to the microarchitectural events in the sampling data. The sampling data may include switching events associated with hardware blocks that switched due to execution of the instruction and the energy estimate for the instruction is based on the switching events and capacitance estimates associated with the hardware blocks.
    Type: Application
    Filed: March 16, 2018
    Publication date: September 19, 2019
    Inventors: Shijia Wei, Joseph L. Greathouse, John Kalamatianos
  • Patent number: 10067710
    Abstract: A processing apparatus is provided that includes a plurality of memory regions each corresponding to a memory address and configured to store data associated with the corresponding memory address. The processing apparatus also includes an accelerated processing device in communication with the memory regions and configured to determine a request to allocate an initial memory buffer comprising a number of contiguous memory regions, create a new memory buffer comprising one or more additional memory regions adjacent to the contiguous memory regions of the initial memory buffer, assign one or more values to the one or more additional memory regions and detect a change to the one or more values at the one or more additional memory regions.
    Type: Grant
    Filed: November 23, 2016
    Date of Patent: September 4, 2018
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Joseph L. Greathouse, Christopher D. Erb, Michael G. Collins
  • Patent number: 10019283
    Abstract: A processing device includes a first memory that includes a context buffer. The processing device also includes a processor core to execute threads based on context information stored in registers of the processor core and a memory controller to selectively move a subset of the context information between the context buffer and the registers based on one or more latencies of the threads.
    Type: Grant
    Filed: June 22, 2015
    Date of Patent: July 10, 2018
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Dmitri Yudanov, Sergey Blagodurov, Arkaprava Basu, Sooraj Puthoor, Joseph L. Greathouse
  • Patent number: 9990203
    Abstract: Methods, devices, and systems for capturing an accuracy of an instruction executing on a processor. An instruction may be executed on the processor, and the accuracy of the instruction may be captured using a hardware counter circuit. The accuracy of the instruction may be captured by analyzing bits of at least one value of the instruction to determine a minimum or maximum precision datatype for representing the field, and determining whether to adjust a value of the hardware counter circuit accordingly. The representation may be output to a debugger or logfile for use by a developer, or may be output to a runtime or virtual machine to automatically adjust instruction precision or gating of portions of the processor datapath.
    Type: Grant
    Filed: December 28, 2015
    Date of Patent: June 5, 2018
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Leonardo de Paula Rosa Piga, Abhinandan Majumdar, Indrani Paul, Wei Huang, Manish Arora, Joseph L. Greathouse
  • Publication number: 20180143781
    Abstract: A processing apparatus is provided that includes a plurality of memory regions each corresponding to a memory address and configured to store data associated with the corresponding memory address. The processing apparatus also includes an accelerated processing device in communication with the memory regions and configured to determine a request to allocate an initial memory buffer comprising a number of contiguous memory regions, create a new memory buffer comprising one or more additional memory regions adjacent to the contiguous memory regions of the initial memory buffer, assign one or more values to the one or more additional memory regions and detect a change to the one or more values at the one or more additional memory regions.
    Type: Application
    Filed: November 23, 2016
    Publication date: May 24, 2018
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Joseph L. Greathouse, Christopher D. Erb, Michael G. Collins
  • Publication number: 20180069767
    Abstract: Techniques described herein improve processor performance in situations where a large number of system service requests are being received from other devices. More specifically, upon detecting that certain operating conditions that indicate a processor slowdown are present, the processor performs one or more system service adjustment techniques. These techniques include throttling (reducing the rate of handling) of such requests, coalescing (grouping multiple requests into a single group) the requests, disabling microarchitctural structures (such as caches or branch prediction units) or updates to those structures, and prefetching data for or pre-performing these requests. Each of these adjustment techniques helps to reduce the number of and/or workload associated with servicing requests for system services.
    Type: Application
    Filed: September 6, 2016
    Publication date: March 8, 2018
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Arkaprava Basu, Joseph L. Greathouse, Guru Prasadh V. Venkataramani, Jan Vesely
  • Patent number: 9697176
    Abstract: A method of multiplication of a sparse matrix and a vector to obtain a new vector and a system for implementing the method are claimed. Embodiments of the method are intended to optimize the performance of sparse matrix-vector multiplication in highly parallel processors, such as GPUs. The sparse matrix is stored in compressed sparse row (CSR) format.
    Type: Grant
    Filed: November 14, 2014
    Date of Patent: July 4, 2017
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Mayank Daga, Joseph L. Greathouse
  • Publication number: 20170185409
    Abstract: Methods, devices, and systems for capturing an accuracy of an instruction executing on a processor. An instruction may be executed on the processor, and the accuracy of the instruction may be captured using a hardware counter circuit. The accuracy of the instruction may be captured by analyzing bits of at least one value of the instruction to determine a minimum or maximum precision datatype for representing the field, and determining whether to adjust a value of the hardware counter circuit accordingly. The representation may be output to a debugger or logfile for use by a developer, or may be output to a runtime or virtual machine to automatically adjust instruction precision or gating of portions of the processor datapath.
    Type: Application
    Filed: December 28, 2015
    Publication date: June 29, 2017
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Leonardo de Paula Rosa Piga, Abhinandan Majumdar, Indrani Paul, Wei Huang, Manish Arora, Joseph L. Greathouse
  • Publication number: 20160371082
    Abstract: A processing device includes a first memory that includes a context buffer. The processing device also includes a processor core to execute threads based on context information stored in registers of the processor core and a memory controller to selectively move a subset of the context information between the context buffer and the registers based on one or more latencies of the threads.
    Type: Application
    Filed: June 22, 2015
    Publication date: December 22, 2016
    Inventors: Dmitri Yudanov, Sergey Blagodurov, Arkaprava Basu, Sooraj Puthoor, Joseph L. Greathouse
  • Patent number: 9483379
    Abstract: A system and method for efficiently performing program instrumentation. A processor processes instructions stored in a memory. The processor allocates a memory region for the purpose of creating “random branches” in the computer code utilizing existing memory access instructions. When the processor processes a given instruction, the processor both accesses a first location in the memory region and may determine a condition is satisfied. In response, the processor generates an interrupt. The corresponding interrupt handler may transfer control flow from the computer program to instrumentation code. The condition may include a pointer storing an address pointing to locations within the memory region equals a given address after the point is updated. Alternatively, the condition may include an updated data value stored in a location pointed to by the given address equals a threshold value.
    Type: Grant
    Filed: October 15, 2013
    Date of Patent: November 1, 2016
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Joseph L. Greathouse, David S. Christie
  • Patent number: 9448909
    Abstract: A system and method for efficiently performing program instrumentation. A processor processes instructions stored in a memory. When the processor processes a given instruction of a given instruction type, the processor updates a corresponding performance counter. When the performance counter reaches a threshold, the processor generates an interrupt and compares a location of the given instruction with stored locations in a given list. If a match is not found, then the processor processes an instruction following the given instruction in the computer program without processing intermediate instrumentation code. If a match is found, then the processor processes instrumentation code. Regardless of whether or not the instrumentation code is processed, when control flow returns to the computer program, the corresponding performance counter is initialized with a random value.
    Type: Grant
    Filed: October 15, 2013
    Date of Patent: September 20, 2016
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Joseph L. Greathouse, David S. Christie
  • Publication number: 20160140084
    Abstract: A method of multiplication of a sparse matrix and a vector to obtain a new vector and a system for implementing the method are claimed. Embodiments of the method are intended to optimize the performance of sparse matrix-vector multiplication in highly parallel processors, such as GPUs. The sparse matrix is stored in compressed sparse row (CSR) format.
    Type: Application
    Filed: November 14, 2014
    Publication date: May 19, 2016
    Applicant: ADVANCED MICRO DEVICES, INC.
    Inventors: Mayank Daga, Joseph L. Greathouse