Patents by Inventor Brian Emberling

Brian Emberling has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 9015720
    Abstract: A system and method are described that optimize processor performance and minimize average thread latency by selectively loading a cache when a program state, the resources required for execution of a program, or the program itself changes. An embodiment of the invention supports a “cache priming program” that is selectively executed for the first thread/program/sub-routine of each process. Such a program is optimized for situations when instructions and other program data are not yet resident in the cache(s), and/or whenever the resources required for program execution or the program itself change. By pre-loading the cache with the resources required by the instructions for only a first thread, average thread latency is reduced because the resources are already present in the cache.
    Type: Grant
    Filed: January 6, 2009
    Date of Patent: April 21, 2015
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Andrew Brown, Brian Emberling
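As a rough illustration of the cache-priming idea in the abstract above, the Python sketch below runs a hypothetical prime() pass only for the first thread after the program (and therefore its required resources) changes, so that later threads of the same program find their resources already cached. The Cache, prime, and run_threads names and the dict-based "memory" are assumptions for illustration, not taken from the patent.

```python
class Cache:
    """Toy resource cache shared by all threads of a process (illustrative only)."""
    def __init__(self):
        self.lines = {}

    def load(self, key, fetch):
        # On a miss, fetch from slow "memory"; on a hit, reuse the cached line.
        if key not in self.lines:
            self.lines[key] = fetch(key)
        return self.lines[key]


def prime(cache, program):
    """Hypothetical cache-priming pass: pre-load what the program will need."""
    for resource in program["resources"]:
        cache.load(resource, fetch=lambda r: f"data:{r}")


def run_threads(programs, threads):
    cache = Cache()
    last_program = None
    for tid, prog_name in threads:
        program = programs[prog_name]
        # Run the priming pass only for the first thread after the program
        # (and hence its required resources) changes.
        if prog_name != last_program:
            prime(cache, program)
            last_program = prog_name
        warm = all(r in cache.lines for r in program["resources"])
        print(f"thread {tid} ({prog_name}): cache warm = {warm}")


programs = {"shaderA": {"resources": ["constants", "codeA"]},
            "shaderB": {"resources": ["textures", "codeB"]}}
run_threads(programs, [(0, "shaderA"), (1, "shaderA"), (2, "shaderB"), (3, "shaderB")])
```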
  • Patent number: 8832712
    Abstract: A method of processing threads is provided. The method includes receiving a first thread that accesses a memory resource in a current state, holding the first thread, and releasing the first thread responsive to receiving a final thread that accesses the memory resource in the current state.
    Type: Grant
    Filed: July 29, 2010
    Date of Patent: September 9, 2014
    Assignees: ATI Technologies ULC, Advanced Micro Devices, Inc.
    Inventors: Michael Houston, Stanislaw Skowronek, Elaine Poon, Brian Emberling
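The hold-and-release behavior described above resembles a barrier keyed on the resource's current state: threads that touch the resource are held until the final expected thread for that state has arrived, at which point all of them are released. The Python sketch below models that reading; StateBarrier, its expected count, and the use of a condition variable are assumptions, not the patented implementation.

```python
import threading


class StateBarrier:
    """Holds threads that access a resource in its current state until the
    final expected thread for that state arrives, then releases them all."""
    def __init__(self, expected):
        self.expected = expected
        self.arrived = 0
        self.cond = threading.Condition()

    def access(self, tid):
        with self.cond:
            self.arrived += 1
            if self.arrived >= self.expected:
                # The final thread releases everyone; the resource's state
                # is now free to change for the next batch of threads.
                self.cond.notify_all()
            else:
                # Earlier threads are held here until the final one arrives.
                self.cond.wait_for(lambda: self.arrived >= self.expected)
        print(f"thread {tid} released")


barrier = StateBarrier(expected=4)
workers = [threading.Thread(target=barrier.access, args=(i,)) for i in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```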
  • Publication number: 20120013627
    Abstract: Systems and methods to improve performance in a graphics processing unit are described herein. Embodiments achieve power saving in a graphics processing unit by dynamically activating/deactivating individual SIMDs in a shader complex that comprises multiple SIMD units. On-the-fly dynamic disabling and enabling of individual SIMDs provides flexibility in achieving a required performance and power level for a given processing application. Embodiments of the invention also achieve dynamic medium-grain clock gating of SIMDs in a shader complex. Embodiments reduce switching power by shutting down clock trees to unused logic via a clock-on-demand mechanism. In this way, embodiments enhance clock gating to save more switching power for the time when SIMDs are idle (or assigned no work). Embodiments can also save leakage power by power gating SIMDs when they are idle for an extended period of time.
    Type: Application
    Filed: July 12, 2011
    Publication date: January 19, 2012
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Tushar K. Shah, Michael J. Mantor, Brian Emberling
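The following toy model sketches the gating policy described above: SIMDs with no pending work have their clocks gated each cycle, and SIMDs idle beyond a threshold are power gated until work arrives again. The Simd class, the step() function, and the power_gate_after threshold are illustrative assumptions rather than the actual hardware mechanism.

```python
class Simd:
    def __init__(self, idx):
        self.idx = idx
        self.idle_cycles = 0
        self.clock_on = True
        self.powered = True


def step(simds, pending_work, power_gate_after=3):
    """One scheduling cycle: hand out work, then gate idle SIMDs."""
    for simd in simds:
        if pending_work > 0:
            if not simd.powered:
                simd.powered = True          # power back up on demand
            simd.clock_on = True             # clock runs while there is work
            simd.idle_cycles = 0
            pending_work -= 1                # SIMD picks up one wavefront
        else:
            simd.idle_cycles += 1
            simd.clock_on = False            # medium-grain clock gating
            if simd.idle_cycles >= power_gate_after:
                simd.powered = False         # power gating after extended idle


simds = [Simd(i) for i in range(4)]
for cycle, work in enumerate([4, 2, 1, 0, 0, 0]):
    step(simds, work)
    states = [f"{'P' if s.powered else '-'}{'C' if s.clock_on else '-'}" for s in simds]
    print(f"cycle {cycle}: {states}")
```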
  • Publication number: 20110173629
    Abstract: A method of processing threads is provided. The method includes receiving a first thread that accesses a memory resource in a current state, holding the first thread, and releasing the first thread responsive to receiving a final thread that accesses the memory resource in the current state.
    Type: Application
    Filed: July 29, 2010
    Publication date: July 14, 2011
    Inventors: Michael Houston, Stanislaw Skowronek, Elaine Poon, Brian Emberling
  • Publication number: 20110055511
    Abstract: A method of allocating a memory to a plurality of concurrent threads is presented. The method includes dynamically determining writer threads each having at least one pending write to the memory; and dynamically allocating respective contiguous blocks in the memory for each of the writer threads. Another method of allocating a memory to a plurality of concurrent threads includes launching the plurality of threads as a plurality of wavefronts, dynamically determining a group of wavefronts each having at least one thread requiring a write to the memory, and dynamically allocating respective contiguous blocks in the memory for each wavefront from the group of wavefronts.
    Type: Application
    Filed: September 3, 2009
    Publication date: March 3, 2011
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Michael Mantor, John McCardle, Marcos Zini, Brian Emberling
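One common way to realize "dynamically allocating respective contiguous blocks" is a running offset that behaves like an atomic fetch-and-add, with only the threads (or wavefronts) that actually have pending writes reserving space. The sketch below shows that reading in Python; allocate_blocks and its arguments are hypothetical names, not the patent's terminology.

```python
def allocate_blocks(threads, memory_size):
    """Give each writer thread a contiguous [start, end) range in memory."""
    offset = 0
    allocation = {}
    for tid, pending_writes in threads:
        if pending_writes == 0:
            continue                         # threads with no writes get nothing
        if offset + pending_writes > memory_size:
            raise MemoryError("out of buffer space")
        allocation[tid] = (offset, offset + pending_writes)
        offset += pending_writes             # behaves like an atomic fetch-add
    return allocation


threads = [(0, 3), (1, 0), (2, 5), (3, 2)]   # (thread id, pending writes)
print(allocate_blocks(threads, memory_size=16))
# -> {0: (0, 3), 2: (3, 8), 3: (8, 10)}
```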
  • Publication number: 20090300621
    Abstract: A graphics processing unit is disclosed, the graphics processing unit having a processor having one or more SIMD processing units, and a local data share corresponding to one of the one or more SIMD processing units, the local data share comprising one or more low latency accessible memory regions for each group of threads assigned to one or more execution wavefronts, and a global data share comprising one or more low latency memory regions for each group of threads.
    Type: Application
    Filed: June 1, 2009
    Publication date: December 3, 2009
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Michael J. Mantor, Brian Emberling
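To make the local/global data share split concrete, the sketch below models one local data share (LDS) per SIMD, carved into per-thread-group regions, plus a single global data share (GDS) visible to all SIMDs. The DataShare class, the region bookkeeping, and the sizes are assumptions chosen for illustration only.

```python
class DataShare:
    """Low-latency scratch memory carved into per-thread-group regions."""
    def __init__(self, size):
        self.mem = [0] * size
        self.regions = {}                    # group id -> (base, length)
        self.next_free = 0

    def reserve(self, group, length):
        if self.next_free + length > len(self.mem):
            raise MemoryError("data share full")
        self.regions[group] = (self.next_free, length)
        self.next_free += length

    def write(self, group, offset, value):
        base, length = self.regions[group]
        assert offset < length
        self.mem[base + offset] = value

    def read(self, group, offset):
        base, _ = self.regions[group]
        return self.mem[base + offset]


lds_per_simd = [DataShare(64) for _ in range(2)]   # one local data share per SIMD
gds = DataShare(32)                                # one global data share for all SIMDs

lds_per_simd[0].reserve(group=0, length=16)        # region for thread group 0 on SIMD 0
lds_per_simd[0].write(group=0, offset=3, value=42)
gds.reserve(group=0, length=8)
gds.write(group=0, offset=0, value=7)
print(lds_per_simd[0].read(group=0, offset=3), gds.read(group=0, offset=0))
```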
  • Publication number: 20090276777
    Abstract: A system and method are described that optimize processor performance and minimize average thread latency by selectively loading a cache when a program state, the resources required for execution of a program, or the program itself changes. An embodiment of the invention supports a “cache priming program” that is selectively executed for the first thread/program/sub-routine of each process. Such a program is optimized for situations when instructions and other program data are not yet resident in the cache(s), and/or whenever the resources required for program execution or the program itself change. By pre-loading the cache with the resources required by the instructions for only a first thread, average thread latency is reduced because the resources are already present in the cache.
    Type: Application
    Filed: January 6, 2009
    Publication date: November 5, 2009
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Andrew Brown, Brian Emberling
  • Publication number: 20090037918
    Abstract: Execution of the first thread of a new program is prioritized ahead of older threads of a previously running program. The new program is invoked during the execution of a thread of the previous program. The first thread of the new program is prioritized ahead of the remaining threads of the previous program. In an embodiment of the invention, additional threads of the new program are also prioritized ahead of the older threads. A thread's context may include a table of constant values that can be referenced by each program and are shared by multiple threads. Changing the values in a constant table for a new thread is time-intensive. To avoid changes to the constant table (and thereby save time), a higher priority status is conferred on the first thread that follows a change to the constant table.
    Type: Application
    Filed: July 31, 2007
    Publication date: February 5, 2009
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Andrew Brown, Brian Emberling
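A minimal way to picture the prioritization above is a priority queue in which the first thread submitted after a constant-table change is boosted ahead of older, normal-priority threads. In the Python sketch below, the HIGH/NORMAL levels, the submit() function, and the tie-breaking counter are illustrative assumptions, not the patent's scheduler.

```python
import heapq
import itertools

HIGH, NORMAL = 0, 1          # lower value is scheduled first
counter = itertools.count()  # tie-breaker that preserves submission order
queue = []
last_constants = None


def submit(thread_name, constants):
    """Queue a thread; the first thread after a constant-table change is boosted."""
    global last_constants
    priority = HIGH if constants != last_constants else NORMAL
    last_constants = constants
    heapq.heappush(queue, (priority, next(counter), thread_name))


submit("progA.t0", constants="A")
submit("progA.t1", constants="A")
submit("progB.t0", constants="B")   # new program: runs ahead of progA.t1
while queue:
    _, _, name = heapq.heappop(queue)
    print("run", name)
```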
  • Publication number: 20050052449
    Abstract: A non-blocking cache for texture mapping is implemented by separating Cache Tags from Cache Data. Multiple requests for data may be processed in parallel without strict ordering or synchronization. Separating Cache Tags and Cache Data results in a texture memory cache design that preempts the stalling which would otherwise occur on cache misses. Multiple Cache Tags with corresponding system memory controllers and Data Cache units allow for simultaneous processing of multiple requests without strict ordering. In preferred embodiments, the texture memory cache may also be configured to predict cache misses and merge with burst reads from memory, and may equally be configured to minimize the memory read requests necessary during multitexturing, thus maximizing bandwidth.
    Type: Application
    Filed: March 31, 2003
    Publication date: March 10, 2005
    Inventor: Brian Emberling
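The sketch below models the tag/data split at a very high level: tag lookups never stall the requester, misses are simply recorded as outstanding requests, and the data store is filled when the memory "controller" responds. NonBlockingTextureCache and its methods are hypothetical names; real hardware would track multiple controllers, replacement, and ordering exceptions that this toy omits.

```python
from collections import deque


class NonBlockingTextureCache:
    """Tags and data kept separate so a miss never stalls later requests."""
    def __init__(self):
        self.tags = {}            # texel address -> slot in the data store
        self.data = {}            # slot -> texel value
        self.pending = deque()    # outstanding misses sent to memory
        self.next_slot = 0

    def request(self, addr):
        """Return the texel on a hit; queue the miss and return None otherwise."""
        if addr in self.tags:
            return self.data[self.tags[addr]]
        self.pending.append(addr)            # remember the miss, do not stall
        return None

    def memory_response(self, addr, value):
        """Fill the data store when the memory controller returns a burst read."""
        if addr in self.pending:
            self.pending.remove(addr)
        slot = self.next_slot
        self.next_slot += 1
        self.tags[addr] = slot
        self.data[slot] = value


cache = NonBlockingTextureCache()
print(cache.request(0x10))                   # miss: queued, caller keeps issuing requests
cache.memory_response(0x10, "texel 0x10")    # data arrives from memory later
print(cache.request(0x10))                   # hit, served from the data store
```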
  • Patent number: 6246422
    Abstract: A method for storing mip map series in a multi-bank texture memory is disclosed. Each mip map has a different size and represents a different resolution version of a texture map image that is to be mapped onto a three dimensional object comprising one or more polygons. To prevent page faults when accessing corresponding texels in consecutive mip maps, each mip map is divided in two halves. The halves are stored in different banks of the multi-bank texture memory. The banks used are alternated so that corresponding texels in consecutive mip maps are stored in different memory banks. Mip maps may be categorized as large or small, with all small mip maps after the first being stored in their entirety in one memory bank. Small mip maps are those that are equal to or smaller than the page size of the multi-bank texture memory. A computer system, graphics subsystem, and software program capable of efficiently storing mip map series in a multi-bank texture memory are also disclosed.
    Type: Grant
    Filed: September 1, 1998
    Date of Patent: June 12, 2001
    Assignee: Sun Microsystems, Inc.
    Inventors: Brian Emberling, Michael G. Lavelle
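The bank-assignment rule described in this abstract can be sketched as follows: each large mip level is split into two halves placed in different banks, with the assignment alternating between consecutive levels, while the first small mip continues the alternation and the remaining small mips are stored whole in the other bank. The page size, the square mip chain, and the exact alternation pattern below are assumptions for illustration, not the claimed layout.

```python
PAGE_TEXELS = 64 * 64        # assumed page size of the multi-bank texture memory


def assign_banks(base_size):
    """Yield (level, size, bank assignment) for a square mip chain."""
    level, size = 0, base_size
    first_small_bank = None
    while size >= 1:
        if size * size > PAGE_TEXELS:
            # Large mip: split into halves stored in different banks, and
            # alternate the assignment between consecutive levels.
            first, second = (0, 1) if level % 2 == 0 else (1, 0)
            yield level, size, {"top half": first, "bottom half": second}
        elif first_small_bank is None:
            # First small mip keeps alternating with the previous level.
            first_small_bank = level % 2
            yield level, size, {"whole": first_small_bank}
        else:
            # Remaining small mips are stored entirely in the other bank.
            yield level, size, {"whole": 1 - first_small_bank}
        level, size = level + 1, size // 2


for level, size, banks in assign_banks(512):
    print(f"mip {level}: {size}x{size} -> {banks}")
```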