Patents by Inventor Brian Emberling

Brian Emberling has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 9015720
    Abstract: A system and method are described that optimize processor performance and minimize average thread latency by selectively loading a cache when a program state, the resources required for execution of a program, or the program itself changes. An embodiment of the invention supports a “cache priming program” that is selectively executed for the first thread/program/sub-routine of each process. Such a program is optimized for situations when instructions and other program data are not yet resident in the cache(s), and/or whenever the resources required for program execution or the program itself change. By pre-loading the cache with the resources required by the instructions for only a first thread, average thread latency is reduced because the resources are already present in the cache.
    Type: Grant
    Filed: January 6, 2009
    Date of Patent: April 21, 2015
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Andrew Brown, Brian Emberling
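As a rough illustration of the cache-priming idea in the abstract above, the Python sketch below runs a hypothetical prime() pass only for the first thread after the program (and therefore its required resources) changes, so that later threads of the same program find their resources already cached. The Cache, prime, and run_threads names and the dict-based "memory" are assumptions for illustration, not taken from the patent.

```python
class Cache:
    """Toy resource cache shared by all threads of a process (illustrative only)."""
    def __init__(self):
        self.lines = {}

    def load(self, key, fetch):
        # On a miss, fetch from slow "memory"; on a hit, reuse the cached line.
        if key not in self.lines:
            self.lines[key] = fetch(key)
        return self.lines[key]


def prime(cache, program):
    """Hypothetical cache-priming pass: pre-load what the program will need."""
    for resource in program["resources"]:
        cache.load(resource, fetch=lambda r: f"data:{r}")


def run_threads(programs, threads):
    cache = Cache()
    last_program = None
    for tid, prog_name in threads:
        program = programs[prog_name]
        # Run the priming pass only for the first thread after the program
        # (and hence its required resources) changes.
        if prog_name != last_program:
            prime(cache, program)
            last_program = prog_name
        warm = all(r in cache.lines for r in program["resources"])
        print(f"thread {tid} ({prog_name}): cache warm = {warm}")


programs = {"shaderA": {"resources": ["constants", "codeA"]},
            "shaderB": {"resources": ["textures", "codeB"]}}
run_threads(programs, [(0, "shaderA"), (1, "shaderA"), (2, "shaderB"), (3, "shaderB")])
```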
  • Patent number: 8832712
    Abstract: A method of processing threads is provided. The method includes receiving a first thread that accesses a memory resource in a current state, holding the first thread, and releasing the first thread responsive to receiving a final thread that accesses the memory resource in the current state.
    Type: Grant
    Filed: July 29, 2010
    Date of Patent: September 9, 2014
    Assignees: ATI Technologies ULC, Advanced Micro Devices, Inc.
    Inventors: Michael Houston, Stanislaw Skowronek, Elaine Poon, Brian Emberling
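The hold-and-release behavior described above resembles a barrier keyed on the resource's current state: threads that touch the resource are held until the final expected thread for that state has arrived, at which point all of them are released. The Python sketch below models that reading; StateBarrier, its expected count, and the use of a condition variable are assumptions, not the patented implementation.

```python
import threading


class StateBarrier:
    """Holds threads that access a resource in its current state until the
    final expected thread for that state arrives, then releases them all."""
    def __init__(self, expected):
        self.expected = expected
        self.arrived = 0
        self.cond = threading.Condition()

    def access(self, tid):
        with self.cond:
            self.arrived += 1
            if self.arrived >= self.expected:
                # The final thread releases everyone; the resource's state
                # is now free to change for the next batch of threads.
                self.cond.notify_all()
            else:
                # Earlier threads are held here until the final one arrives.
                self.cond.wait_for(lambda: self.arrived >= self.expected)
        print(f"thread {tid} released")


barrier = StateBarrier(expected=4)
workers = [threading.Thread(target=barrier.access, args=(i,)) for i in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```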
  • Publication number: 20120013627
    Abstract: Systems and methods to improve performance in a graphics processing unit are described herein. Embodiments achieve power saving in a graphics processing unit by dynamically activating/deactivating individual SIMDs in a shader complex that comprises multiple SIMD units. On-the-fly dynamic disabling and enabling of individual SIMDs provides flexibility in achieving a required performance and power level for a given processing application. Embodiments of the invention also achieve dynamic medium-grain clock gating of SIMDs in a shader complex. Embodiments reduce switching power by shutting down clock trees to unused logic via a clock-on-demand mechanism. In this way, embodiments enhance clock gating to save more switching power for the time when SIMDs are idle (or assigned no work). Embodiments can also save leakage power by power gating SIMDs when they are idle for an extended period of time.
    Type: Application
    Filed: July 12, 2011
    Publication date: January 19, 2012
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Tushar K. Shah, Michael J. Mantor, Brian Emberling
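The following toy model sketches the gating policy described above: SIMDs with no pending work have their clocks gated each cycle, and SIMDs idle beyond a threshold are power gated until work arrives again. The Simd class, the step() function, and the power_gate_after threshold are illustrative assumptions rather than the actual hardware mechanism.

```python
class Simd:
    def __init__(self, idx):
        self.idx = idx
        self.idle_cycles = 0
        self.clock_on = True
        self.powered = True


def step(simds, pending_work, power_gate_after=3):
    """One scheduling cycle: hand out work, then gate idle SIMDs."""
    for simd in simds:
        if pending_work > 0:
            if not simd.powered:
                simd.powered = True          # power back up on demand
            simd.clock_on = True             # clock runs while there is work
            simd.idle_cycles = 0
            pending_work -= 1                # SIMD picks up one wavefront
        else:
            simd.idle_cycles += 1
            simd.clock_on = False            # medium-grain clock gating
            if simd.idle_cycles >= power_gate_after:
                simd.powered = False         # power gating after extended idle


simds = [Simd(i) for i in range(4)]
for cycle, work in enumerate([4, 2, 1, 0, 0, 0]):
    step(simds, work)
    states = [f"{'P' if s.powered else '-'}{'C' if s.clock_on else '-'}" for s in simds]
    print(f"cycle {cycle}: {states}")
```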
  • Publication number: 20110173629
    Abstract: A method of processing threads is provided. The method includes receiving a first thread that accesses a memory resource in a current state, holding the first thread, and releasing the first thread responsive to receiving a final thread that accesses the memory resource in the current state.
    Type: Application
    Filed: July 29, 2010
    Publication date: July 14, 2011
    Inventors: Michael Houston, Stanislaw Skowronek, Elaine Poon, Brian Emberling
  • Publication number: 20110055511
    Abstract: A method of allocating a memory to a plurality of concurrent threads is presented. The method includes dynamically determining writer threads each having at least one pending write to the memory; and dynamically allocating respective contiguous blocks in the memory for each of the writer threads. Another method of allocating a memory to a plurality of concurrent threads includes launching the plurality of threads as a plurality of wavefronts, dynamically determining a group of wavefronts each having at least one thread requiring a write to the memory, and dynamically allocating respective contiguous blocks in the memory for each wavefront from the group of wavefronts.
    Type: Application
    Filed: September 3, 2009
    Publication date: March 3, 2011
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Michael Mantor, John McCardle, Marcos Zini, Brian Emberling
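One common way to realize "dynamically allocating respective contiguous blocks" is a running offset that behaves like an atomic fetch-and-add, with only the threads (or wavefronts) that actually have pending writes reserving space. The sketch below shows that reading in Python; allocate_blocks and its arguments are hypothetical names, not the patent's terminology.

```python
def allocate_blocks(threads, memory_size):
    """Give each writer thread a contiguous [start, end) range in memory."""
    offset = 0
    allocation = {}
    for tid, pending_writes in threads:
        if pending_writes == 0:
            continue                         # threads with no writes get nothing
        if offset + pending_writes > memory_size:
            raise MemoryError("out of buffer space")
        allocation[tid] = (offset, offset + pending_writes)
        offset += pending_writes             # behaves like an atomic fetch-add
    return allocation


threads = [(0, 3), (1, 0), (2, 5), (3, 2)]   # (thread id, pending writes)
print(allocate_blocks(threads, memory_size=16))
# -> {0: (0, 3), 2: (3, 8), 3: (8, 10)}
```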
  • Publication number: 20090300621
    Abstract: A graphics processing unit is disclosed, the graphics processing unit having a processor having one or more SIMD processing units, and a local data share corresponding to one of the one or more SIMD processing units, the local data share comprising one or more low latency accessible memory regions for each group of threads assigned to one or more execution wavefronts, and a global data share comprising one or more low latency memory regions for each group of threads.
    Type: Application
    Filed: June 1, 2009
    Publication date: December 3, 2009
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Michael J. Mantor, Brian Emberling
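To make the local/global data share split concrete, the sketch below models one local data share (LDS) per SIMD, carved into per-thread-group regions, plus a single global data share (GDS) visible to all SIMDs. The DataShare class, the region bookkeeping, and the sizes are assumptions chosen for illustration only.

```python
class DataShare:
    """Low-latency scratch memory carved into per-thread-group regions."""
    def __init__(self, size):
        self.mem = [0] * size
        self.regions = {}                    # group id -> (base, length)
        self.next_free = 0

    def reserve(self, group, length):
        if self.next_free + length > len(self.mem):
            raise MemoryError("data share full")
        self.regions[group] = (self.next_free, length)
        self.next_free += length

    def write(self, group, offset, value):
        base, length = self.regions[group]
        assert offset < length
        self.mem[base + offset] = value

    def read(self, group, offset):
        base, _ = self.regions[group]
        return self.mem[base + offset]


lds_per_simd = [DataShare(64) for _ in range(2)]   # one local data share per SIMD
gds = DataShare(32)                                # one global data share for all SIMDs

lds_per_simd[0].reserve(group=0, length=16)        # region for thread group 0 on SIMD 0
lds_per_simd[0].write(group=0, offset=3, value=42)
gds.reserve(group=0, length=8)
gds.write(group=0, offset=0, value=7)
print(lds_per_simd[0].read(group=0, offset=3), gds.read(group=0, offset=0))
```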
  • Publication number: 20090276777
    Abstract: A system and method are described that optimize processor performance and minimize average thread latency by selectively loading a cache when a program state, the resources required for execution of a program, or the program itself changes. An embodiment of the invention supports a “cache priming program” that is selectively executed for the first thread/program/sub-routine of each process. Such a program is optimized for situations when instructions and other program data are not yet resident in the cache(s), and/or whenever the resources required for program execution or the program itself change. By pre-loading the cache with the resources required by the instructions for only a first thread, average thread latency is reduced because the resources are already present in the cache.
    Type: Application
    Filed: January 6, 2009
    Publication date: November 5, 2009
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Andrew Brown, Brian Emberling
  • Publication number: 20090037918
    Abstract: Execution of the first thread of a new program is prioritized ahead of older threads of a previously running program. The new program is invoked during the execution of a thread of the previous program. The first thread of the new program is prioritized ahead of the remaining threads of the previous program. In an embodiment of the invention, additional threads of the new program are also prioritized ahead of the older threads. A thread's context may include a table of constant values that can be referenced by each program and are shared by multiple threads. Changing the values in a constant table for a new thread is time-intensive. To avoid changes to the constant table (and thereby save time), a higher priority status is conferred on the first thread that follows a change to the constant table.
    Type: Application
    Filed: July 31, 2007
    Publication date: February 5, 2009
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Andrew Brown, Brian Emberling
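A minimal way to picture the prioritization above is a priority queue in which the first thread submitted after a constant-table change is boosted ahead of older, normal-priority threads. In the Python sketch below, the HIGH/NORMAL levels, the submit() function, and the tie-breaking counter are illustrative assumptions, not the patent's scheduler.

```python
import heapq
import itertools

HIGH, NORMAL = 0, 1          # lower value is scheduled first
counter = itertools.count()  # tie-breaker that preserves submission order
queue = []
last_constants = None


def submit(thread_name, constants):
    """Queue a thread; the first thread after a constant-table change is boosted."""
    global last_constants
    priority = HIGH if constants != last_constants else NORMAL
    last_constants = constants
    heapq.heappush(queue, (priority, next(counter), thread_name))


submit("progA.t0", constants="A")
submit("progA.t1", constants="A")
submit("progB.t0", constants="B")   # new program: runs ahead of progA.t1
while queue:
    _, _, name = heapq.heappop(queue)
    print("run", name)
```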
  • Publication number: 20050052449
    Abstract: A non-blocking cache for texture mapping is implemented by separating Cache Tags from Cache Data. Multiple requests for data may be processed in parallel without strict ordering or synchronization. Separating Cache Tags and Cache Data results in a texture memory cache design that preempts the stalling which would otherwise occur on cache misses. Multiple Cache Tags with corresponding system memory controllers and Data Cache units allow for simultaneous processing of multiple requests without strict ordering. In preferred embodiments, the texture memory cache may also be configured to predict cache misses and merge with burst reads from memory, and may equally be configured to minimize the memory read requests necessary during multitexturing, thus maximizing bandwidth.
    Type: Application
    Filed: March 31, 2003
    Publication date: March 10, 2005
    Inventor: Brian Emberling
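The sketch below models the tag/data split at a very high level: tag lookups never stall the requester, misses are simply recorded as outstanding requests, and the data store is filled when the memory "controller" responds. NonBlockingTextureCache and its methods are hypothetical names; real hardware would track multiple controllers, replacement, and ordering exceptions that this toy omits.

```python
from collections import deque


class NonBlockingTextureCache:
    """Tags and data kept separate so a miss never stalls later requests."""
    def __init__(self):
        self.tags = {}            # texel address -> slot in the data store
        self.data = {}            # slot -> texel value
        self.pending = deque()    # outstanding misses sent to memory
        self.next_slot = 0

    def request(self, addr):
        """Return the texel on a hit; queue the miss and return None otherwise."""
        if addr in self.tags:
            return self.data[self.tags[addr]]
        self.pending.append(addr)            # remember the miss, do not stall
        return None

    def memory_response(self, addr, value):
        """Fill the data store when the memory controller returns a burst read."""
        if addr in self.pending:
            self.pending.remove(addr)
        slot = self.next_slot
        self.next_slot += 1
        self.tags[addr] = slot
        self.data[slot] = value


cache = NonBlockingTextureCache()
print(cache.request(0x10))                   # miss: queued, caller keeps issuing requests
cache.memory_response(0x10, "texel 0x10")    # data arrives from memory later
print(cache.request(0x10))                   # hit, served from the data store
```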
  • Patent number: 6246422
    Abstract: A method for storing mip map series in a multi-bank texture memory is disclosed. Each mip map has a different size and represents a different resolution version of a texture map image that is to be mapped onto a three dimensional object comprising one or more polygons. To prevent page faults when accessing corresponding texels in consecutive mip maps, each mip map is divided in two halves. The halves are stored in different banks of the multi-bank texture memory. The banks used are alternated so that corresponding texels in consecutive mip maps are stored in different memory banks. Mip maps may be categorized as large or small, with all small mip maps after the first being stored in their entirety in one memory bank. Small mip maps are those that are equal to or smaller than the page size of the multi-bank texture memory. A computer system, graphics subsystem, and software program capable of efficiently storing mip map series in a multi-bank texture memory are also disclosed.
    Type: Grant
    Filed: September 1, 1998
    Date of Patent: June 12, 2001
    Assignee: Sun Microsystems, Inc.
    Inventors: Brian Emberling, Michael G. Lavelle
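The bank-assignment rule described in this abstract can be sketched as follows: each large mip level is split into two halves placed in different banks, with the assignment alternating between consecutive levels, while the first small mip continues the alternation and the remaining small mips are stored whole in the other bank. The page size, the square mip chain, and the exact alternation pattern below are assumptions for illustration, not the claimed layout.

```python
PAGE_TEXELS = 64 * 64        # assumed page size of the multi-bank texture memory


def assign_banks(base_size):
    """Yield (level, size, bank assignment) for a square mip chain."""
    level, size = 0, base_size
    first_small_bank = None
    while size >= 1:
        if size * size > PAGE_TEXELS:
            # Large mip: split into halves stored in different banks, and
            # alternate the assignment between consecutive levels.
            first, second = (0, 1) if level % 2 == 0 else (1, 0)
            yield level, size, {"top half": first, "bottom half": second}
        elif first_small_bank is None:
            # First small mip keeps alternating with the previous level.
            first_small_bank = level % 2
            yield level, size, {"whole": first_small_bank}
        else:
            # Remaining small mips are stored entirely in the other bank.
            yield level, size, {"whole": 1 - first_small_bank}
        level, size = level + 1, size // 2


for level, size, banks in assign_banks(512):
    print(f"mip {level}: {size}x{size} -> {banks}")
```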