Patents by Inventor Aaftab A. Munshi
Aaftab A. Munshi has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 8723877
Abstract: A method and an apparatus for a parallel computing program using subbuffers to perform a data processing task in parallel among heterogeneous compute units are described. The compute units can include a heterogeneous mix of central processing units (CPUs) and graphic processing units (GPUs). A system creates a subbuffer from a parent buffer for each of a plurality of heterogeneous compute units. If a subbuffer is not associated with the same compute unit as the parent buffer, the system copies data from the subbuffer to memory of that compute unit. The system further tracks updates to the data and transfers those updates back to the subbuffer.
Type: Grant
Filed: September 28, 2010
Date of Patent: May 13, 2014
Assignee: Apple Inc.
Inventors: Aaftab A. Munshi, Ian R. Ollmann
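The sub-buffer scheme this abstract describes can be sketched as a small model: a parent buffer is split into per-device sub-buffers, data for a sub-buffer bound to a different compute unit is copied out, and updates are transferred back. All class and method names here are hypothetical, for illustration only; this is not the patented implementation.

```python
class ParentBuffer:
    def __init__(self, data, device):
        self.data = list(data)   # backing storage
        self.device = device     # compute unit that owns the parent buffer

    def make_subbuffer(self, origin, size, device):
        return SubBuffer(self, origin, size, device)

class SubBuffer:
    def __init__(self, parent, origin, size, device):
        self.parent, self.origin, self.size = parent, origin, size
        # Copy data out only when the sub-buffer is bound to a different
        # compute unit than the parent buffer.
        self.local = (list(parent.data[origin:origin + size])
                      if device != parent.device else None)

    def write(self, i, value):
        if self.local is None:            # same compute unit: update in place
            self.parent.data[self.origin + i] = value
        else:                             # different unit: update the copy
            self.local[i] = value

    def flush(self):
        # Transfer tracked updates back to the parent buffer.
        if self.local is not None:
            self.parent.data[self.origin:self.origin + self.size] = self.local

buf = ParentBuffer(range(8), device="gpu0")
sub = buf.make_subbuffer(4, 4, device="cpu0")  # second half goes to the CPU
sub.write(0, 99)
sub.flush()
```

After `flush()`, the parent buffer on `gpu0` sees the CPU's update to element 4.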
-
Publication number: 20130081066
Abstract: A method and an apparatus for a parallel computing program calling APIs (application programming interfaces) in a host processor to perform a data processing task in parallel among compute units are described. The compute units are coupled to the host processor including central processing units (CPUs) and graphic processing units (GPUs). A program object corresponding to a source code for the data processing task is generated in a memory coupled to the host processor according to the API calls. Executable codes for the compute units are generated from the program object according to the API calls to be loaded for concurrent execution among the compute units to perform the data processing task.
Type: Application
Filed: October 5, 2012
Publication date: March 28, 2013
Inventors: Aaftab A. Munshi, Nathaniel Begeman
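The flow in this abstract — API calls create a program object from source, then generate one executable per compute unit — can be modeled in a few lines. The names below are illustrative stand-ins, not the actual API described by the application.

```python
class ProgramObject:
    """Holds source code and per-device executables built from it."""

    def __init__(self, source):
        self.source = source
        self.executables = {}

    def build(self, devices):
        # Generate an executable for each target compute unit from the one
        # program object, ready to be loaded for concurrent execution.
        for dev in devices:
            self.executables[dev] = f"binary(target={dev})"
        return self.executables

prog = ProgramObject("kernel add(a, b) { ... }")
bins = prog.build(["cpu", "gpu"])
```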
-
Publication number: 20130063451
Abstract: A method and an apparatus that schedule a plurality of executables in a schedule queue for execution in one or more physical compute devices such as CPUs or GPUs concurrently are described. One or more executables are compiled online from a source having an existing executable for a type of physical compute device different from the one or more physical compute devices. Dependency relations among elements corresponding to scheduled executables are determined to select an executable to be executed by a plurality of threads concurrently in more than one of the physical compute devices. A thread initialized for executing an executable in a GPU of the physical compute devices is initialized for execution in another CPU of the physical compute devices if the GPU is busy with graphics processing threads.
Type: Application
Filed: September 13, 2012
Publication date: March 14, 2013
Inventors: Aaftab Munshi, Jeremy Sandmel
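The scheduling idea above can be sketched as a tiny queue model: executables carry dependencies, one whose dependencies are satisfied is selected, and work aimed at a busy GPU falls back to a CPU. This is an illustrative sketch, not the patented scheduler; the task format is an assumption.

```python
def pick_next(queue, done, gpu_busy):
    """Select the next runnable executable from the schedule queue.

    queue: list of (name, dependency_set, target_device) tuples.
    done: set of names already executed.
    gpu_busy: whether the GPU is occupied with graphics processing threads.
    """
    for name, deps, device in queue:
        if all(d in done for d in deps):
            # Re-target GPU work to a CPU when the GPU is busy with graphics.
            if device == "gpu" and gpu_busy:
                device = "cpu"
            return name, device
    return None  # nothing runnable yet

queue = [("blur", {"load"}, "gpu"), ("load", set(), "cpu")]
first = pick_next(queue, done=set(), gpu_busy=True)
```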
-
Publication number: 20130055272
Abstract: A method and an apparatus that schedule a plurality of executables in a schedule queue for execution in one or more physical compute devices such as CPUs or GPUs concurrently are described. One or more executables are compiled online from a source having an existing executable for a type of physical compute device different from the one or more physical compute devices. Dependency relations among elements corresponding to scheduled executables are determined to select an executable to be executed by a plurality of threads concurrently in more than one of the physical compute devices. A thread initialized for executing an executable in a GPU of the physical compute devices is initialized for execution in another CPU of the physical compute devices if the GPU is busy with graphics processing threads.
Type: Application
Filed: August 28, 2012
Publication date: February 28, 2013
Inventors: Aaftab Munshi, Jeremy Sandmel
-
Publication number: 20130007774
Abstract: A method and an apparatus that allocate one or more physical compute devices such as CPUs or GPUs attached to a host processing unit running an application for executing one or more threads of the application are described. The allocation may be based on data representing a processing capability requirement from the application for executing an executable in the one or more threads. A compute device identifier may be associated with the allocated physical compute devices to schedule and execute the executable in the one or more threads concurrently in one or more of the allocated physical compute devices.
Type: Application
Filed: September 13, 2012
Publication date: January 3, 2013
Inventors: Aaftab Munshi, Jeremy Sandmel
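The capability-based allocation described here reduces to a simple filter: pick the devices whose capabilities satisfy the application's requirement and hand back their identifiers. A minimal sketch, with hypothetical device names and capability labels:

```python
def allocate(devices, requirement):
    """Return identifiers of compute devices meeting the requirement.

    devices: mapping of device identifier -> set of capability labels.
    requirement: set of capabilities the application's executable needs.
    """
    # A device qualifies when the requirement is a subset of its capabilities;
    # the returned identifiers are used to schedule and execute the threads.
    return [dev_id for dev_id, caps in devices.items()
            if requirement <= caps]

devices = {"cpu0": {"fp64", "images"}, "gpu0": {"images"}}
ids = allocate(devices, {"images"})      # both devices qualify
fp64_ids = allocate(devices, {"fp64"})   # only the CPU qualifies
```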
-
Patent number: 8341611
Abstract: A method and an apparatus that execute a parallel computing program in a programming language for a parallel computing architecture are described. The parallel computing program is stored in memory in a system with parallel processors. The system includes a host processor, a graphics processing unit (GPU) coupled to the host processor and a memory coupled to at least one of the host processor and the GPU. The parallel computing program is stored in the memory to allocate threads between the host processor and the GPU. The programming language includes an API to allow an application to make calls using the API to allocate execution of the threads between the host processor and the GPU. The programming language includes host function data tokens for host functions performed in the host processor and kernel function data tokens for compute kernel functions performed in one or more compute processors, e.g. GPUs or CPUs, separate from the host processor.
Type: Grant
Filed: May 3, 2007
Date of Patent: December 25, 2012
Assignee: Apple Inc.
Inventors: Aaftab Munshi, Jeremy Sandmel
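The host/kernel split the abstract describes — language-level tokens that mark whether a function runs on the host processor or on a compute processor — can be imitated with decorators standing in for the data tokens. Everything here (the `host`/`kernel` names, the registry) is hypothetical, purely to illustrate the partitioning idea.

```python
registry = {}  # function name -> processor class it targets

def host(fn):
    # Stand-in for a "host function data token": runs on the host processor.
    registry[fn.__name__] = "host"
    return fn

def kernel(fn):
    # Stand-in for a "kernel function data token": runs on a compute
    # processor (e.g. a GPU or CPU) separate from the host.
    registry[fn.__name__] = "compute"
    return fn

@host
def setup():
    return "buffers ready"

@kernel
def saxpy():
    return "y = a*x + y"
```

An implementation would use `registry` to allocate threads: host-tagged functions stay on the host processor, kernel-tagged functions are dispatched to compute processors.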
-
Publication number: 20120320071
Abstract: A method and an apparatus that determine a total number of threads to concurrently execute executable codes compiled from a single source for target processing units in response to an API (Application Programming Interface) request from an application running in a host processing unit are described. The target processing units include GPUs (Graphics Processing Unit) and CPUs (Central Processing Unit). Thread group sizes for the target processing units are determined to partition the total number of threads according to a multi-dimensional global thread number included in the API request. The executable codes are loaded to be executed in thread groups with the determined thread group sizes concurrently in the target processing units.
Type: Application
Filed: June 27, 2012
Publication date: December 20, 2012
Inventors: Aaftab A. Munshi, Nathaniel Begeman
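The partitioning step described here — dividing a multi-dimensional global thread number into thread groups of a chosen size — is plain arithmetic, sketched below. The assumption that each global dimension is a multiple of the group size is mine, made to keep the example minimal.

```python
def thread_groups(global_size, group_size):
    """Partition a multi-dimensional total thread count into thread groups.

    global_size: total threads per dimension, e.g. (1024, 768).
    group_size: threads per group per dimension, e.g. (16, 16).
    Returns the number of thread groups per dimension.
    """
    # Assumed for simplicity: the global size divides evenly by the group size.
    assert all(g % l == 0 for g, l in zip(global_size, group_size))
    return tuple(g // l for g, l in zip(global_size, group_size))

groups = thread_groups((1024, 768), (16, 16))  # groups per dimension
```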
-
Patent number: 8286198
Abstract: A method and an apparatus for a parallel computing program calling APIs (application programming interfaces) in a host processor to perform a data processing task in parallel among compute units are described. The compute units are coupled to the host processor including central processing units (CPUs) and graphic processing units (GPUs). A program object corresponding to a source code for the data processing task is generated in a memory coupled to the host processor according to the API calls. Executable codes for the compute units are generated from the program object according to the API calls to be loaded for concurrent execution among the compute units to perform the data processing task.
Type: Grant
Filed: November 4, 2008
Date of Patent: October 9, 2012
Assignee: Apple Inc.
Inventors: Aaftab A. Munshi, Nathaniel Begeman
-
Patent number: 8286196
Abstract: A method and an apparatus that schedule a plurality of executables in a schedule queue for execution in one or more physical compute devices such as CPUs or GPUs concurrently are described. One or more executables are compiled online from a source having an existing executable for a type of physical compute device different from the one or more physical compute devices. Dependency relations among elements corresponding to scheduled executables are determined to select an executable to be executed by a plurality of threads concurrently in more than one of the physical compute devices. A thread initialized for executing an executable in a GPU of the physical compute devices is initialized for execution in another CPU of the physical compute devices if the GPU is busy with graphics processing threads.
Type: Grant
Filed: May 3, 2007
Date of Patent: October 9, 2012
Assignee: Apple Inc.
Inventors: Aaftab Munshi, Jeremy Sandmel
-
Patent number: 8276164
Abstract: A method and an apparatus that allocate one or more physical compute devices such as CPUs or GPUs attached to a host processing unit running an application for executing one or more threads of the application are described. The allocation may be based on data representing a processing capability requirement from the application for executing an executable in the one or more threads. A compute device identifier may be associated with the allocated physical compute devices to schedule and execute the executable in the one or more threads concurrently in one or more of the allocated physical compute devices.
Type: Grant
Filed: May 3, 2007
Date of Patent: September 25, 2012
Assignee: Apple Inc.
Inventors: Aaftab Munshi, Jeremy Sandmel
-
Patent number: 8225325
Abstract: A method and an apparatus that determine a total number of threads to concurrently execute executable codes compiled from a single source for target processing units in response to an API (Application Programming Interface) request from an application running in a host processing unit are described. The target processing units include GPUs (Graphics Processing Unit) and CPUs (Central Processing Unit). Thread group sizes for the target processing units are determined to partition the total number of threads according to a multi-dimensional global thread number included in the API request. The executable codes are loaded to be executed in thread groups with the determined thread group sizes concurrently in the target processing units.
Type: Grant
Filed: November 4, 2008
Date of Patent: July 17, 2012
Assignee: Apple Inc.
Inventors: Aaftab A. Munshi, Nathaniel Begeman
-
Patent number: 8108633
Abstract: A method and an apparatus that allocate a stream memory and/or a local memory for a variable in an executable loaded from a host processor to the compute processor according to whether a compute processor supports a storage capability are described. The compute processor may be a graphics processing unit (GPU) or a central processing unit (CPU). Alternatively, an application running in a host processor configures storage capabilities in a compute processor, such as CPU or GPU, to determine a memory location for accessing a variable in an executable executed by a plurality of threads in the compute processor. The configuration and allocation are based on API calls in the host processor.
Type: Grant
Filed: May 3, 2007
Date of Patent: January 31, 2012
Assignee: Apple Inc.
Inventors: Aaftab Munshi, Jeremy Sandmel
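The placement decision this abstract describes — put a variable in local memory if the compute processor supports that storage capability, otherwise in stream memory — can be sketched as a single function. The capability dictionary and its keys are assumptions for illustration.

```python
def place_variable(size, device_caps):
    """Choose a memory location for a variable on a compute processor.

    size: bytes needed by the variable.
    device_caps: hypothetical capability map, e.g. {"local_mem": 16384},
                 where local_mem is the supported local-memory capacity.
    """
    # Use fast local memory when the processor supports it and the variable
    # fits; otherwise fall back to stream (global) memory.
    if device_caps.get("local_mem", 0) >= size:
        return "local"
    return "stream"

loc = place_variable(4096, {"local_mem": 16384})    # fits: local memory
loc2 = place_variable(65536, {"local_mem": 16384})  # too big: stream memory
loc3 = place_variable(64, {})                       # no capability: stream
```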
-
Publication number: 20110285729
Abstract: A method and an apparatus for a parallel computing program using subbuffers to perform a data processing task in parallel among heterogeneous compute units are described. The compute units can include a heterogeneous mix of central processing units (CPUs) and graphic processing units (GPUs). A system creates a subbuffer from a parent buffer for each of a plurality of heterogeneous compute units. If a subbuffer is not associated with the same compute unit as the parent buffer, the system copies data from the subbuffer to memory of that compute unit. The system further tracks updates to the data and transfers those updates back to the subbuffer.
Type: Application
Filed: September 28, 2010
Publication date: November 24, 2011
Inventors: Aaftab A. Munshi, Ian R. Ollmann
-
Patent number: 7791602
Abstract: The present invention relates to computer graphics applications involving scene rendering using objects modeled at multiple levels of detail. In accordance with an aspect of the invention, a ray tracer implementation allows users to specify multiple versions of a particular object, categorized by LOD IDs. A scene server selects the version appropriate for the particular scene, based on the size of the object on the screen for example, and provides a smooth transition between multiple versions of an object model. In one example, the scene server will select two LOD representations associated with a given object and assign relative weights to each representation. The LOD weights are specified to indicate how to blend these representations together.
Type: Grant
Filed: August 14, 2006
Date of Patent: September 7, 2010
Inventors: Aaftab A. Munshi, Mark Wood-Patrick
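The two-representation blend described above can be sketched as follows: given an object's on-screen size and a list of LOD representations with size thresholds, pick the two that bracket the size and weight them for a smooth transition. The threshold list and the linear weighting are assumptions made for illustration; the patent does not prescribe this exact formula.

```python
def lod_blend(screen_size, lods):
    """Select two LOD representations and their relative blend weights.

    lods: list of (screen-size threshold, lod_id), sorted by threshold.
    Returns ((lod_id, weight), (lod_id, weight)) with weights summing to 1.
    """
    for (t0, a), (t1, b) in zip(lods, lods[1:]):
        if t0 <= screen_size <= t1:
            # Linear weight: closer to t1 gives the finer LOD more weight.
            w = (screen_size - t0) / (t1 - t0)
            return (a, 1.0 - w), (b, w)
    # Outside the range: use the nearest representation at full weight.
    edge = lods[0] if screen_size < lods[0][0] else lods[-1]
    return (edge[1], 1.0), (edge[1], 0.0)

pair = lod_blend(30.0, [(10.0, "coarse"), (50.0, "fine")])
```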
-
Patent number: 7791612
Abstract: A graphics processing system including a cache memory circuit coupled to the graphics processor and the address and data busses for storing graphics data according to a respective address. The cache memory includes first and second memories coupled together by a plurality of activation lines. The first memory has a corresponding plurality of address detection units to store addresses and provide activation signals in response to receiving a matching address. The second memory includes a corresponding plurality of data storage locations. Each data storage location is coupled to a respective one of the plurality of address storage locations by a respective activation line to provide graphics data in response to receiving an activation signal from the respective address storage location.
Type: Grant
Filed: August 31, 2004
Date of Patent: September 7, 2010
Assignee: Micron Technology, Inc.
Inventor: Aaftab Munshi
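A software analogue may clarify the structure: the first memory's address detection units behave like a content-addressable lookup whose match "fires" an activation line into the paired data storage location. In real hardware all detection units compare simultaneously; the sequential scan below is only a model, and the class name is invented.

```python
class MatchCache:
    """Toy model of paired address-detection and data-storage memories."""

    def __init__(self, n):
        self.addresses = [None] * n  # first memory: address detection units
        self.data = [None] * n       # second memory: data storage locations

    def store(self, slot, address, value):
        # Each slot pairs one address detection unit with one data location.
        self.addresses[slot] = address
        self.data[slot] = value

    def lookup(self, address):
        # Hardware compares all detection units in parallel; this scan models
        # the unit whose stored address matches asserting its activation line.
        for i, stored in enumerate(self.addresses):
            if stored == address:
                return self.data[i]  # activation line selects this location
        return None                  # cache miss

cache = MatchCache(4)
cache.store(0, 0x1000, "texel A")
cache.store(1, 0x2000, "texel B")
hit = cache.lookup(0x2000)
miss = cache.lookup(0x3000)
```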
-
Patent number: 7689717
Abstract: A method, system, computer program product, and protocol for digital rendering over a network is described. Rendering resources associated with a project are stored in a project resource pool at a rendering service site, and for each rendering request received from a client site the project resource pool is compared to current rendering resources at the client site. A given rendering resource is uploaded from the client site to the rendering service only if the project resource pool does not contain the current version, thereby conserving bandwidth. In accordance with a preferred embodiment, redundant generation of raw rendering resource files is avoided by only generating those raw rendering resource files not mated with generated rendering resource files. Methods for reducing redundant generation of raw resources are also described, as well as methods for statistically reducing the number of raw resource files required to be uploaded to the rendering service for multi-frame sessions.
Type: Grant
Filed: August 31, 2006
Date of Patent: March 30, 2010
Inventors: Aaftab A. Munshi, Avi I. Bleiweiss
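The bandwidth-saving comparison at the heart of this protocol is straightforward to sketch: upload a client resource only when the service's project resource pool lacks the current version. The version-string scheme is an assumption for illustration; the patent does not specify how versions are represented.

```python
def sync_resources(pool, client_resources):
    """Decide which client resources must be uploaded to the service.

    pool: the rendering service's project resource pool (name -> version).
    client_resources: the client site's current resources (name -> version).
    Returns the names uploaded, and updates the pool to match.
    """
    # Upload only resources the pool is missing or holds a stale version of.
    uploads = [name for name, version in client_resources.items()
               if pool.get(name) != version]
    pool.update({name: client_resources[name] for name in uploads})
    return uploads

pool = {"teapot.obj": "v1", "wood.tex": "v2"}
uploads = sync_resources(pool, {"teapot.obj": "v2",   # stale in the pool
                                "wood.tex": "v2",     # already current
                                "sky.hdr": "v1"})     # missing from the pool
```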
-
Publication number: 20090307704
Abstract: A method and an apparatus that determine a total number of threads to concurrently execute executable codes compiled from a single source for target processing units in response to an API (Application Programming Interface) request from an application running in a host processing unit are described. The target processing units include GPUs (Graphics Processing Unit) and CPUs (Central Processing Unit). Thread group sizes for the target processing units are determined to partition the total number of threads according to a multi-dimensional global thread number included in the API request. The executable codes are loaded to be executed in thread groups with the determined thread group sizes concurrently in the target processing units.
Type: Application
Filed: November 4, 2008
Publication date: December 10, 2009
Inventors: Aaftab A. Munshi, Nathaniel Begeman
-
Publication number: 20090307699
Abstract: A method and an apparatus for a parallel computing program calling APIs (application programming interfaces) in a host processor to perform a data processing task in parallel among compute units are described. The compute units are coupled to the host processor including central processing units (CPUs) and graphic processing units (GPUs). A program object corresponding to a source code for the data processing task is generated in a memory coupled to the host processor according to the API calls. Executable codes for the compute units are generated from the program object according to the API calls to be loaded for concurrent execution among the compute units to perform the data processing task.
Type: Application
Filed: November 4, 2008
Publication date: December 10, 2009
Inventors: Aaftab A. Munshi, Nathaniel Begeman
-
Patent number: 7606429
Abstract: A block-based image compression method and encoder/decoder circuit compresses a plurality of pixels having corresponding original color values and luminance values in a block according to different modes of operation. The encoding circuit includes a luminance-level-based representative color generator to generate representative color values for each of a plurality of luminance levels derived from the corresponding luminance levels to produce at least a block color offset value and a quantization value. According to mode zero, each of the pixels in the block is associated with one of the plurality of generated representative color values to generate error map values and a mode zero color error value. According to mode one, representative color values for each of at least three luminance levels are also generated to produce at least three representative color values, corresponding bitmap values and a mode one color error value.
Type: Grant
Filed: March 25, 2005
Date of Patent: October 20, 2009
Assignee: ATI Technologies ULC
Inventors: Milivoje Aleksic, Aaftab Munshi, Charles D. Ogden
-
Patent number: 7505624
Abstract: A block-based image compression method and encoder/decoder circuit compress a plurality of pixels in a block where each pixel includes a corresponding color value and a corresponding luminance value. The encoder circuit includes a luminance-level-based representative color generator to generate representative color values for each of a plurality of luminance levels to produce at least a high color value and a low color value. In response to generating the representative color values, the luminance-level-based representative color generator associates each of the pixels in the block with one of the plurality of representative color values to produce corresponding bitmap values. The encoder circuit further includes a color type block generator to perform at least one of: (a) generate block color data indicating a regular/alternate color block type and (b) representing a block color type by ordering the representative color values that are to be sent to a decoder.
Type: Grant
Filed: May 27, 2005
Date of Patent: March 17, 2009
Assignee: ATI Technologies ULC
Inventors: Charles D. Ogden, Aaftab Munshi
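The core of this encoder can be sketched with a two-level model: derive a high and a low representative color from the block's pixels by luminance, then associate each pixel with the nearer representative, producing per-pixel bitmap values. The Rec. 601 luminance weights and the nearest-luminance assignment are assumptions made for illustration, not the circuit's actual method.

```python
def encode_block(pixels):
    """Compress a pixel block to (low color, high color, bitmap values).

    pixels: list of (r, g, b) tuples for one block.
    """
    # Approximate luminance (Rec. 601 weights, assumed for this sketch).
    def lum(c):
        return 0.299 * c[0] + 0.587 * c[1] + 0.114 * c[2]

    # Representative colors: the darkest and brightest pixels in the block.
    ordered = sorted(pixels, key=lum)
    low, high = ordered[0], ordered[-1]

    # Bitmap value per pixel: 0 if nearer the low color's luminance, else 1.
    bitmap = [0 if abs(lum(p) - lum(low)) <= abs(lum(p) - lum(high)) else 1
              for p in pixels]
    return low, high, bitmap

pixels = [(0, 0, 0), (255, 255, 255), (10, 10, 10), (240, 240, 240)]
low, high, bitmap = encode_block(pixels)
```

A decoder needs only the two representative colors and the bitmap, which is what makes block schemes like this compact.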