Patents by Inventor Aaftab A. Munshi
Aaftab A. Munshi has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 8723877
Abstract: A method and an apparatus for a parallel computing program using subbuffers to perform a data processing task in parallel among heterogeneous compute units are described. The compute units can include a heterogeneous mix of central processing units (CPUs) and graphic processing units (GPUs). A system creates a subbuffer from a parent buffer for each of a plurality of heterogeneous compute units. If a subbuffer is not associated with the same compute unit as the parent buffer, the system copies data from the subbuffer to memory of that compute unit. The system further tracks updates to the data and transfers those updates back to the subbuffer.
Type: Grant
Filed: September 28, 2010
Date of Patent: May 13, 2014
Assignee: Apple Inc.
Inventors: Aaftab A. Munshi, Ian R. Ollmann
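The sub-buffer scheme this abstract describes can be sketched as a small model: a parent buffer is split into per-device sub-buffers, data for a sub-buffer bound to a different compute unit is copied out, and updates are transferred back. All class and method names here are hypothetical, for illustration only; this is not the patented implementation.

```python
class ParentBuffer:
    def __init__(self, data, device):
        self.data = list(data)   # backing storage
        self.device = device     # compute unit that owns the parent buffer

    def make_subbuffer(self, origin, size, device):
        return SubBuffer(self, origin, size, device)

class SubBuffer:
    def __init__(self, parent, origin, size, device):
        self.parent, self.origin, self.size = parent, origin, size
        # Copy data out only when the sub-buffer is bound to a different
        # compute unit than the parent buffer.
        self.local = (list(parent.data[origin:origin + size])
                      if device != parent.device else None)

    def write(self, i, value):
        if self.local is None:            # same compute unit: update in place
            self.parent.data[self.origin + i] = value
        else:                             # different unit: update the copy
            self.local[i] = value

    def flush(self):
        # Transfer tracked updates back to the parent buffer.
        if self.local is not None:
            self.parent.data[self.origin:self.origin + self.size] = self.local

buf = ParentBuffer(range(8), device="gpu0")
sub = buf.make_subbuffer(4, 4, device="cpu0")  # second half goes to the CPU
sub.write(0, 99)
sub.flush()
```

After `flush()`, the parent buffer on `gpu0` sees the CPU's update to element 4.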
-
Publication number: 20130081066
Abstract: A method and an apparatus for a parallel computing program calling APIs (application programming interfaces) in a host processor to perform a data processing task in parallel among compute units are described. The compute units are coupled to the host processor including central processing units (CPUs) and graphic processing units (GPUs). A program object corresponding to a source code for the data processing task is generated in a memory coupled to the host processor according to the API calls. Executable codes for the compute units are generated from the program object according to the API calls to be loaded for concurrent execution among the compute units to perform the data processing task.
Type: Application
Filed: October 5, 2012
Publication date: March 28, 2013
Inventors: Aaftab A. Munshi, Nathaniel Begeman
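The flow in this abstract — API calls create a program object from source, then generate one executable per compute unit — can be modeled in a few lines. The names below are illustrative stand-ins, not the actual API described by the application.

```python
class ProgramObject:
    """Holds source code and per-device executables built from it."""

    def __init__(self, source):
        self.source = source
        self.executables = {}

    def build(self, devices):
        # Generate an executable for each target compute unit from the one
        # program object, ready to be loaded for concurrent execution.
        for dev in devices:
            self.executables[dev] = f"binary(target={dev})"
        return self.executables

prog = ProgramObject("kernel add(a, b) { ... }")
bins = prog.build(["cpu", "gpu"])
```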
-
Publication number: 20130063451
Abstract: A method and an apparatus that schedule a plurality of executables in a schedule queue for execution in one or more physical compute devices such as CPUs or GPUs concurrently are described. One or more executables are compiled online from a source having an existing executable for a type of physical compute device different from the one or more physical compute devices. Dependency relations among elements corresponding to scheduled executables are determined to select an executable to be executed by a plurality of threads concurrently in more than one of the physical compute devices. A thread initialized for executing an executable in a GPU of the physical compute devices is initialized for execution in another CPU of the physical compute devices if the GPU is busy with graphics processing threads.
Type: Application
Filed: September 13, 2012
Publication date: March 14, 2013
Inventors: Aaftab Munshi, Jeremy Sandmel
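The scheduling idea above can be sketched as a tiny queue model: executables carry dependencies, one whose dependencies are satisfied is selected, and work aimed at a busy GPU falls back to a CPU. This is an illustrative sketch, not the patented scheduler; the task format is an assumption.

```python
def pick_next(queue, done, gpu_busy):
    """Select the next runnable executable from the schedule queue.

    queue: list of (name, dependency_set, target_device) tuples.
    done: set of names already executed.
    gpu_busy: whether the GPU is occupied with graphics processing threads.
    """
    for name, deps, device in queue:
        if all(d in done for d in deps):
            # Re-target GPU work to a CPU when the GPU is busy with graphics.
            if device == "gpu" and gpu_busy:
                device = "cpu"
            return name, device
    return None  # nothing runnable yet

queue = [("blur", {"load"}, "gpu"), ("load", set(), "cpu")]
first = pick_next(queue, done=set(), gpu_busy=True)
```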
-
Publication number: 20130055272
Abstract: A method and an apparatus that schedule a plurality of executables in a schedule queue for execution in one or more physical compute devices such as CPUs or GPUs concurrently are described. One or more executables are compiled online from a source having an existing executable for a type of physical compute device different from the one or more physical compute devices. Dependency relations among elements corresponding to scheduled executables are determined to select an executable to be executed by a plurality of threads concurrently in more than one of the physical compute devices. A thread initialized for executing an executable in a GPU of the physical compute devices is initialized for execution in another CPU of the physical compute devices if the GPU is busy with graphics processing threads.
Type: Application
Filed: August 28, 2012
Publication date: February 28, 2013
Inventors: Aaftab Munshi, Jeremy Sandmel
-
Publication number: 20130007774
Abstract: A method and an apparatus that allocate one or more physical compute devices such as CPUs or GPUs attached to a host processing unit running an application for executing one or more threads of the application are described. The allocation may be based on data representing a processing capability requirement from the application for executing an executable in the one or more threads. A compute device identifier may be associated with the allocated physical compute devices to schedule and execute the executable in the one or more threads concurrently in one or more of the allocated physical compute devices.
Type: Application
Filed: September 13, 2012
Publication date: January 3, 2013
Inventors: Aaftab Munshi, Jeremy Sandmel
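The capability-based allocation described here reduces to a simple filter: pick the devices whose capabilities satisfy the application's requirement and hand back their identifiers. A minimal sketch, with hypothetical device names and capability labels:

```python
def allocate(devices, requirement):
    """Return identifiers of compute devices meeting the requirement.

    devices: mapping of device identifier -> set of capability labels.
    requirement: set of capabilities the application's executable needs.
    """
    # A device qualifies when the requirement is a subset of its capabilities;
    # the returned identifiers are used to schedule and execute the threads.
    return [dev_id for dev_id, caps in devices.items()
            if requirement <= caps]

devices = {"cpu0": {"fp64", "images"}, "gpu0": {"images"}}
ids = allocate(devices, {"images"})      # both devices qualify
fp64_ids = allocate(devices, {"fp64"})   # only the CPU qualifies
```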
-
Patent number: 8341611
Abstract: A method and an apparatus that execute a parallel computing program in a programming language for a parallel computing architecture are described. The parallel computing program is stored in memory in a system with parallel processors. The system includes a host processor, a graphics processing unit (GPU) coupled to the host processor and a memory coupled to at least one of the host processor and the GPU. The parallel computing program is stored in the memory to allocate threads between the host processor and the GPU. The programming language includes an API to allow an application to make calls using the API to allocate execution of the threads between the host processor and the GPU. The programming language includes host function data tokens for host functions performed in the host processor and kernel function data tokens for compute kernel functions performed in one or more compute processors, e.g. GPUs or CPUs, separate from the host processor.
Type: Grant
Filed: May 3, 2007
Date of Patent: December 25, 2012
Assignee: Apple Inc.
Inventors: Aaftab Munshi, Jeremy Sandmel
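The host/kernel split the abstract describes — language-level tokens that mark whether a function runs on the host processor or on a compute processor — can be imitated with decorators standing in for the data tokens. Everything here (the `host`/`kernel` names, the registry) is hypothetical, purely to illustrate the partitioning idea.

```python
registry = {}  # function name -> processor class it targets

def host(fn):
    # Stand-in for a "host function data token": runs on the host processor.
    registry[fn.__name__] = "host"
    return fn

def kernel(fn):
    # Stand-in for a "kernel function data token": runs on a compute
    # processor (e.g. a GPU or CPU) separate from the host.
    registry[fn.__name__] = "compute"
    return fn

@host
def setup():
    return "buffers ready"

@kernel
def saxpy():
    return "y = a*x + y"
```

An implementation would use `registry` to allocate threads: host-tagged functions stay on the host processor, kernel-tagged functions are dispatched to compute processors.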
-
Publication number: 20120320071
Abstract: A method and an apparatus that determine a total number of threads to concurrently execute executable codes compiled from a single source for target processing units in response to an API (Application Programming Interface) request from an application running in a host processing unit are described. The target processing units include GPUs (Graphics Processing Unit) and CPUs (Central Processing Unit). Thread group sizes for the target processing units are determined to partition the total number of threads according to a multi-dimensional global thread number included in the API request. The executable codes are loaded to be executed in thread groups with the determined thread group sizes concurrently in the target processing units.
Type: Application
Filed: June 27, 2012
Publication date: December 20, 2012
Inventors: Aaftab A. Munshi, Nathaniel Begeman
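The partitioning step described here — dividing a multi-dimensional global thread number into thread groups of a chosen size — is plain arithmetic, sketched below. The assumption that each global dimension is a multiple of the group size is mine, made to keep the example minimal.

```python
def thread_groups(global_size, group_size):
    """Partition a multi-dimensional total thread count into thread groups.

    global_size: total threads per dimension, e.g. (1024, 768).
    group_size: threads per group per dimension, e.g. (16, 16).
    Returns the number of thread groups per dimension.
    """
    # Assumed for simplicity: the global size divides evenly by the group size.
    assert all(g % l == 0 for g, l in zip(global_size, group_size))
    return tuple(g // l for g, l in zip(global_size, group_size))

groups = thread_groups((1024, 768), (16, 16))  # groups per dimension
```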
-
Patent number: 8286198
Abstract: A method and an apparatus for a parallel computing program calling APIs (application programming interfaces) in a host processor to perform a data processing task in parallel among compute units are described. The compute units are coupled to the host processor including central processing units (CPUs) and graphic processing units (GPUs). A program object corresponding to a source code for the data processing task is generated in a memory coupled to the host processor according to the API calls. Executable codes for the compute units are generated from the program object according to the API calls to be loaded for concurrent execution among the compute units to perform the data processing task.
Type: Grant
Filed: November 4, 2008
Date of Patent: October 9, 2012
Assignee: Apple Inc.
Inventors: Aaftab A. Munshi, Nathaniel Begeman
-
Patent number: 8286196
Abstract: A method and an apparatus that schedule a plurality of executables in a schedule queue for execution in one or more physical compute devices such as CPUs or GPUs concurrently are described. One or more executables are compiled online from a source having an existing executable for a type of physical compute device different from the one or more physical compute devices. Dependency relations among elements corresponding to scheduled executables are determined to select an executable to be executed by a plurality of threads concurrently in more than one of the physical compute devices. A thread initialized for executing an executable in a GPU of the physical compute devices is initialized for execution in another CPU of the physical compute devices if the GPU is busy with graphics processing threads.
Type: Grant
Filed: May 3, 2007
Date of Patent: October 9, 2012
Assignee: Apple Inc.
Inventors: Aaftab Munshi, Jeremy Sandmel
-
Patent number: 8276164
Abstract: A method and an apparatus that allocate one or more physical compute devices such as CPUs or GPUs attached to a host processing unit running an application for executing one or more threads of the application are described. The allocation may be based on data representing a processing capability requirement from the application for executing an executable in the one or more threads. A compute device identifier may be associated with the allocated physical compute devices to schedule and execute the executable in the one or more threads concurrently in one or more of the allocated physical compute devices.
Type: Grant
Filed: May 3, 2007
Date of Patent: September 25, 2012
Assignee: Apple Inc.
Inventors: Aaftab Munshi, Jeremy Sandmel
-
Patent number: 8225325
Abstract: A method and an apparatus that determine a total number of threads to concurrently execute executable codes compiled from a single source for target processing units in response to an API (Application Programming Interface) request from an application running in a host processing unit are described. The target processing units include GPUs (Graphics Processing Unit) and CPUs (Central Processing Unit). Thread group sizes for the target processing units are determined to partition the total number of threads according to a multi-dimensional global thread number included in the API request. The executable codes are loaded to be executed in thread groups with the determined thread group sizes concurrently in the target processing units.
Type: Grant
Filed: November 4, 2008
Date of Patent: July 17, 2012
Assignee: Apple Inc.
Inventors: Aaftab A. Munshi, Nathaniel Begeman
-
Patent number: 8108633
Abstract: A method and an apparatus that allocate a stream memory and/or a local memory for a variable in an executable loaded from a host processor to the compute processor according to whether a compute processor supports a storage capability are described. The compute processor may be a graphics processing unit (GPU) or a central processing unit (CPU). Alternatively, an application running in a host processor configures storage capabilities in a compute processor, such as CPU or GPU, to determine a memory location for accessing a variable in an executable executed by a plurality of threads in the compute processor. The configuration and allocation are based on API calls in the host processor.
Type: Grant
Filed: May 3, 2007
Date of Patent: January 31, 2012
Assignee: Apple Inc.
Inventors: Aaftab Munshi, Jeremy Sandmel
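The placement decision this abstract describes — put a variable in local memory if the compute processor supports that storage capability, otherwise in stream memory — can be sketched as a single function. The capability dictionary and its keys are assumptions for illustration.

```python
def place_variable(size, device_caps):
    """Choose a memory location for a variable on a compute processor.

    size: bytes needed by the variable.
    device_caps: hypothetical capability map, e.g. {"local_mem": 16384},
                 where local_mem is the supported local-memory capacity.
    """
    # Use fast local memory when the processor supports it and the variable
    # fits; otherwise fall back to stream (global) memory.
    if device_caps.get("local_mem", 0) >= size:
        return "local"
    return "stream"

loc = place_variable(4096, {"local_mem": 16384})    # fits: local memory
loc2 = place_variable(65536, {"local_mem": 16384})  # too big: stream memory
loc3 = place_variable(64, {})                       # no capability: stream
```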
-
Publication number: 20110285729
Abstract: A method and an apparatus for a parallel computing program using subbuffers to perform a data processing task in parallel among heterogeneous compute units are described. The compute units can include a heterogeneous mix of central processing units (CPUs) and graphic processing units (GPUs). A system creates a subbuffer from a parent buffer for each of a plurality of heterogeneous compute units. If a subbuffer is not associated with the same compute unit as the parent buffer, the system copies data from the subbuffer to memory of that compute unit. The system further tracks updates to the data and transfers those updates back to the subbuffer.
Type: Application
Filed: September 28, 2010
Publication date: November 24, 2011
Inventors: Aaftab A. Munshi, Ian R. Ollmann
-
Patent number: 7791602
Abstract: The present invention relates to computer graphics applications involving scene rendering using objects modeled at multiple levels of detail. In accordance with an aspect of the invention, a ray tracer implementation allows users to specify multiple versions of a particular object, categorized by LOD IDs. A scene server selects the version appropriate for the particular scene, based on the size of the object on the screen for example, and provides a smooth transition between multiple versions of an object model. In one example, the scene server will select two LOD representations associated with a given object and assign relative weights to each representation. The LOD weights are specified to indicate how to blend these representations together.
Type: Grant
Filed: August 14, 2006
Date of Patent: September 7, 2010
Inventors: Aaftab A. Munshi, Mark Wood-Patrick
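The two-representation blend described above can be sketched as follows: given an object's on-screen size and a list of LOD representations with size thresholds, pick the two that bracket the size and weight them for a smooth transition. The threshold list and the linear weighting are assumptions made for illustration; the patent does not prescribe this exact formula.

```python
def lod_blend(screen_size, lods):
    """Select two LOD representations and their relative blend weights.

    lods: list of (screen-size threshold, lod_id), sorted by threshold.
    Returns ((lod_id, weight), (lod_id, weight)) with weights summing to 1.
    """
    for (t0, a), (t1, b) in zip(lods, lods[1:]):
        if t0 <= screen_size <= t1:
            # Linear weight: closer to t1 gives the finer LOD more weight.
            w = (screen_size - t0) / (t1 - t0)
            return (a, 1.0 - w), (b, w)
    # Outside the range: use the nearest representation at full weight.
    edge = lods[0] if screen_size < lods[0][0] else lods[-1]
    return (edge[1], 1.0), (edge[1], 0.0)

pair = lod_blend(30.0, [(10.0, "coarse"), (50.0, "fine")])
```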
-
Patent number: 7791612
Abstract: A graphics processing system including a cache memory circuit coupled to the graphics processor and the address and data busses for storing graphics data according to a respective address. The cache memory includes first and second memories coupled together by a plurality of activation lines. The first memory has a corresponding plurality of address detection units to store addresses and provide activation signals in response to receiving a matching address. The second memory includes a corresponding plurality of data storage locations. Each data storage location is coupled to a respective one of the plurality of address storage locations by a respective activation line to provide graphics data in response to receiving an activation signal from the respective address storage location.
Type: Grant
Filed: August 31, 2004
Date of Patent: September 7, 2010
Assignee: Micron Technology, Inc.
Inventor: Aaftab Munshi
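A software analogue may clarify the structure: the first memory's address detection units behave like a content-addressable lookup whose match "fires" an activation line into the paired data storage location. In real hardware all detection units compare simultaneously; the sequential scan below is only a model, and the class name is invented.

```python
class MatchCache:
    """Toy model of paired address-detection and data-storage memories."""

    def __init__(self, n):
        self.addresses = [None] * n  # first memory: address detection units
        self.data = [None] * n       # second memory: data storage locations

    def store(self, slot, address, value):
        # Each slot pairs one address detection unit with one data location.
        self.addresses[slot] = address
        self.data[slot] = value

    def lookup(self, address):
        # Hardware compares all detection units in parallel; this scan models
        # the unit whose stored address matches asserting its activation line.
        for i, stored in enumerate(self.addresses):
            if stored == address:
                return self.data[i]  # activation line selects this location
        return None                  # cache miss

cache = MatchCache(4)
cache.store(0, 0x1000, "texel A")
cache.store(1, 0x2000, "texel B")
hit = cache.lookup(0x2000)
miss = cache.lookup(0x3000)
```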
-
Patent number: 7689717
Abstract: A method, system, computer program product, and protocol for digital rendering over a network is described. Rendering resources associated with a project are stored in a project resource pool at a rendering service site, and for each rendering request received from a client site the project resource pool is compared to current rendering resources at the client site. A given rendering resource is uploaded from the client site to the rendering service only if the project resource pool does not contain the current version, thereby conserving bandwidth. In accordance with a preferred embodiment, redundant generation of raw rendering resource files is avoided by only generating those raw rendering resource files not mated with generated rendering resource files. Methods for reducing redundant generation of raw resources are also described, as well as methods for statistically reducing the number of raw resource files required to be uploaded to the rendering service for multi-frame sessions.
Type: Grant
Filed: August 31, 2006
Date of Patent: March 30, 2010
Inventors: Aaftab A. Munshi, Avi I. Bleiweiss
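The bandwidth-saving comparison at the heart of this protocol is straightforward to sketch: upload a client resource only when the service's project resource pool lacks the current version. The version-string scheme is an assumption for illustration; the patent does not specify how versions are represented.

```python
def sync_resources(pool, client_resources):
    """Decide which client resources must be uploaded to the service.

    pool: the rendering service's project resource pool (name -> version).
    client_resources: the client site's current resources (name -> version).
    Returns the names uploaded, and updates the pool to match.
    """
    # Upload only resources the pool is missing or holds a stale version of.
    uploads = [name for name, version in client_resources.items()
               if pool.get(name) != version]
    pool.update({name: client_resources[name] for name in uploads})
    return uploads

pool = {"teapot.obj": "v1", "wood.tex": "v2"}
uploads = sync_resources(pool, {"teapot.obj": "v2",   # stale in the pool
                                "wood.tex": "v2",     # already current
                                "sky.hdr": "v1"})     # missing from the pool
```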
-
Publication number: 20090307704
Abstract: A method and an apparatus that determine a total number of threads to concurrently execute executable codes compiled from a single source for target processing units in response to an API (Application Programming Interface) request from an application running in a host processing unit are described. The target processing units include GPUs (Graphics Processing Unit) and CPUs (Central Processing Unit). Thread group sizes for the target processing units are determined to partition the total number of threads according to a multi-dimensional global thread number included in the API request. The executable codes are loaded to be executed in thread groups with the determined thread group sizes concurrently in the target processing units.
Type: Application
Filed: November 4, 2008
Publication date: December 10, 2009
Inventors: Aaftab A. Munshi, Nathaniel Begeman
-
Publication number: 20090307699
Abstract: A method and an apparatus for a parallel computing program calling APIs (application programming interfaces) in a host processor to perform a data processing task in parallel among compute units are described. The compute units are coupled to the host processor including central processing units (CPUs) and graphic processing units (GPUs). A program object corresponding to a source code for the data processing task is generated in a memory coupled to the host processor according to the API calls. Executable codes for the compute units are generated from the program object according to the API calls to be loaded for concurrent execution among the compute units to perform the data processing task.
Type: Application
Filed: November 4, 2008
Publication date: December 10, 2009
Inventors: Aaftab A. Munshi, Nathaniel Begeman
-
Patent number: 7606429
Abstract: A block-based image compression method and encoder/decoder circuit compresses a plurality of pixels having corresponding original color values and luminance values in a block according to different modes of operation. The encoding circuit includes a luminance-level-based representative color generator to generate representative color values for each of a plurality of luminance levels derived from the corresponding luminance levels to produce at least a block color offset value and a quantization value. According to mode zero, each of the pixels in the block is associated with one of the plurality of generated representative color values to generate error map values and a mode zero color error value. According to mode one, representative color values for each of at least three luminance levels are also generated to produce at least three representative color values, corresponding bitmap values and a mode one color error value.
Type: Grant
Filed: March 25, 2005
Date of Patent: October 20, 2009
Assignee: ATI Technologies ULC
Inventors: Milivoje Aleksic, Aaftab Munshi, Charles D. Ogden
-
Patent number: 7505624
Abstract: A block-based image compression method and encoder/decoder circuit compress a plurality of pixels in a block where each pixel includes a corresponding color value and a corresponding luminance value. The encoder circuit includes a luminance-level-based representative color generator to generate representative color values for each of a plurality of luminance levels to produce at least a high color value and a low color value. In response to generating the representative color values, the luminance-level-based representative color generator associates each of the pixels in the block with one of the plurality of representative color values to produce corresponding bitmap values. The encoder circuit further includes a color type block generator to perform at least one of: (a) generate block color data indicating a regular/alternate color block type and (b) representing a block color type by ordering the representative color values that are to be sent to a decoder.
Type: Grant
Filed: May 27, 2005
Date of Patent: March 17, 2009
Assignee: ATI Technologies ULC
Inventors: Charles D. Ogden, Aaftab Munshi
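The core of this encoder can be sketched with a two-level model: derive a high and a low representative color from the block's pixels by luminance, then associate each pixel with the nearer representative, producing per-pixel bitmap values. The Rec. 601 luminance weights and the nearest-luminance assignment are assumptions made for illustration, not the circuit's actual method.

```python
def encode_block(pixels):
    """Compress a pixel block to (low color, high color, bitmap values).

    pixels: list of (r, g, b) tuples for one block.
    """
    # Approximate luminance (Rec. 601 weights, assumed for this sketch).
    def lum(c):
        return 0.299 * c[0] + 0.587 * c[1] + 0.114 * c[2]

    # Representative colors: the darkest and brightest pixels in the block.
    ordered = sorted(pixels, key=lum)
    low, high = ordered[0], ordered[-1]

    # Bitmap value per pixel: 0 if nearer the low color's luminance, else 1.
    bitmap = [0 if abs(lum(p) - lum(low)) <= abs(lum(p) - lum(high)) else 1
              for p in pixels]
    return low, high, bitmap

pixels = [(0, 0, 0), (255, 255, 255), (10, 10, 10), (240, 240, 240)]
low, high, bitmap = encode_block(pixels)
```

A decoder needs only the two representative colors and the bitmap, which is what makes block schemes like this compact.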