Patents by Inventor Aaftab A. Munshi

Aaftab A. Munshi has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 8723877
    Abstract: A method and an apparatus for a parallel computing program using subbuffers to perform a data processing task in parallel among heterogeneous compute units are described. The compute units can include a heterogeneous mix of central processing units (CPUs) and graphic processing units (GPUs). A system creates a subbuffer from a parent buffer for each of a plurality of heterogeneous compute units. If a subbuffer is not associated with the same compute unit as the parent buffer, the system copies data from the subbuffer to memory of that compute unit. The system further tracks updates to the data and transfers those updates back to the subbuffer.
    Type: Grant
    Filed: September 28, 2010
    Date of Patent: May 13, 2014
    Assignee: Apple Inc.
    Inventors: Aaftab A. Munshi, Ian R. Ollmann
  • Publication number: 20130081066
    Abstract: A method and an apparatus for a parallel computing program calling APIs (application programming interfaces) in a host processor to perform a data processing task in parallel among compute units are described. The compute units are coupled to the host processor including central processing units (CPUs) and graphic processing units (GPUs). A program object corresponding to a source code for the data processing task is generated in a memory coupled to the host processor according to the API calls. Executable codes for the compute units are generated from the program object according to the API calls to be loaded for concurrent execution among the compute units to perform the data processing task.
    Type: Application
    Filed: October 5, 2012
    Publication date: March 28, 2013
    Inventors: Aaftab A. Munshi, Nathaniel Begeman
  • Publication number: 20130063451
    Abstract: A method and an apparatus that schedule a plurality of executables in a schedule queue for execution in one or more physical compute devices such as CPUs or GPUs concurrently are described. One or more executables are compiled online from a source having an existing executable for a type of physical compute devices different from the one or more physical compute devices. Dependency relations among elements corresponding to scheduled executables are determined to select an executable to be executed by a plurality of threads concurrently in more than one of the physical compute devices. A thread initialized for executing an executable in a GPU of the physical compute devices are initialized for execution in another CPU of the physical compute devices if the GPU is busy with graphics processing threads.
    Type: Application
    Filed: September 13, 2012
    Publication date: March 14, 2013
    Inventors: Aaftab Munshi, Jeremy Sandmel
  • Publication number: 20130055272
    Abstract: A method and an apparatus that schedule a plurality of executables in a schedule queue for execution in one or more physical compute devices such as CPUs or GPUs concurrently are described. One or more executables are compiled online from a source having an existing executable for a type of physical compute devices different from the one or more physical compute devices. Dependency relations among elements corresponding to scheduled executables are determined to select an executable to be executed by a plurality of threads concurrently in more than one of the physical compute devices. A thread initialized for executing an executable in a GPU of the physical compute devices are initialized for execution in another CPU of the physical compute devices if the GPU is busy with graphics processing threads.
    Type: Application
    Filed: August 28, 2012
    Publication date: February 28, 2013
    Inventors: Aaftab Munshi, Jeremy Sandmel
  • Publication number: 20130007774
    Abstract: A method and an apparatus that allocate one or more physical compute devices such as CPUs or GPUs attached to a host processing unit running an application for executing one or more threads of the application are described. The allocation may be based on data representing a processing capability requirement from the application for executing an executable in the one or more threads. A compute device identifier may be associated with the allocated physical compute devices to schedule and execute the executable in the one or more threads concurrently in one or more of the allocated physical compute devices concurrently.
    Type: Application
    Filed: September 13, 2012
    Publication date: January 3, 2013
    Inventors: Aaftab Munshi, Jeremy Sandmel
  • Patent number: 8341611
    Abstract: A method and an apparatus that execute a parallel computing program in a programming language for a parallel computing architecture are described. The parallel computing program is stored in memory in a system with parallel processors. The system includes a host processor, a graphics processing unit (GPU) coupled to the host processor and a memory coupled to at least one of the host processor and the GPU. The parallel computing program is stored in the memory to allocate threads between the host processor and the GPU. The programming language includes an API to allow an application to make calls using the API to allocate execution of the threads between the host processor and the GPU. The programming language includes host function data tokens for host functions performed in the host processor and kernel function data tokens for compute kernel functions performed in one or more compute processors, e.g. GPUs or CPUs, separate from the host processor.
    Type: Grant
    Filed: May 3, 2007
    Date of Patent: December 25, 2012
    Assignee: Apple Inc.
    Inventors: Aaftab Munshi, Jeremy Sandmel
  • Publication number: 20120320071
    Abstract: A method and an apparatus that determine a total number of threads to concurrently execute executable codes compiled from a single source for target processing units in response to an API (Application Programming Interface) request from an application running in a host processing unit are described. The target processing units include GPUs (Graphics Processing Unit) and CPUs (Central Processing Unit). Thread group sizes for the target processing units are determined to partition the total number of threads according to a multi-dimensional global thread number included in the API request. The executable codes are loaded to be executed in thread groups with the determined thread group sizes concurrently in the target processing units.
    Type: Application
    Filed: June 27, 2012
    Publication date: December 20, 2012
    Inventors: Aaftab A. Munshi, Nathaniel Begeman
  • Patent number: 8286198
    Abstract: A method and an apparatus for a parallel computing program calling APIs (application programming interfaces) in a host processor to perform a data processing task in parallel among compute units are described. The compute units are coupled to the host processor including central processing units (CPUs) and graphic processing units (GPUs). A program object corresponding to a source code for the data processing task is generated in a memory coupled to the host processor according to the API calls. Executable codes for the compute units are generated from the program object according to the API calls to be loaded for concurrent execution among the compute units to perform the data processing task.
    Type: Grant
    Filed: November 4, 2008
    Date of Patent: October 9, 2012
    Assignee: Apple Inc.
    Inventors: Aaftab A. Munshi, Nathaniel Begeman
  • Patent number: 8286196
    Abstract: A method and an apparatus that schedule a plurality of executables in a schedule queue for execution in one or more physical compute devices such as CPUs or GPUs concurrently are described. One or more executables are compiled online from a source having an existing executable for a type of physical compute devices different from the one or more physical compute devices. Dependency relations among elements corresponding to scheduled executables are determined to select an executable to be executed by a plurality of threads concurrently in more than one of the physical compute devices. A thread initialized for executing an executable in a GPU of the physical compute devices are initialized for execution in another CPU of the physical compute devices if the GPU is busy with graphics processing threads.
    Type: Grant
    Filed: May 3, 2007
    Date of Patent: October 9, 2012
    Assignee: Apple Inc.
    Inventors: Aaftab Munshi, Jeremy Sandmel
  • Patent number: 8276164
    Abstract: A method and an apparatus that allocate one or more physical compute devices such as CPUs or GPUs attached to a host processing unit running an application for executing one or more threads of the application are described. The allocation may be based on data representing a processing capability requirement from the application for executing an executable in the one or more threads. A compute device identifier may be associated with the allocated physical compute devices to schedule and execute the executable in the one or more threads concurrently in one or more of the allocated physical compute devices concurrently.
    Type: Grant
    Filed: May 3, 2007
    Date of Patent: September 25, 2012
    Assignee: Apple Inc.
    Inventors: Aaftab Munshi, Jeremy Sandmel
  • Patent number: 8225325
    Abstract: A method and an apparatus that determine a total number of threads to concurrently execute executable codes compiled from a single source for target processing units in response to an API (Application Programming Interface) request from an application running in a host processing unit are described. The target processing units include GPUs (Graphics Processing Unit) and CPUs (Central Processing Unit). Thread group sizes for the target processing units are determined to partition the total number of threads according to a multi-dimensional global thread number included in the API request. The executable codes are loaded to be executed in thread groups with the determined thread group sizes concurrently in the target processing units.
    Type: Grant
    Filed: November 4, 2008
    Date of Patent: July 17, 2012
    Assignee: Apple Inc.
    Inventors: Aaftab A. Munshi, Nathaniel Begeman
  • Patent number: 8108633
    Abstract: A method and an apparatus that allocate a stream memory and/or a local memory for a variable in an executable loaded from a host processor to the compute processor according to whether a compute processor supports a storage capability are described. The compute processor may be a graphics processing unit (GPU) or a central processing unit (CPU). Alternatively, an application running in a host processor configures storage capabilities in a compute processor, such as CPU or GPU, to determine a memory location for accessing a variable in an executable executed by a plurality of threads in the compute processor. The configuration and allocation are based on API calls in the host processor.
    Type: Grant
    Filed: May 3, 2007
    Date of Patent: January 31, 2012
    Assignee: Apple Inc.
    Inventors: Aaftab Munshi, Jeremy Sandmel
  • Publication number: 20110285729
    Abstract: A method and an apparatus for a parallel computing program using subbuffers to perform a data processing task in parallel among heterogeneous compute units are described. The compute units can include a heterogeneous mix of central processing units (CPUs) and graphic processing units (GPUs). A system creates a subbuffer from a parent buffer for each of a plurality of heterogeneous compute units. If a subbuffer is not associated with the same compute unit as the parent buffer, the system copies data from the subbuffer to memory of that compute unit. The system further tracks updates to the data and transfers those updates back to the subbuffer.
    Type: Application
    Filed: September 28, 2010
    Publication date: November 24, 2011
    Inventors: Aaftab A. Munshi, Ian R. Ollmann
  • Patent number: 7791602
    Abstract: The present invention relates to computer graphics applications involving scene rendering using objects modeled at multiple levels of detail. In accordance with an aspect of the invention, a ray tracer implementation allows users to specify multiple versions of a particular object, categorized by LOD ID's. A scene server selects the version appropriate for the particular scene, based on the size of the object on the screen for example, and provides a smooth transition between multiple versions of an object model. In one example, the scene server will select two LOD representations associated with a given object and assign relative weights to each representation. The LOD weights are specified to indicate how to blend these representations together.
    Type: Grant
    Filed: August 14, 2006
    Date of Patent: September 7, 2010
    Inventors: Aaftab A. Munshi, Mark Wood-Patrick
  • Patent number: 7791612
    Abstract: A graphics processing system including a cache memory circuit coupled to the graphics processor and the address and data busses for storing graphics data according to a respective address. The cache memory includes first and second memories coupled together by a plurality of activation lines. The first memory has a corresponding plurality of address detection units to store addresses and provide activation signals in response to receiving a matching address. The second memory includes a corresponding plurality of data storage locations. Each data storage location is coupled to a respective one of the plurality of address storage locations by a respective activation line to provide graphics data in response to receiving an activation signal from the respective address storage location.
    Type: Grant
    Filed: August 31, 2004
    Date of Patent: September 7, 2010
    Assignee: Micron Technology, Inc.
    Inventor: Aaftab Munshi
  • Patent number: 7689717
    Abstract: A method, system, computer program product, and protocol for digital rendering over a network is described. Rendering resources associated with a project are stored in a project resource pool at a rendering service site, and for each rendering request received from a client site the project resource pool is compared to current rendering resources at the client site. A given rendering resource is uploaded from the client site to the rendering service only if the project resource pool does not contain the current version, thereby conserving bandwidth. In accordance with a preferred embodiment, redundant generation of raw rendering resource files is avoided by only generating those raw rendering resource files not mated with generated rendering resource files. Methods for reducing redundant generation of raw resources are also described, as well as methods for statistically reducing the number of raw resource files required to be uploaded to the rendering service for multi-frame sessions.
    Type: Grant
    Filed: August 31, 2006
    Date of Patent: March 30, 2010
    Inventors: Aaftab A. Munshi, Avi I. Bleiweiss
  • Publication number: 20090307704
    Abstract: A method and an apparatus that determine a total number of threads to concurrently execute executable codes compiled from a single source for target processing units in response to an API (Application Programming Interface) request from an application running in a host processing unit are described. The target processing units include GPUs (Graphics Processing Unit) and CPUs (Central Processing Unit). Thread group sizes for the target processing units are determined to partition the total number of threads according to a multi-dimensional global thread number included in the API request. The executable codes are loaded to be executed in thread groups with the determined thread group sizes concurrently in the target processing units.
    Type: Application
    Filed: November 4, 2008
    Publication date: December 10, 2009
    Inventors: AAFTAB A. MUNSHI, Nathaniel Begeman
  • Publication number: 20090307699
    Abstract: A method and an apparatus for a parallel computing program calling APIs (application programming interfaces) in a host processor to perform a data processing task in parallel among compute units are described. The compute units are coupled to the host processor including central processing units (CPUs) and graphic processing units (GPUs). A program object corresponding to a source code for the data processing task is generated in a memory coupled to the host processor according to the API calls. Executable codes for the compute units are generated from the program object according to the API calls to be loaded for concurrent execution among the compute units to perform the data processing task.
    Type: Application
    Filed: November 4, 2008
    Publication date: December 10, 2009
    Inventors: AAFTAB A. MUNSHI, Nathaniel Begeman
  • Patent number: 7606429
    Abstract: A block-based image compression method and encoder/decoder circuit compresses a plurality of pixels having corresponding original color values and luminance values in a block according to different modes of operation. The encoding circuit includes a luminance-level-based representative color generator to generate representative color values for each of a plurality of luminance levels derived from the corresponding luminance levels to produce at least a block color offset value and a quantization value. According to mode zero, each of the pixels in the block is associated with one of the plurality of generated representative color values to generate error map values and a mode zero color error value. According to mode one, representative color values for each of at least three luminance levels are also generated to produce at least three representative color values, corresponding bitmap values and a mode one color error value.
    Type: Grant
    Filed: March 25, 2005
    Date of Patent: October 20, 2009
    Assignee: ATI Technologies ULC
    Inventors: Milivoje Aleksic, Aaftab Munshi, Charles D. Ogden
  • Patent number: 7505624
    Abstract: A block-based image compression method and encoder/decoder circuit compress a plurality of pixels in a block where each pixel includes a corresponding color value and a corresponding luminance value. The encoder circuit includes a luminance-level-based representative color generator to generate representative color values for each of a plurality of luminance levels to produce at least a high color value and a low color value. In response to generating the representative color values, the luminance-level-based representative color generator associates each of the pixels in the block with one of the plurality of representative color values to produce corresponding bitmap values. The encoder circuit further includes a color type block generator to perform at least one of: (a) generate block color data indicating a regular/alternate color block type and (b) representing a block color type by ordering the representative color values that are to be sent to a decoder.
    Type: Grant
    Filed: May 27, 2005
    Date of Patent: March 17, 2009
    Assignee: ATI Technologies ULC
    Inventors: Charles D. Ogden, Aaftab Munshi