Patents by Inventor Aaftab A. Munshi

Aaftab A. Munshi has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10223822
    Abstract: Techniques are disclosed relating to performing mid-render auxiliary compute tasks for graphics processing. In some embodiments, auxiliary compute tasks are performed during a render pass, using at least a portion of a memory context of the render pass, without accessing a shared memory during the render pass. Relative to flushing render data to shared memory to perform compute tasks, this may reduce memory accesses and/or cache thrashing, which may in turn increase performance and/or reduce power consumption.
    Type: Grant
    Filed: December 22, 2016
    Date of Patent: March 5, 2019
    Assignee: Apple Inc.
    Inventors: Terence M. Potter, Ralph C. Taylor, Richard W. Schreyer, Aaftab A. Munshi, Justin A. Hensley
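
The mid-render compute idea above can be illustrated with a toy model (purely a sketch, not Apple's implementation): render work keeps per-tile data in fast local memory, an auxiliary compute task operates on that live context directly, and shared (global) memory is touched only once, when the pass ends.

```python
# Toy model of mid-render auxiliary compute: the tile-local context stays in
# local memory across render and compute work, so no flush/reload round trip
# to shared memory happens mid-pass. All names here are illustrative.

class TileRenderPass:
    def __init__(self, tile_size):
        self.local_memory = [0] * tile_size  # tile-local render context
        self.shared_memory_accesses = 0      # tracks costly global traffic

    def render_fragment(self, index, value):
        self.local_memory[index] = value     # stays in local memory

    def mid_render_compute(self, fn):
        # Auxiliary compute runs on the live tile context: no flush, no reload.
        self.local_memory = [fn(v) for v in self.local_memory]

    def end_pass(self, shared_memory):
        # Only final results are written out to shared memory, once.
        shared_memory.extend(self.local_memory)
        self.shared_memory_accesses += len(self.local_memory)
        return shared_memory

shared = []
rp = TileRenderPass(tile_size=4)
for i in range(4):
    rp.render_fragment(i, i * 10)
rp.mid_render_compute(lambda v: v + 1)   # e.g. a tone-mapping pre-pass
rp.end_pass(shared)
```

The point of the model is the access counter: shared memory is written exactly once per tile element, rather than once per intermediate step.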
  • Patent number: 10180825
    Abstract: Ubershaders may be used in a graphics development environment as an efficiency tool because many options and properties may be captured in a single shader program. Each selectable option or property in the shader code may be tagged with an attribute to indicate the presence of the selection. The single shader program embodying the many selectable options and properties may be compiled to an intermediate version that also embodies the many options and properties, along with at least remnants of the tagging attributes. Upon a request for executable code including indications of the desired selectable options or properties, generation of the executable code may proceed such that it includes only the desired selectable options and properties and not other selectable options and properties embodied in the source code.
    Type: Grant
    Filed: August 23, 2016
    Date of Patent: January 15, 2019
    Assignee: Apple Inc.
    Inventors: Aaftab A. Munshi, Charles Brissart, Owen Anderson, Mon Ping Wang, Ravi Ramaseshan
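
The specialization step the ubershader abstract describes can be sketched as follows (a hypothetical tagging scheme, not the patented attribute syntax): optional features in one shader source carry tags, and executable generation keeps only the requested ones.

```python
# Hypothetical ubershader specialization sketch: one source, many tagged
# options; the generated code contains only the options the caller requested.

SHADER_SOURCE = [
    ("core",    "color = sample(tex, uv);"),
    ("fog",     "color = mix(color, fogColor, fogFactor);"),   # tagged option
    ("shadows", "color *= shadowTerm(uv);"),                   # tagged option
]

def specialize(source, requested):
    # Keep untagged core lines plus only the requested tagged options.
    return [line for tag, line in source if tag == "core" or tag in requested]

fog_only = specialize(SHADER_SOURCE, {"fog"})
```

Requesting `{"fog"}` yields the core line plus the fog line; the shadow code never reaches the executable.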
  • Patent number: 10067797
    Abstract: A method and an apparatus for a parallel computing program calling APIs (application programming interfaces) in a host processor to perform a data processing task in parallel among compute units are described. The compute units, which include central processing units (CPUs) and graphics processing units (GPUs), are coupled to the host processor. A program object corresponding to a source code for the data processing task is generated in a memory coupled to the host processor according to the API calls. Executable codes for the compute units are generated from the program object according to the API calls to be loaded for concurrent execution among the compute units to perform the data processing task.
    Type: Grant
    Filed: September 26, 2016
    Date of Patent: September 4, 2018
    Assignee: Apple Inc.
    Inventors: Aaftab Munshi, Nathaniel Begeman
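
The flow in this abstract, a program object created from source and then lowered to per-device executables, loosely mirrors an OpenCL-style build pipeline. A minimal sketch, with all names illustrative rather than the patented API:

```python
# Sketch of the program-object flow: create a program object from source via
# an API call, then build one executable per compute unit from that object.

class ProgramObject:
    def __init__(self, source):
        self.source = source

def create_program_with_source(source):
    return ProgramObject(source)          # analog of an API "create" call

def build_for_devices(program, devices):
    # One executable per compute unit (CPU or GPU), built from the same object.
    return {d: f"{d}-binary({program.source})" for d in devices}

prog = create_program_with_source("kernel add(a, b) { return a + b; }")
executables = build_for_devices(prog, ["cpu0", "gpu0"])
```

The single program object is the pivot: source is parsed once, and each heterogeneous device gets its own loadable executable.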
  • Publication number: 20180203737
    Abstract: A method and an apparatus that allocate one or more physical compute devices such as CPUs or GPUs attached to a host processing unit running an application for executing one or more threads of the application are described. The allocation may be based on data representing a processing capability requirement from the application for executing an executable in the one or more threads. A compute device identifier may be associated with the allocated physical compute devices to schedule and execute the executable in the one or more threads concurrently in one or more of the allocated physical compute devices.
    Type: Application
    Filed: December 11, 2017
    Publication date: July 19, 2018
    Inventors: Aaftab Munshi, Jeremy Sandmel
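
Capability-based allocation as described above can be sketched in a few lines (device names, capability fields, and the identifier format are all illustrative assumptions):

```python
# Sketch: allocate the physical compute devices whose capabilities satisfy the
# application's stated requirement, and associate a single compute device
# identifier with the allocated group for scheduling.

DEVICES = {
    "cpu0": {"type": "CPU", "gflops": 50},
    "gpu0": {"type": "GPU", "gflops": 400},
    "gpu1": {"type": "GPU", "gflops": 800},
}

def allocate(requirement):
    chosen = [name for name, caps in DEVICES.items()
              if caps["gflops"] >= requirement["min_gflops"]]
    # One identifier stands for the whole allocated set of devices.
    return {"device_id": "+".join(sorted(chosen)), "devices": chosen}

alloc = allocate({"min_gflops": 300})
```

A scheduler can then dispatch threads against `device_id` without caring which physical devices back it.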
  • Publication number: 20180182153
    Abstract: Techniques are disclosed relating to performing mid-render auxiliary compute tasks for graphics processing. In some embodiments, auxiliary compute tasks are performed during a render pass, using at least a portion of a memory context of the render pass, without accessing a shared memory during the render pass. Relative to flushing render data to shared memory to perform compute tasks, this may reduce memory accesses and/or cache thrashing, which may in turn increase performance and/or reduce power consumption.
    Type: Application
    Filed: December 22, 2016
    Publication date: June 28, 2018
    Inventors: Terence M. Potter, Ralph C. Taylor, Richard W. Schreyer, Aaftab A. Munshi, Justin A. Hensley
  • Publication number: 20180182058
    Abstract: Techniques are disclosed relating to a hardware-supported flexible data structure for graphics processing. In some embodiments, dimensions of the data structure are configurable in an X direction, a Y direction, a number of samples per pixel, and an amount of data per sample. In some embodiments, these attributes are configurable using hardware registers. In some embodiments, the data structure is persistent across a tile being processed such that local memory context is accessible to both rendering threads of a render pass and mid-render compute threads.
    Type: Application
    Filed: December 22, 2016
    Publication date: June 28, 2018
    Inventors: Terence M. Potter, Robert Kenney, Aaftab A. Munshi, Justin A. Hensley, Richard W. Schreyer
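
The four configurable attributes named in this abstract determine the tile structure's memory footprint directly. A simple sketch, with plain values standing in for the hardware registers:

```python
# Illustrative sizing of the flexible tile-local data structure: X extent,
# Y extent, samples per pixel, and bytes per sample are the configurable
# attributes (modeled here as function arguments rather than registers).

def tile_buffer_bytes(x, y, samples_per_pixel, bytes_per_sample):
    return x * y * samples_per_pixel * bytes_per_sample

# e.g. a 32x32 tile with 4x multisampling and 8 bytes per sample
size = tile_buffer_bytes(32, 32, 4, 8)
```

Because the structure persists across the tile, both render-pass threads and mid-render compute threads can address the same local region.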
  • Publication number: 20180181489
    Abstract: Techniques are disclosed relating to memory consistency in a memory hierarchy with relaxed ordering. In some embodiments, an apparatus includes a first level cache that is shared by a plurality of shader processing elements and a second level cache that is shared by the shader processing elements and at least a texture processing unit. In some embodiments, the apparatus is configured to execute operations specified by graphics instructions that include (1) an attribute of the operation that specifies a type of memory consistency to be imposed for the operation and (2) scope information for the attribute that specifies whether the memory consistency specified by the attribute should be enforced at the first level cache or the second level cache. In some embodiments, the apparatus is configured to determine whether to sequence memory accesses at the first level cache and the second level cache based on the attribute and the scope.
    Type: Application
    Filed: December 22, 2016
    Publication date: June 28, 2018
    Inventors: Anthony P. DeLaurier, Owen C. Anderson, Michael J. Swift, Aaftab A. Munshi, Terence M. Potter
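
The attribute-plus-scope decision this abstract describes can be modeled as a small lookup (the attribute and scope names here are illustrative, not the patented instruction encoding):

```python
# Sketch: each memory operation carries a consistency attribute and a scope;
# the scope selects whether ordering is enforced at the first-level cache
# (shared by shader processing elements) or the second-level cache (also
# shared with the texture processing unit).

def sequencing_point(attribute, scope):
    if attribute == "relaxed":
        return None                 # no ordering enforced anywhere
    if scope == "shader":
        return "L1"                 # cache shared by the shader elements
    if scope == "device":
        return "L2"                 # cache also shared with the texture unit
    raise ValueError(f"unknown scope: {scope}")

level = sequencing_point("acquire", "shader")
```

Relaxed operations bypass sequencing entirely, which is the performance win of relaxed ordering; only operations that ask for consistency pay for it, and only at the cache level their scope requires.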
  • Patent number: 9858122
    Abstract: A method and an apparatus that allocate one or more physical compute devices such as CPUs (Central Processing Units) or GPUs (Graphics Processing Units) attached to a host processing unit running an application for executing one or more threads of the application are described. The allocation may be based on data representing a processing capability requirement from the application for executing an executable in the one or more threads. A compute device identifier may be associated with the allocated physical compute devices to schedule and execute the executable in the one or more threads concurrently in one or more of the allocated physical compute devices.
    Type: Grant
    Filed: August 12, 2016
    Date of Patent: January 2, 2018
    Assignee: Apple Inc.
    Inventors: Aaftab Munshi, Jeremy Sandmel
  • Publication number: 20170371714
    Abstract: A method and an apparatus that execute a parallel computing program in a programming language for a parallel computing architecture are described. The parallel computing program is stored in memory in a system with parallel processors and is used to allocate threads between a host processor and a GPU. The programming language includes an API to allow an application to make calls using the API to allocate execution of the threads between the host processor and the GPU. The programming language includes host function data tokens for host functions performed in the host processor and kernel function data tokens for compute kernel functions performed in one or more compute processors, e.g., GPUs or CPUs, separate from the host processor.
    Type: Application
    Filed: September 7, 2017
    Publication date: December 28, 2017
    Inventors: Aaftab Munshi, Jeremy Sandmel
  • Publication number: 20170358132
    Abstract: An improved tessellation graphics pipeline that obviates the use of early-stage vertex shaders and hull shaders and allows greater efficiency and flexibility. Embodiments provide a graphics pipeline beginning with a tessellator that may obtain tessellation factors in any manner, such as reading from a memory of factors provided by a developer or computing the factors using a compute kernel. In some embodiments, a single vertex shader may follow the tessellator and perform all the necessary vertex shading for the pipeline. Furthermore, in some embodiments, a compute kernel is used to generate the tessellation factors. The compute kernel provides flexibility that allows its employment for some graphics portions and not others. In addition, the streamlined pipeline facilitates the efficient use of scaling to determine tessellation factors for the same graphics portion at different camera distances or desired levels of replication of the mathematical model.
    Type: Application
    Filed: September 23, 2016
    Publication date: December 14, 2017
    Inventors: Aaftab A. Munshi, Michael B. Harris, Anna Tikhonova, Charles Brissart, Srinivas Dasari, Rahul Joshi, Kelvin C. Chiu, Mon Ping Wang, Nick W. Burns
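
The distance-based scaling mentioned at the end of this abstract can be sketched as a compute-kernel-style function. The scaling rule below is a plausible stand-in, not the patented formula:

```python
# Illustrative sketch: scale a tessellation factor by camera distance so the
# same patch is subdivided finely up close and coarsely far away, clamped to
# the valid factor range.

def tessellation_factor(base_factor, camera_distance, max_factor=64.0):
    scaled = base_factor / max(camera_distance, 1.0)   # coarser when far away
    return max(1.0, min(scaled, max_factor))

near = tessellation_factor(64.0, 2.0)    # 32.0
far = tessellation_factor(64.0, 32.0)    # 2.0
```

Computing factors in a kernel like this (rather than in a hull shader) is what lets the streamlined pipeline apply tessellation to some graphics portions and skip it for others.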
  • Publication number: 20170308364
    Abstract: A system decouples the source code language from the eventual execution environment by compiling the source code language into a unified intermediate representation that conforms to a language model allowing both parallel graphical operations and parallel general-purpose computational operations. The intermediate representation may then be distributed to end-user computers, where an embedded compiler can compile the intermediate representation into an executable binary targeted for the CPUs and GPUs available in that end-user device. The intermediate representation is sufficient to define both graphics and non-graphics compute kernels and shaders. At install-time or later, the intermediate representation file may be compiled for the specific target hardware of the given end-user computing system.
    Type: Application
    Filed: June 28, 2017
    Publication date: October 26, 2017
    Inventors: Aaftab Munshi, Rahul U. Joshi, Mon P. Wang, Kelvin C. Chiu
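
The two-stage pipeline this abstract describes, source to device-independent IR, then IR to a device-specific binary on the end-user machine, can be sketched as follows (all names are illustrative, not the actual toolchain):

```python
# Hedged sketch of decoupled compilation: a front-end emits a unified
# intermediate representation (IR) that is distributed to end users; an
# embedded back-end lowers that IR to a binary for the local CPU/GPU at
# install time or later.

def frontend_compile(source):
    return {"ir": f"IR({source})"}            # shipped to end-user machines

def backend_compile(ir_file, target):
    # Runs on the end-user device against the actual local hardware.
    return f"{target}:{ir_file['ir']}"

ir = frontend_compile("shader main() { ... }")
binary = backend_compile(ir, "gpu-family-9")
```

Because the IR can express both graphics shaders and general-purpose compute kernels, one distributed artifact serves every target the embedded back-end knows about.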
  • Patent number: 9766938
    Abstract: A method and an apparatus that execute a parallel computing program in a programming language for a parallel computing architecture are described. The parallel computing program is stored in memory in a system with parallel processors. The system includes a host processor, a graphics processing unit (GPU) coupled to the host processor and a memory coupled to at least one of the host processor and the GPU. The parallel computing program is stored in the memory to allocate threads between the host processor and the GPU. The programming language includes an API to allow an application to make calls using the API to allocate execution of the threads between the host processor and the GPU. The programming language includes host function data tokens for host functions performed in the host processor and kernel function data tokens for compute kernel functions performed in one or more compute processors, e.g. GPUs or CPUs, separate from the host processor.
    Type: Grant
    Filed: January 27, 2016
    Date of Patent: September 19, 2017
    Assignee: Apple Inc.
    Inventors: Aaftab Munshi, Jeremy Sandmel
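
The host-function versus kernel-function token split in this abstract can be sketched as a tiny routing step. The token and dispatch scheme below is a toy, not the patented language:

```python
# Illustrative sketch: a program mixes host-function tokens and
# kernel-function tokens; an API-level step routes each to the host processor
# or to a compute processor (GPU/CPU) for execution.

PROGRAM = [
    ("host",   "load_scene"),
    ("kernel", "shade_pixels"),
    ("kernel", "post_process"),
    ("host",   "present_frame"),
]

def allocate_threads(program):
    placement = {"host": [], "gpu": []}
    for kind, name in program:
        placement["host" if kind == "host" else "gpu"].append(name)
    return placement

placement = allocate_threads(PROGRAM)
```

The tokens let the runtime allocate threads between the host processor and the compute processors without the application hard-coding where each function runs.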
  • Patent number: 9740464
    Abstract: A system decouples the source code language from the eventual execution environment by compiling the source code language into a unified intermediate representation that conforms to a language model allowing both parallel graphical operations and parallel general-purpose computational operations. The intermediate representation may then be distributed to end-user computers, where an embedded compiler can compile the intermediate representation into an executable binary targeted for the CPUs and GPUs available in that end-user device. The intermediate representation is sufficient to define both graphics and non-graphics compute kernels and shaders. At install-time or later, the intermediate representation file may be compiled for the specific target hardware of the given end-user computing system.
    Type: Grant
    Filed: September 30, 2014
    Date of Patent: August 22, 2017
    Assignee: Apple Inc.
    Inventors: Aaftab Munshi, Rahul U. Joshi, Mon P. Wang, Kelvin C. Chiu
  • Patent number: 9720726
    Abstract: A method and an apparatus that partition a total number of threads to concurrently execute executable codes compiled from a single source for target processing units in response to an API (Application Programming Interface) request from an application running in a host processing unit are described. The total number of threads is based on a multi-dimensional value for a global thread number specified in the API. The target processing units include GPUs (Graphics Processing Unit) and CPUs (Central Processing Unit). Thread group sizes for the target processing units are determined to partition the total number of threads according to either a dimension for a data parallel task associated with the executable codes or a dimension for a multi-dimensional value for a local thread group number. The executable codes are loaded to be executed in thread groups with the determined thread group sizes concurrently in the target processing units.
    Type: Grant
    Filed: June 27, 2012
    Date of Patent: August 1, 2017
    Assignee: Apple Inc.
    Inventors: Aaftab A. Munshi, Nathaniel Begeman
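
The partitioning step this abstract describes, dividing a multi-dimensional global thread count into thread groups, comes down to a ceiling division per dimension. A minimal sketch, assuming the local group size is already chosen:

```python
# Sketch: partition a multi-dimensional global thread count into thread
# groups. Ceiling division ensures every thread is covered even when the
# global size is not a multiple of the group size.
import math

def partition(global_size, local_size):
    return tuple(math.ceil(g / l) for g, l in zip(global_size, local_size))

groups = partition(global_size=(1024, 768), local_size=(16, 16))
```

Each target processing unit (GPU or CPU) can then execute its groups concurrently, with the group size tuned to that device.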
  • Patent number: 9691346
    Abstract: A method and an apparatus for a parallel computing program using subbuffers to perform a data processing task in parallel among heterogeneous compute units are described. The compute units can include a heterogeneous mix of central processing units (CPUs) and graphic processing units (GPUs). A system creates a subbuffer from a parent buffer for each of a plurality of heterogeneous compute units. If a subbuffer is not associated with the same compute unit as the parent buffer, the system copies data from the subbuffer to memory of that compute unit. The system further tracks updates to the data and transfers those updates back to the subbuffer.
    Type: Grant
    Filed: December 18, 2014
    Date of Patent: June 27, 2017
    Assignee: Apple Inc.
    Inventors: Aaftab A. Munshi, Ian R. Ollmann
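
The copy-and-track-updates behavior in this abstract can be sketched with a toy buffer model (names and the region API are illustrative, not the patented interface):

```python
# Toy subbuffer sketch: each compute unit works on a region of a parent
# buffer. If the subbuffer's device differs from the parent's, data is copied
# to that device's memory, and the updates are transferred back afterwards.

class ParentBuffer:
    def __init__(self, data, device):
        self.data = data
        self.device = device

def run_on_subbuffer(parent, start, end, device, fn):
    region = parent.data[start:end]
    if device != parent.device:
        local = [fn(v) for v in list(region)]  # copy out, compute remotely
        parent.data[start:end] = local         # transfer updates back
    else:
        parent.data[start:end] = [fn(v) for v in region]  # compute in place

buf = ParentBuffer([1, 2, 3, 4], device="cpu")
run_on_subbuffer(buf, 0, 2, "cpu", lambda v: v * 10)  # same device: in place
run_on_subbuffer(buf, 2, 4, "gpu", lambda v: v + 1)   # copy out, copy back
```

The parent buffer stays authoritative: heterogeneous devices each see only their region, and their updates converge back into it.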
  • Publication number: 20170090886
    Abstract: Ubershaders may be used in a graphics development environment as an efficiency tool because many options and properties may be captured in a single shader program. Each selectable option or property in the shader code may be tagged with an attribute to indicate the presence of the selection. The single shader program embodying the many selectable options and properties may be compiled to an intermediate version that also embodies the many options and properties, along with at least remnants of the tagging attributes. Upon a request for executable code including indications of the desired selectable options or properties, generation of the executable code may proceed such that it includes only the desired selectable options and properties and not other selectable options and properties embodied in the source code.
    Type: Application
    Filed: August 23, 2016
    Publication date: March 30, 2017
    Inventors: Aaftab A. Munshi, Charles Brissart, Owen Anderson, Mon Ping Wang, Ravi Ramaseshan
  • Publication number: 20170075730
    Abstract: A method and an apparatus for a parallel computing program calling APIs (application programming interfaces) in a host processor to perform a data processing task in parallel among compute units are described. The compute units, which include central processing units (CPUs) and graphics processing units (GPUs), are coupled to the host processor. A program object corresponding to a source code for the data processing task is generated in a memory coupled to the host processor according to the API calls. Executable codes for the compute units are generated from the program object according to the API calls to be loaded for concurrent execution among the compute units to perform the data processing task.
    Type: Application
    Filed: September 26, 2016
    Publication date: March 16, 2017
    Inventors: Aaftab Munshi, Nathaniel Begeman
  • Publication number: 20170039092
    Abstract: A method and an apparatus that allocate one or more physical compute devices such as CPUs or GPUs attached to a host processing unit running an application for executing one or more threads of the application are described. The allocation may be based on data representing a processing capability requirement from the application for executing an executable in the one or more threads. A compute device identifier may be associated with the allocated physical compute devices to schedule and execute the executable in the one or more threads concurrently in one or more of the allocated physical compute devices.
    Type: Application
    Filed: August 12, 2016
    Publication date: February 9, 2017
    Inventors: Aaftab Munshi, Jeremy Sandmel
  • Publication number: 20170031691
    Abstract: A method and an apparatus that schedule a plurality of executables in a schedule queue for execution in one or more physical compute devices such as CPUs or GPUs concurrently are described. One or more executables are compiled online from a source having an existing executable for a type of physical compute devices different from the one or more physical compute devices. Dependency relations among elements corresponding to scheduled executables are determined to select an executable to be executed by a plurality of threads concurrently in more than one of the physical compute devices.
    Type: Application
    Filed: August 11, 2016
    Publication date: February 2, 2017
    Inventors: Aaftab Munshi, Jeremy Sandmel
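
The dependency-aware selection this abstract describes can be sketched as a small scheduling loop (the queue and dependency model are illustrative assumptions):

```python
# Hedged sketch of dependency-aware scheduling: executables wait in a queue,
# and one is selected for execution only once everything it depends on has
# completed, so independent executables can run concurrently across devices.

def schedule(queue, deps):
    done, order = set(), []
    pending = list(queue)
    while pending:
        # Pick any executable whose dependencies have all completed.
        ready = next(e for e in pending if deps.get(e, set()) <= done)
        pending.remove(ready)
        done.add(ready)
        order.append(ready)
    return order

order = schedule(["blur", "resize", "decode"],
                 {"blur": {"resize"}, "resize": {"decode"}})
```

With `blur` depending on `resize` and `resize` on `decode`, the scheduler drains the queue in dependency order regardless of how the executables were enqueued.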
  • Patent number: 9477525
    Abstract: A method and an apparatus for a parallel computing program calling APIs (application programming interfaces) in a host processor to perform a data processing task in parallel among compute units are described. The compute units, which include central processing units (CPUs) and graphics processing units (GPUs), are coupled to the host processor. A program object corresponding to a source code for the data processing task is generated in a memory coupled to the host processor according to the API calls. Executable codes for the compute units are generated from the program object according to the API calls to be loaded for concurrent execution among the compute units to perform the data processing task.
    Type: Grant
    Filed: December 21, 2015
    Date of Patent: October 25, 2016
    Assignee: Apple Inc.
    Inventors: Aaftab Munshi, Nathaniel Begeman