Patents by Inventor Aaftab Munshi
Aaftab Munshi has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11836506Abstract: A method and an apparatus that schedule a plurality of executables in a schedule queue for execution in one or more physical compute devices such as CPUs or GPUs concurrently are described. One or more executables are compiled online from a source having an existing executable for a type of physical compute devices different from the one or more physical compute devices. Dependency relations among elements corresponding to scheduled executables are determined to select an executable to be executed by a plurality of threads concurrently in more than one of the physical compute devices. A thread initialized for executing an executable in a GPU of the physical compute devices are initialized for execution in another CPU of the physical compute devices if the GPU is busy with graphics processing threads.Type: GrantFiled: December 29, 2022Date of Patent: December 5, 2023Assignee: APPLE INC.Inventors: Aaftab Munshi, Jeremy Sandmel
-
Publication number: 20230185583Abstract: A method and an apparatus that schedule a plurality of executables in a schedule queue for execution in one or more physical compute devices such as CPUs or GPUs concurrently are described. One or more executables are compiled online from a source having an existing executable for a type of physical compute devices different from the one or more physical compute devices. Dependency relations among elements corresponding to scheduled executables are determined to select an executable to be executed by a plurality of threads concurrently in more than one of the physical compute devices. A thread initialized for executing an executable in a GPU of the physical compute devices are initialized for execution in another CPU of the physical compute devices if the GPU is busy with graphics processing threads.Type: ApplicationFiled: December 29, 2022Publication date: June 15, 2023Inventors: Aaftab Munshi, Jeremy Sandmel
-
Patent number: 11544075Abstract: A method and an apparatus that schedule a plurality of executables in a schedule queue for execution in one or more physical compute devices such as CPUs or GPUs concurrently are described. One or more executables are compiled online from a source having an existing executable for a type of physical compute devices different from the one or more physical compute devices. Dependency relations among elements corresponding to scheduled executables are determined to select an executable to be executed by a plurality of threads concurrently in more than one of the physical compute devices. A thread initialized for executing an executable in a GPU of the physical compute devices are initialized for execution in another CPU of the physical compute devices if the GPU is busy with graphics processing threads.Type: GrantFiled: August 11, 2016Date of Patent: January 3, 2023Assignee: APPLE INC.Inventors: Aaftab Munshi, Jeremy Sandmel
-
Patent number: 11237876Abstract: A method and an apparatus that allocate one or more physical compute devices such as Central Processing Units (CPUs) or Graphical Processing Units (GPUs) attached to a host processing unit running an application for executing one or more threads of the application are described. The allocation may be based on data representing a processing capability requirement from the application for executing an executable in the one or more threads. A compute device identifier may be associated with the allocated physical compute devices to schedule and execute the executable in the one or more threads concurrently in one or more of the allocated physical compute devices concurrently.Type: GrantFiled: February 3, 2020Date of Patent: February 1, 2022Assignee: Apple Inc.Inventors: Aaftab Munshi, Jeremy Sandmel
-
Patent number: 11106504Abstract: A method and an apparatus that execute a parallel computing program in a programming language for a parallel computing architecture are described. The parallel computing program is stored in memory in a system with parallel processors. The parallel computing program is stored in a memory to allocate threads between a host processor and a GPU. The programming language includes an API to allow an application to make calls using the API to allocate execution of the threads between the host processor and the GPU. The programming language includes host function data tokens for host functions performed in the host processor and kernel function data tokens for compute kernel functions performed in one or more compute processors, e.g., GPUs or CPUs, separate from the host processor.Type: GrantFiled: January 13, 2020Date of Patent: August 31, 2021Assignee: Apple Inc.Inventors: Aaftab Munshi, Jeremy Sandmel
-
Publication number: 20200285521Abstract: A method and an apparatus that execute a parallel computing program in a programming language for a parallel computing architecture are described. The parallel computing program is stored in memory in a system with parallel processors. The parallel computing program is stored in a memory to allocate threads between a host processor and a GPU. The programming language includes an API to allow an application to make calls using the API to allocate execution of the threads between the host processor and the GPU. The programming language includes host function data tokens for host functions performed in the host processor and kernel function data tokens for compute kernel functions performed in one or more compute processors, e.g., GPUs or CPUs, separate from the host processor.Type: ApplicationFiled: January 13, 2020Publication date: September 10, 2020Inventors: Aaftab Munshi, Jeremy Sandmel
-
Publication number: 20200250005Abstract: A method and an apparatus that allocate one or more physical compute devices such as CPUs or GPUs attached to a host processing unit running an application for executing one or more threads of the application are described. The allocation may be based on data representing a processing capability requirement from the application for executing an executable in the one or more threads. A compute device identifier may be associated with the allocated physical compute devices to schedule and execute the executable in the one or more threads concurrently in one or more of the allocated physical compute devices concurrently.Type: ApplicationFiled: February 3, 2020Publication date: August 6, 2020Inventors: Aaftab Munshi, Jeremy Sandmel
-
Patent number: 10552226Abstract: A method and an apparatus that allocate one or more physical compute devices such as CPUs (Central Processing Unit) or GPUs (Graphics Processing Unit) attached to a host processing unit running an application for executing one or more threads of the application are described. The allocation may be based on data representing a processing capability requirement from the application for executing an executable in the one or more threads. A compute device identifier may be associated with the allocated physical compute devices to schedule and execute the executable in the one or more threads concurrently in one or more of the allocated physical compute devices concurrently.Type: GrantFiled: December 11, 2017Date of Patent: February 4, 2020Assignee: Apple Inc.Inventors: Aaftab Munshi, Jeremy Sandmel
-
Patent number: 10534647Abstract: A method and an apparatus that execute a parallel computing program in a programming language for a parallel computing architecture are described. The parallel computing program is stored in memory in a system with parallel processors. The parallel computing program is stored in a memory to allocate threads between a host processor and a GPU. The programming language includes an API to allow an application to make calls using the API to allocate execution of the threads between the host processor and the GPU. The programming language includes host function data tokens for host functions performed in the host processor and kernel function data tokens for compute kernel functions performed in one or more compute processors, e.g., GPUs or CPUs, separate from the host processor.Type: GrantFiled: September 7, 2017Date of Patent: January 14, 2020Assignee: APPLE INC.Inventors: Aaftab Munshi, Jeremy Sandmel
-
Patent number: 10372431Abstract: A system decouples the source code language from the eventual execution environment by compiling the source code language into a unified intermediate representation that conforms to a language model allowing both parallel graphical operations and parallel general-purpose computational operations. The intermediate representation may then be distributed to end-user computers, where an embedded compiler can compile the intermediate representation into an executable binary targeted for the CPUs and GPUs available in that end-user device. The intermediate representation is sufficient to define both graphics and non-graphics compute kernels and shaders. At install-time or later, the intermediate representation file may be compiled for the specific target hardware of the given end-user computing system.Type: GrantFiled: June 28, 2017Date of Patent: August 6, 2019Assignee: Apple Inc.Inventors: Aaftab Munshi, Rahul U. Joshi, Mon P. Wang, Kelvin C. Chiu
-
Patent number: 10067797Abstract: A method and an apparatus for a parallel computing program calling APIs (application programming interfaces) in a host processor to perform a data processing task in parallel among compute units are described. The compute units are coupled to the host processor including central processing units (CPUs) and graphic processing units (GPUs). A program object corresponding to a source code for the data processing task is generated in a memory coupled to the host processor according to the API calls. Executable codes for the compute units are generated from the program object according to the API calls to be loaded for concurrent execution among the compute units to perform the data processing task.Type: GrantFiled: September 26, 2016Date of Patent: September 4, 2018Assignee: Apple Inc.Inventors: Aaftab Munshi, Nathaniel Begeman
-
Publication number: 20180203737Abstract: A method and an apparatus that allocate one or more physical compute devices such as CPUs or GPUs attached to a host processing unit running an application for executing one or more threads of the application are described. The allocation may be based on data representing a processing capability requirement from the application for executing an executable in the one or more threads. A compute device identifier may be associated with the allocated physical compute devices to schedule and execute the executable in the one or more threads concurrently in one or more of the allocated physical compute devices concurrently.Type: ApplicationFiled: December 11, 2017Publication date: July 19, 2018Inventors: Aaftab Munshi, Jeremy Sandmel
-
Patent number: 9858122Abstract: A method and an apparatus that allocate one or more physical compute devices such as CPUs (Central Processing Units) or GPUs (Graphical Processing Units) attached to a host processing unit running an application for executing one or more threads of the application are described. The allocation may be based on data representing a processing capability requirement from the application for executing an executable in the one or more threads. A compute device identifier may be associated with the allocated physical compute devices to schedule and execute the executable in the one or more threads concurrently in one or more of the allocated physical compute devices concurrently.Type: GrantFiled: August 12, 2016Date of Patent: January 2, 2018Assignee: Apple Inc.Inventors: Aaftab Munshi, Jeremy Sandmel
-
Publication number: 20170371714Abstract: A method and an apparatus that execute a parallel computing program in a programming language for a parallel computing architecture are described. The parallel computing program is stored in memory in a system with parallel processors. The parallel computing program is stored in a memory to allocate threads between a host processor and a GPU. The programming language includes an API to allow an application to make calls using the API to allocate execution of the threads between the host processor and the GPU. The programming language includes host function data tokens for host functions performed in the host processor and kernel function data tokens for compute kernel functions performed in one or more compute processors, e.g., GPUs or CPUs, separate from the host processor.Type: ApplicationFiled: September 7, 2017Publication date: December 28, 2017Inventors: AAFTAB MUNSHI, JEREMY SANDMEL
-
Publication number: 20170308364Abstract: A system decouples the source code language from the eventual execution environment by compiling the source code language into a unified intermediate representation that conforms to a language model allowing both parallel graphical operations and parallel general-purpose computational operations. The intermediate representation may then be distributed to end-user computers, where an embedded compiler can compile the intermediate representation into an executable binary targeted for the CPUs and GPUs available in that end-user device. The intermediate representation is sufficient to define both graphics and non-graphics compute kernels and shaders. At install-time or later, the intermediate representation file may be compiled for the specific target hardware of the given end-user computing system.Type: ApplicationFiled: June 28, 2017Publication date: October 26, 2017Inventors: Aaftab Munshi, Rahul U. Joshi, Mon P. Wang, Kelvin C. Chiu
-
Patent number: 9766938Abstract: A method and an apparatus that execute a parallel computing program in a programming language for a parallel computing architecture are described. The parallel computing program is stored in memory in a system with parallel processors. The system includes a host processor, a graphics processing unit (GPU) coupled to the host processor and a memory coupled to at least one of the host processor and the GPU. The parallel computing program is stored in the memory to allocate threads between the host processor and the GPU. The programming language includes an API to allow an application to make calls using the API to allocate execution of the threads between the host processor and the GPU. The programming language includes host function data tokens for host functions performed in the host processor and kernel function data tokens for compute kernel functions performed in one or more compute processors, e.g. GPUs or CPUs, separate from the host processor.Type: GrantFiled: January 27, 2016Date of Patent: September 19, 2017Assignee: Apple Inc.Inventors: Aaftab Munshi, Jeremy Sandmel
-
Patent number: 9740464Abstract: A system decouples the source code language from the eventual execution environment by compiling the source code language into a unified intermediate representation that conforms to a language model allowing both parallel graphical operations and parallel general-purpose computational operations. The intermediate representation may then be distributed to end-user computers, where an embedded compiler can compile the intermediate representation into an executable binary targeted for the CPUs and GPUs available in that end-user device. The intermediate representation is sufficient to define both graphics and non-graphics compute kernels and shaders. At install-time or later, the intermediate representation file may be compiled for the specific target hardware of the given end-user computing system.Type: GrantFiled: September 30, 2014Date of Patent: August 22, 2017Assignee: Apple Inc.Inventors: Aaftab Munshi, Rahul U. Joshi, Mon P. Wang, Kelvin C. Chiu
-
Publication number: 20170075730Abstract: A method and an apparatus for a parallel computing program calling APIs (application programming interfaces) in a host processor to perform a data processing task in parallel among compute units are described. The compute units are coupled to the host processor including central processing units (CPUs) and graphic processing units (GPUs). A program object corresponding to a source code for the data processing task is generated in a memory coupled to the host processor according to the API calls. Executable codes for the compute units are generated from the program object according to the API calls to be loaded for concurrent execution among the compute units to perform the data processing task.Type: ApplicationFiled: September 26, 2016Publication date: March 16, 2017Inventors: Aaftab Munshi, Nathaniel Begeman
-
Publication number: 20170039092Abstract: A method and an apparatus that allocate one or more physical compute devices such as CPUs or GPUs attached to a host processing unit running an application for executing one or more threads of the application are described. The allocation may be based on data representing a processing capability requirement from the application for executing an executable in the one or more threads. A compute device identifier may be associated with the allocated physical compute devices to schedule and execute the executable in the one or more threads concurrently in one or more of the allocated physical compute devices concurrently.Type: ApplicationFiled: August 12, 2016Publication date: February 9, 2017Inventors: Aaftab Munshi, Jeremy Sandmel
-
Publication number: 20170031691Abstract: A method and an apparatus that schedule a plurality of executables in a schedule queue for execution in one or more physical compute devices such as CPUs or GPUs concurrently are described. One or more executables are compiled online from a source having an existing executable for a type of physical compute devices different from the one or more physical compute devices. Dependency relations among elements corresponding to scheduled executables are determined to select an executable to be executed by a plurality of threads concurrently in more than one of the physical compute devices.Type: ApplicationFiled: August 11, 2016Publication date: February 2, 2017Inventors: Aaftab Munshi, Jeremy Sandmel