Patents by Inventor Aaftab Munshi

Aaftab Munshi has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Execution Circuitry for Floating-Point Power Operation

Publication number: 20240094989

Abstract: Techniques are disclosed relating to dedicated power function circuitry for a floating-point power instruction. In some embodiments, execution circuitry is configured to execute a floating-point power instruction to evaluate the power function x^y as 2^(y log2 x). In some embodiments, base-2 logarithm circuitry is configured to evaluate a base-2 logarithm for a first input (e.g., log2 x) by determining coefficients for a polynomial function and evaluating the polynomial function using the determined coefficients and the first input. In some embodiments, multiplication circuitry multiplies the base-2 logarithm result by a second input to generate a multiplication result. In some embodiments, base-2 power function circuitry is configured to evaluate a base-2 power function for the multiplication result. Disclosed techniques may advantageously increase performance and reduce power consumption of floating-point power function operations with reasonable area and accuracy, relative to traditional techniques.

Type: Application

Filed: October 11, 2022

Publication date: March 21, 2024

Inventors: Ali Sazegari, Segev Elmalem, O-Cheng Chang, Jingwei Zhang, Ido Soffair, Aaftab A. Munshi
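
Illustrative note: the abstract above describes evaluating a power function in three stages (a polynomial base-2 logarithm, a multiplication, and a base-2 power). The following is a minimal software sketch of that decomposition, not the patented circuitry. The function names (log2_poly, exp2, pow_via_log2), the fixed series coefficients, and the range-reduction step are assumptions made for illustration; the abstract does not say how the coefficients are determined.

```python
import math

# Illustrative only: fixed series coefficients stand in for whatever
# coefficient selection the disclosed circuitry performs.
_LN2 = math.log(2.0)


def log2_poly(x: float) -> float:
    """Approximate log2(x) for x > 0 using a polynomial on the mantissa."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    m, e = math.frexp(x)          # x = m * 2**e with 0.5 <= m < 1
    if m < math.sqrt(0.5):        # recentre m into [sqrt(1/2), sqrt(2))
        m *= 2.0
        e -= 1
    t = (m - 1.0) / (m + 1.0)     # ln(m) = 2 * atanh(t)
    t2 = t * t
    # Horner evaluation of 2 * (t + t^3/3 + t^5/5 + t^7/7 + t^9/9)
    ln_m = t * (2.0 + t2 * (2.0 / 3.0 + t2 * (2.0 / 5.0
               + t2 * (2.0 / 7.0 + t2 * (2.0 / 9.0)))))
    return e + ln_m / _LN2


def exp2(p: float) -> float:
    """Evaluate 2**p by splitting p into integer and fractional parts."""
    i = math.floor(p)
    f = p - i                     # 0 <= f < 1
    return math.ldexp(2.0 ** f, i)


def pow_via_log2(x: float, y: float) -> float:
    """Evaluate x**y as 2**(y * log2(x)) for x > 0."""
    return exp2(y * log2_poly(x))
```

A call such as pow_via_log2(3.0, 4.0) should agree with 3.0 ** 4.0 to within a small relative error. The sketch handles only positive x and ignores the special cases (zeros, infinities, NaNs, negative bases) that a real floating-point power instruction must define.
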
Parallel runtime execution on multiple processors

Patent number: 11836506

Abstract: A method and an apparatus that schedule a plurality of executables in a schedule queue for execution in one or more physical compute devices such as CPUs or GPUs concurrently are described. One or more executables are compiled online from a source having an existing executable for a type of physical compute devices different from the one or more physical compute devices. Dependency relations among elements corresponding to scheduled executables are determined to select an executable to be executed by a plurality of threads concurrently in more than one of the physical compute devices. A thread initialized for executing an executable in a GPU of the physical compute devices are initialized for execution in another CPU of the physical compute devices if the GPU is busy with graphics processing threads.

Type: Grant

Filed: December 29, 2022

Date of Patent: December 5, 2023

Assignee: APPLE INC.

Inventors: Aaftab Munshi, Jeremy Sandmel

PARALLEL RUNTIME EXECUTION ON MULTIPLE PROCESSORS

Publication number: 20230185583

Abstract: A method and an apparatus that schedule a plurality of executables in a schedule queue for execution in one or more physical compute devices such as CPUs or GPUs concurrently are described. One or more executables are compiled online from a source having an existing executable for a type of physical compute devices different from the one or more physical compute devices. Dependency relations among elements corresponding to scheduled executables are determined to select an executable to be executed by a plurality of threads concurrently in more than one of the physical compute devices. A thread initialized for executing an executable in a GPU of the physical compute devices are initialized for execution in another CPU of the physical compute devices if the GPU is busy with graphics processing threads.

Type: Application

Filed: December 29, 2022

Publication date: June 15, 2023

Inventors: Aaftab Munshi, Jeremy Sandmel

Parallel runtime execution on multiple processors

Patent number: 11544075

Abstract: A method and an apparatus that schedule a plurality of executables in a schedule queue for execution in one or more physical compute devices such as CPUs or GPUs concurrently are described. One or more executables are compiled online from a source having an existing executable for a type of physical compute devices different from the one or more physical compute devices. Dependency relations among elements corresponding to scheduled executables are determined to select an executable to be executed by a plurality of threads concurrently in more than one of the physical compute devices. A thread initialized for executing an executable in a GPU of the physical compute devices are initialized for execution in another CPU of the physical compute devices if the GPU is busy with graphics processing threads.

Type: Grant

Filed: August 11, 2016

Date of Patent: January 3, 2023

Assignee: APPLE INC.

Inventors: Aaftab Munshi, Jeremy Sandmel

Data parallel computing on multiple processors

Patent number: 11237876

Abstract: A method and an apparatus that allocate one or more physical compute devices such as Central Processing Units (CPUs) or Graphical Processing Units (GPUs) attached to a host processing unit running an application for executing one or more threads of the application are described. The allocation may be based on data representing a processing capability requirement from the application for executing an executable in the one or more threads. A compute device identifier may be associated with the allocated physical compute devices to schedule and execute the executable in the one or more threads concurrently in one or more of the allocated physical compute devices concurrently.

Type: Grant

Filed: February 3, 2020

Date of Patent: February 1, 2022

Assignee: Apple Inc.

Inventors: Aaftab Munshi, Jeremy Sandmel

MULTI-PROCESSOR TRAINING OF NEURAL NETWORKS

Publication number: 20210397957

Abstract: The subject technology provides a framework for multi-processor training of neural networks. Multi-processor training of neural networks can include performing a forward pass of a training iteration using a neural processor, and performing a backward pass of the training iteration using a CPU or a GPU. Additional operations for facilitating the multi-processor training are disclosed.

Type: Application

Filed: June 16, 2021

Publication date: December 23, 2021

Inventors: Umesh S. VAISHAMPAYAN, Kit-Man WAN, Aaftab A. MUNSHI, Cecile M. FORET, Yen-Fu LIU

Application interface on multiple processors

Patent number: 11106504

Abstract: A method and an apparatus that execute a parallel computing program in a programming language for a parallel computing architecture are described. The parallel computing program is stored in memory in a system with parallel processors. The parallel computing program is stored in a memory to allocate threads between a host processor and a GPU. The programming language includes an API to allow an application to make calls using the API to allocate execution of the threads between the host processor and the GPU. The programming language includes host function data tokens for host functions performed in the host processor and kernel function data tokens for compute kernel functions performed in one or more compute processors, e.g., GPUs or CPUs, separate from the host processor.

Type: Grant

Filed: January 13, 2020

Date of Patent: August 31, 2021

Assignee: Apple Inc.

Inventors: Aaftab Munshi, Jeremy Sandmel

Enqueuing kernels from kernels on GPU/CPU

Patent number: 10956218

Abstract: Graphics processing units (GPUs) and other compute units are allowed to enqueue tasks for themselves by themselves, without needing a host processor to queue the work for the GPU. Built-in functions enable kernels to enqueue kernels for execution on a device. In some embodiments, ndrange kernels execute over an N-dimensional range to provide data-parallel operations. Task kernels provide task-parallel operations. In some embodiments, kernels may be defined using clang block syntax. The order of execution of commands on a compute unit may be constrained or allow execution of commands out-of-order. Compute units may control when kernels enqueued by the compute unit begins execution.

Type: Grant

Filed: November 25, 2019

Date of Patent: March 23, 2021

Assignee: Apple Inc.

Inventor: Aaftab A. Munshi

APPLICATION INTERFACE ON MULTIPLE PROCESSORS

Publication number: 20200285521

Abstract: A method and an apparatus that execute a parallel computing program in a programming language for a parallel computing architecture are described. The parallel computing program is stored in memory in a system with parallel processors. The parallel computing program is stored in a memory to allocate threads between a host processor and a GPU. The programming language includes an API to allow an application to make calls using the API to allocate execution of the threads between the host processor and the GPU. The programming language includes host function data tokens for host functions performed in the host processor and kernel function data tokens for compute kernel functions performed in one or more compute processors, e.g., GPUs or CPUs, separate from the host processor.

Type: Application

Filed: January 13, 2020

Publication date: September 10, 2020

Inventors: Aaftab Munshi, Jeremy Sandmel

Language, function library, and compiler for graphical and non-graphical computation on a graphical processor unit

Patent number: 10747519

Abstract: A compiler and library provide the ability to compile a programming language according to a defined language model into a programming language independent, machine independent intermediate representation, for conversion into an executable on a target programmable device. The language model allows writing programs that perform data-parallel graphics and non-graphics tasks.

Type: Grant

Filed: August 6, 2019

Date of Patent: August 18, 2020

Assignee: Apple Inc.

Inventors: Aaftab A. Munshi, Kenneth C. Dyke, Rahul U. Joshi, Richard W. Schreyer

DATA PARALLEL COMPUTING ON MULTIPLE PROCESSORS

Publication number: 20200250005

Abstract: A method and an apparatus that allocate one or more physical compute devices such as CPUs or GPUs attached to a host processing unit running an application for executing one or more threads of the application are described. The allocation may be based on data representing a processing capability requirement from the application for executing an executable in the one or more threads. A compute device identifier may be associated with the allocated physical compute devices to schedule and execute the executable in the one or more threads concurrently in one or more of the allocated physical compute devices concurrently.

Type: Application

Filed: February 3, 2020

Publication date: August 6, 2020

Inventors: Aaftab Munshi, Jeremy Sandmel

Enqueuing Kernels from Kernels on GPU/CPU

Publication number: 20200167200

Abstract: Graphics processing units (GPUs) and other compute units are allowed to enqueue tasks for themselves by themselves, without needing a host processor to queue the work for the GPU. Built-in functions enable kernels to enqueue kernels for execution on a device. In some embodiments, ndrange kernels execute over an N-dimensional range to provide data-parallel operations. Task kernels provide task-parallel operations. In some embodiments, kernels may be defined using clang block syntax. The order of execution of commands on a compute unit may be constrained or allow execution of commands out-of-order. Compute units may control when kernels enqueued by the compute unit begins execution.

Type: Application

Filed: November 25, 2019

Publication date: May 28, 2020

Inventor: Aaftab A. Munshi

Data parallel computing on multiple processors

Patent number: 10552226

Abstract: A method and an apparatus that allocate one or more physical compute devices such as CPUs (Central Processing Unit) or GPUs (Graphics Processing Unit) attached to a host processing unit running an application for executing one or more threads of the application are described. The allocation may be based on data representing a processing capability requirement from the application for executing an executable in the one or more threads. A compute device identifier may be associated with the allocated physical compute devices to schedule and execute the executable in the one or more threads concurrently in one or more of the allocated physical compute devices concurrently.

Type: Grant

Filed: December 11, 2017

Date of Patent: February 4, 2020

Assignee: Apple Inc.

Inventors: Aaftab Munshi, Jeremy Sandmel

Application interface on multiple processors

Patent number: 10534647

Abstract: A method and an apparatus that execute a parallel computing program in a programming language for a parallel computing architecture are described. The parallel computing program is stored in memory in a system with parallel processors. The parallel computing program is stored in a memory to allocate threads between a host processor and a GPU. The programming language includes an API to allow an application to make calls using the API to allocate execution of the threads between the host processor and the GPU. The programming language includes host function data tokens for host functions performed in the host processor and kernel function data tokens for compute kernel functions performed in one or more compute processors, e.g., GPUs or CPUs, separate from the host processor.

Type: Grant

Filed: September 7, 2017

Date of Patent: January 14, 2020

Assignee: APPLE INC.

Inventors: Aaftab Munshi, Jeremy Sandmel

Language, Function Library, And Compiler For Graphical And Non-Graphical Computation On A Graphical Processor Unit

Publication number: 20190361687

Abstract: A compiler and library provide the ability to compile a programming language according to a defined language model into a programming language independent, machine independent intermediate representation, for conversion into an executable on a target programmable device. The language model allows writing programs that perform data-parallel graphics and non-graphics tasks.

Type: Application

Filed: August 6, 2019

Publication date: November 28, 2019

Inventors: Aaftab A. Munshi, Kenneth C. Dyke, Rahul U. Joshi, Richard W. Schreyer

Enqueuing kernels from kernels on GPU/CPU

Patent number: 10489205

Abstract: Graphics processing units (GPUs) and other compute units are allowed to enqueue tasks for themselves by themselves, without needing a host processor to queue the work for the GPU. Built-in functions enable kernels to enqueue kernels for execution on a device. In some embodiments, ndrange kernels execute over an N-dimensional range to provide data-parallel operations. Task kernels provide task-parallel operations. In some embodiments, kernels may be defined using clang block syntax. The order of execution of commands on a compute unit may be constrained or allow execution of commands out-of-order. Compute units may control when kernels enqueued by the compute unit begins execution.

Type: Grant

Filed: December 23, 2013

Date of Patent: November 26, 2019

Assignee: Apple Inc.

Inventor: Aaftab A. Munshi

Local image blocks for graphics processing

Patent number: 10445852

Abstract: Techniques are disclosed relating to a hardware-supported flexible data structure for graphics processing. In some embodiments, dimensions of the data structure are configurable in an X direction, a Y direction, a number of samples per pixel, and an amount of data per sample. In some embodiments, these attributes are configurable using hardware registers. In some embodiments, the data structure is persistent across a tile being processed such that local memory context is accessible to both rendering threads of a render pass and mid-render compute threads.

Type: Grant

Filed: December 22, 2016

Date of Patent: October 15, 2019

Assignee: Apple Inc.

Inventors: Terence M. Potter, Robert Kenney, Aaftab A. Munshi, Justin A. Hensley, Richard W. Schreyer

Language, function library, and compiler for graphical and non-graphical computation on a graphical processor unit

Patent number: 10430169

Abstract: A compiler and library provide the ability to compile a programming language according to a defined language model into a programming language independent, machine independent intermediate representation, for conversion into an executable on a target programmable device. The language model allows writing programs that perform data-parallel graphics and non-graphics tasks.

Type: Grant

Filed: February 20, 2015

Date of Patent: October 1, 2019

Assignee: Apple Inc.

Inventors: Aaftab A. Munshi, Kenneth C. Dyke, Rahul U. Joshi, Richard W. Schreyer

Unified intermediate representation

Patent number: 10372431

Abstract: A system decouples the source code language from the eventual execution environment by compiling the source code language into a unified intermediate representation that conforms to a language model allowing both parallel graphical operations and parallel general-purpose computational operations. The intermediate representation may then be distributed to end-user computers, where an embedded compiler can compile the intermediate representation into an executable binary targeted for the CPUs and GPUs available in that end-user device. The intermediate representation is sufficient to define both graphics and non-graphics compute kernels and shaders. At install-time or later, the intermediate representation file may be compiled for the specific target hardware of the given end-user computing system.

Type: Grant

Filed: June 28, 2017

Date of Patent: August 6, 2019

Assignee: Apple Inc.

Inventors: Aaftab Munshi, Rahul U. Joshi, Mon P. Wang, Kelvin C. Chiu

Memory consistency in graphics memory hierarchy with relaxed ordering

Patent number: 10324844

Abstract: Techniques are disclosed relating to memory consistency in a memory hierarchy with relaxed ordering. In some embodiments, an apparatus includes a first level cache that is shared by a plurality of shader processing elements and a second level cache that is shared by the shader processing elements and at least a texture processing unit. In some embodiments, the apparatus is configured to execute operations specified by graphics instructions that include (1) an attribute of the operation that specifies a type of memory consistency to be imposed for the operation and (2) scope information for the attribute that specifies whether the memory consistency specified by the attribute should be enforced at the first level cache or the second level cache. In some embodiments, the apparatus is configured to determine whether to sequence memory accesses at the first level cache and the second level cache based on the attribute and the scope.

Type: Grant

Filed: December 22, 2016

Date of Patent: June 18, 2019

Assignee: Apple Inc.

Inventors: Anthony P. DeLaurier, Owen C. Anderson, Michael J. Swift, Aaftab A. Munshi, Terence M. Potter
