Patents by Inventor Benjamin Thomas Sander
Benjamin Thomas Sander has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240211762
Abstract: An apparatus and method for efficiently creating less computationally intensive nodes for a neural network. In various implementations, a computing system includes a processor and a memory with circuitry that stores multiple input data values to process during inference of a trained neural network. The processor determines, during inference, which node input values, node intermediate values, and node output values of the trained neural network to represent in a respective one of multiple available floating-point formats with less precision. No retraining is performed; rather, the updates to the representations occur during inference. The processor uses selection criteria to reduce the amount of computation involved in updating the representations during inference while maintaining accuracy above an accuracy threshold. To do so, the processor uses the selection criteria to reduce the number of layers, the number of nodes within a layer, and the number of weight values per node to inspect.
Type: Application
Filed: December 27, 2022
Publication date: June 27, 2024
Inventors: Adam H. Li, Eric F. Dellinger, Philip Bryn James-Roxby, Shomy Sanyal, Benjamin Thomas Sander, Ralph Detlef Wittig
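The per-node format selection described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not the patented method: it picks the least-precise floating-point format whose round-trip representation error for a node's values stays under an accuracy threshold, using Python's `struct` half/single/double formats as stand-ins for the "multiple available floating-point formats". The function names and threshold are assumptions for illustration.

```python
import struct

def roundtrip_error(values, fmt_char):
    """Max absolute error from storing values in the given struct
    float format ('e' = fp16, 'f' = fp32, 'd' = fp64)."""
    return max(abs(v - struct.unpack(fmt_char, struct.pack(fmt_char, v))[0])
               for v in values)

def select_format(values, accuracy_threshold=1e-3):
    # Try the least-precise formats first; keep the first one whose
    # round-trip error stays under the accuracy threshold.
    for fmt in ('e', 'f'):
        if roundtrip_error(values, fmt) <= accuracy_threshold:
            return fmt
    return 'd'

# Values exactly representable in fp16 select the 'e' (half) format.
fmt = select_format([0.125, -0.5, 0.25])
```

In this toy version the "selection criteria" of the abstract would decide *which* nodes' values are ever passed to `select_format`, limiting the inspection cost.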
-
Publication number: 20220092410
Abstract: Systems, apparatuses, and methods for implementing an architected library interface for kernel fusion are disclosed. A processor receives a first representation of a neural network and a vendor-supplied library. The vendor-supplied library is associated with a specific hardware target, and the library includes fusing points which allow a kernel to be called within an optimized operation. When a kernel is called using the fusing point within an optimized operation, the kernel performs one or more operations on the data being processed by the optimized operation. This allows multiple kernels to be executed without having to write data back to memory after each individual kernel. The processor generates an optimized version of the neural network by linking to fusing points within the vendor-supplied library. This reduces the number of memory accesses and increases the performance of the optimized version of the neural network when executed on the hardware target.
Type: Application
Filed: September 24, 2020
Publication date: March 24, 2022
Inventor: Benjamin Thomas Sander
-
Publication number: 20220067508
Abstract: Systems, apparatuses, and methods for achieving higher cache hit rates for machine learning models are disclosed. When a processor executes a given layer of a machine learning model, the processor generates and stores activation data in a cache subsystem in a forward or reverse manner. Typically, the entirety of the activation data does not fit in the cache subsystem. The processor records the order in which activation data is generated for the given layer. Next, when the processor initiates execution of a subsequent layer of the machine learning model, the processor processes the previous layer's activation data in a reverse order from how the activation data was generated. In this way, the processor alternates how the layers of the machine learning model process data by either starting from the front end or starting from the back end of the array.
Type: Application
Filed: August 31, 2020
Publication date: March 3, 2022
Inventors: Benjamin Thomas Sander, Swapnil Sakharshete, Ashish Panday
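The alternating-direction idea above is easy to see in a sketch. This hypothetical toy (not the patented implementation) walks each layer's activation array, flipping the traversal direction per layer so the next layer begins at the end the previous layer just wrote, keeping the most recently produced entries hot in the cache.

```python
def run_layers(layers, activations):
    """Apply each layer elementwise to the activation array,
    alternating forward/reverse traversal so a layer starts from
    the end its predecessor finished at (the cache-friendly order
    the abstract describes)."""
    order = []
    forward = True
    for layer in layers:
        idx = range(len(activations)) if forward \
            else range(len(activations) - 1, -1, -1)
        for i in idx:
            activations[i] = layer(activations[i])
        order.append('forward' if forward else 'reverse')
        forward = not forward
    return order, activations

# Layer 1 runs front-to-back, layer 2 back-to-front.
order, out = run_layers([lambda x: x + 1, lambda x: x * 2], [1, 2, 3])
```

The results are identical to a fixed-direction traversal; only the access order (and thus the cache behavior on real hardware) changes.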
-
Patent number: 10255104
Abstract: Embodiments described herein include a system, a computer-readable medium and a computer-implemented method for processing a system call (SYSCALL) request. The SYSCALL request from an invisible processing device is stored in a queueing mechanism that is accessible to a visible processing device, where the visible processing device is visible to an operating system and the invisible processing device is invisible to the operating system. The SYSCALL request is processed using the visible processing device, and the invisible processing device is notified using a notification mechanism that the SYSCALL request was processed.
Type: Grant
Filed: March 29, 2013
Date of Patent: April 9, 2019
Assignee: Advanced Micro Devices, Inc.
Inventors: Benjamin Thomas Sander, Michael Clair Houston, Keith Lowery, Newton Cheung
-
Patent number: 10146575
Abstract: Methods, systems, and computer-readable mediums for task scheduling on an accelerated processing device (APD) are provided. In an embodiment, a method comprises: enqueuing one or more tasks in a memory storage module based on the APD using a software-based enqueuing module; and dequeuing the one or more tasks from the memory storage module using a hardware-based command processor, wherein the command processor forwards the one or more tasks to the shader core.
Type: Grant
Filed: August 29, 2016
Date of Patent: December 4, 2018
Assignee: Advanced Micro Devices, Inc.
Inventors: Benjamin Thomas Sander, Michael Houston, Newton Cheung, Keith Lowery
-
Patent number: 9645854
Abstract: A method, system and article of manufacture for balancing a workload on heterogeneous processing devices. The method comprises accessing a memory storage of a processor of one type by a dequeuing entity associated with a processor of a different type, identifying a task from a plurality of tasks within the memory that can be processed by the processor of the different type, synchronizing a plurality of dequeuing entities capable of accessing the memory storage, and dequeuing the task from the memory storage.
Type: Grant
Filed: November 2, 2011
Date of Patent: May 9, 2017
Assignee: Advanced Micro Devices, Inc.
Inventors: Benjamin Thomas Sander, Michael Houston, Newton Cheung, Keith Lowery
-
Publication number: 20160371116
Abstract: Methods, systems, and computer-readable mediums for task scheduling on an accelerated processing device (APD) are provided. In an embodiment, a method comprises: enqueuing one or more tasks in a memory storage module based on the APD using a software-based enqueuing module; and dequeuing the one or more tasks from the memory storage module using a hardware-based command processor, wherein the command processor forwards the one or more tasks to the shader core.
Type: Application
Filed: August 29, 2016
Publication date: December 22, 2016
Applicant: Advanced Micro Devices, Inc.
Inventors: Benjamin Thomas Sander, Michael Houston, Newton Cheung, Keith Lowery
-
Patent number: 9430281
Abstract: Methods, systems, and computer-readable mediums for task scheduling on an accelerated processing device (APD) are provided. In an embodiment, a method comprises: enqueuing one or more tasks in a memory storage module based on the APD using a software-based enqueuing module; and dequeuing the one or more tasks from the memory storage module using a hardware-based command processor, wherein the command processor forwards the one or more tasks to the shader core.
Type: Grant
Filed: November 9, 2011
Date of Patent: August 30, 2016
Assignee: Advanced Micro Devices, Inc.
Inventors: Benjamin Thomas Sander, Michael Houston, Newton Cheung, Keith Lowery
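The enqueue/dequeue split in this family of filings can be modeled in a few lines. This is a purely illustrative toy, assuming nothing about the real hardware: software code enqueues tasks into a memory-backed queue, and a loop standing in for the hardware command processor drains them, forwarding each task to a shader-core callback.

```python
from collections import deque

class TaskQueue:
    """Toy model: software enqueues tasks into a memory-backed
    queue; a command-processor loop dequeues them and forwards
    each one to a shader-core callback."""
    def __init__(self):
        self._queue = deque()

    def enqueue(self, task):
        # Stands in for the software-based enqueuing module.
        self._queue.append(task)

    def drain(self, shader_core):
        # Stands in for the hardware-based command processor.
        results = []
        while self._queue:
            results.append(shader_core(self._queue.popleft()))
        return results

q = TaskQueue()
for t in (1, 2, 3):
    q.enqueue(t)
results = q.drain(lambda task: task * 10)
```

The point of the split is that producers only touch memory, while dispatch to execution units is handled by dedicated hardware rather than a software scheduler.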
-
Patent number: 8752064
Abstract: Provided herein is a method for optimizing communication for system calls. The method includes storing a system call for each work item in a wavefront and transmitting said stored system calls to a processor for execution. The method also includes receiving a result to each work item in the wavefront responsive to said transmitting.
Type: Grant
Filed: November 30, 2011
Date of Patent: June 10, 2014
Assignee: Advanced Micro Devices, Inc.
Inventors: Benjamin Thomas Sander, Michael Houston, Newton Cheung, Keith Lowery
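The batching described above amounts to collecting one pending system call per work item and paying for a single host round trip instead of one per work item. A minimal sketch, with hypothetical field names and a stand-in handler (none of which come from the patent):

```python
def batch_syscalls(wavefront, handler):
    """Gather one pending system call from each work item, send the
    whole batch to the host handler in a single transmission, then
    map each result back to its work item."""
    batch = [item['syscall'] for item in wavefront]
    results = handler(batch)  # one host round trip for the wavefront
    return {item['id']: r for item, r in zip(wavefront, results)}

wavefront = [{'id': 0, 'syscall': ('write', 'a')},
             {'id': 1, 'syscall': ('write', 'b')}]
# Toy handler: "execute" each write by reporting bytes written.
results = batch_syscalls(wavefront, lambda calls: [len(c[1]) for c in calls])
```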
-
Patent number: 8667201
Abstract: A system, method and article of manufacture for an accelerated processing device (APD) to request a central processing unit (CPU) to process a task, comprising enqueuing a plurality of tasks on a queue using the APD, generating a user-level interrupt and transmitting to the CPU the plurality of tasks in the queue using an interrupt handler associated with a CPU thread.
Type: Grant
Filed: November 9, 2011
Date of Patent: March 4, 2014
Assignee: Advanced Micro Devices, Inc.
Inventors: Benjamin Thomas Sander, Michael Houston, Newton Cheung, Keith Lowery
-
Publication number: 20130263144
Abstract: Embodiments described herein include a system, a computer-readable medium and a computer-implemented method for processing a system call (SYSCALL) request. The SYSCALL request from an invisible processing device is stored in a queueing mechanism that is accessible to a visible processing device, where the visible processing device is visible to an operating system and the invisible processing device is invisible to the operating system. The SYSCALL request is processed using the visible processing device, and the invisible processing device is notified using a notification mechanism that the SYSCALL request was processed.
Type: Application
Filed: March 29, 2013
Publication date: October 3, 2013
Applicant: Advanced Micro Devices, Inc.
Inventors: Benjamin Thomas Sander, Michael Clair Houston, Keith Lowery, Newton Cheung
-
Publication number: 20120194526
Abstract: Systems, methods, and articles of manufacture for optimizing task scheduling on an accelerated processing device (APD) device are provided. In an embodiment, a method comprises: enqueuing, using the APD, one or more tasks in a memory storage; and dequeuing, using the APD, the one or more tasks from the memory storage using a hardware-based command processor, wherein the command processor forwards the one or more tasks to a shader core.
Type: Application
Filed: November 30, 2011
Publication date: August 2, 2012
Inventors: Benjamin Thomas Sander, Michael Houston, Newton Cheung, Keith Lowery
-
Publication number: 20120192201
Abstract: A method, system and article of manufacture for balancing a workload on heterogeneous processing devices.
Type: Application
Filed: November 2, 2011
Publication date: July 26, 2012
Inventors: Benjamin Thomas Sander, Michael Houston, Newton Cheung, Keith Lowery
-
Publication number: 20120179851
Abstract: A system, method and article of manufacture for an accelerated processing device (APD) to request a central processing unit (CPU) to process a task, comprising enqueuing a plurality of tasks on a queue using the APD, generating a user-level interrupt and transmitting to the CPU the plurality of tasks in the queue using an interrupt handler associated with a CPU thread.
Type: Application
Filed: November 9, 2011
Publication date: July 12, 2012
Applicant: Advanced Micro Devices, Inc.
Inventors: Benjamin Thomas Sander, Michael Houston, Newton Cheung, Keith Lowery
-
Publication number: 20120180072
Abstract: Provided herein is a method for optimizing communication for system calls. The method includes storing a system call for each work item in a wavefront and transmitting said stored system calls to a processor for execution. The method also includes receiving a result to each work item in the wavefront responsive to said transmitting.
Type: Application
Filed: November 30, 2011
Publication date: July 12, 2012
Inventors: Benjamin Thomas Sander, Michael Houston, Newton Cheung, Keith Lowery
-
Publication number: 20120180056
Abstract: Methods, systems, and computer-readable mediums for task scheduling on an accelerated processing device (APD) are provided. In an embodiment, a method comprises: enqueuing one or more tasks in a memory storage module based on the APD using a software-based enqueuing module; and dequeuing the one or more tasks from the memory storage module using a hardware-based command processor, wherein the command processor forwards the one or more tasks to the shader core.
Type: Application
Filed: November 9, 2011
Publication date: July 12, 2012
Inventors: Benjamin Thomas Sander, Michael Houston, Newton Cheung, Keith Lowery
-
Patent number: 7222226
Abstract: A system may include a dispatch unit, a scheduler, and an execution core. The dispatch unit may be configured to modify a load operation to include a register-to-register move operation in response to an indication that a speculative result of the load operation is linked to a data value identified by a first tag. The scheduler may be coupled to the dispatch unit and configured to issue the register-to-register move operation in response to availability of the data value. The execution core may be configured to execute the register-to-register move operation by outputting the data value and a tag indicating that the data value is the result of the load operation.
Type: Grant
Filed: April 30, 2002
Date of Patent: May 22, 2007
Assignee: Advanced Micro Devices, Inc.
Inventors: Kevin Michael Lepak, Benjamin Thomas Sander, James K. Pickett
-
Patent number: 7089400
Abstract: A processor may include a stack file and an execution core. The stack file may include an entry configured to store an addressing pattern and a tag. The addressing pattern identifies a memory location within the stack area of memory. The stack file may be configured to link a data value identified by the tag stored in the entry to the speculative result of a memory operation if the addressing pattern of the memory operation matches the addressing pattern stored in the entry. The execution core may be configured to access the speculative result when executing another operation that is dependent on the memory operation.
Type: Grant
Filed: January 21, 2003
Date of Patent: August 8, 2006
Assignee: Advanced Micro Devices, Inc.
Inventors: James K. Pickett, Benjamin Thomas Sander, Kevin Michael Lepak
-
Patent number: 7024537
Abstract: A system may include a memory file and an execution core. The memory file may include an entry configured to store an addressing pattern and a tag. If an addressing pattern of a memory operation matches the addressing pattern stored in the entry, the memory file may be configured to link a data value identified by the tag to a speculative result of the memory operation. The addressing pattern of the memory operation includes an identifier of a logical register, and the memory file may be configured to predict whether the logical register is being specified as a general purpose register or a stack frame pointer register in order to determine whether the addressing pattern of the memory operation matches the addressing pattern stored in the entry. The execution core may be configured to access the speculative result when executing another operation that is dependent on the memory operation.
Type: Grant
Filed: January 21, 2003
Date of Patent: April 4, 2006
Assignee: Advanced Micro Devices, Inc.
Inventors: James K. Pickett, Benjamin Thomas Sander, Kevin Michael Lepak
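The memory-file mechanism in this and the two preceding entries is hardware, but its core lookup can be caricatured in software. In this hypothetical sketch (register names and method names are invented for illustration), a store records its addressing pattern alongside the tag of the value it wrote; a later load with a matching pattern speculatively links to that tag instead of waiting for the memory access.

```python
class MemoryFile:
    """Toy memory file: maps an addressing pattern (base register,
    displacement) to the tag of the value last stored through it,
    so a matching load can speculatively link to that value."""
    def __init__(self):
        self.entries = {}

    def record_store(self, pattern, tag):
        # A store updates (or creates) the entry for its pattern.
        self.entries[pattern] = tag

    def link_load(self, pattern):
        # A load with a matching pattern gets the speculated source
        # tag; a miss returns None and the load proceeds normally.
        return self.entries.get(pattern)

mf = MemoryFile()
mf.record_store(('rsp', 8), tag='r3')
hit = mf.link_load(('rsp', 8))    # pattern matches -> linked tag
miss = mf.link_load(('rbp', 16))  # no matching entry
```

The real design additionally predicts whether the base register is acting as a general-purpose register or a stack frame pointer before declaring a pattern match; that prediction is omitted here.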
-
Patent number: 6981119
Abstract: A memory system may use the storage space freed by compressing a unit of data to store performance-enhancing data associated with that unit of data. For example, a memory controller may be configured to allocate several storage locations within a memory to store a unit of data. If the unit of data is compressed, the unit of data may not occupy a portion of the storage locations allocated to it. The memory controller may store performance-enhancing data associated with the unit of data in the portion of the storage locations allocated to but not occupied by the unit of data.
Type: Grant
Filed: August 29, 2002
Date of Patent: December 27, 2005
Assignee: Advanced Micro Devices, Inc.
Inventors: Kevin Michael Lepak, Benjamin Thomas Sander
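The idea above can be sketched at the byte level. This is a loose software analogy, not the hardware scheme: `zlib` stands in for the memory system's compressor, the block size and the "hint" payload are invented, and the padding layout is arbitrary. If compression frees enough of the block allocated to a data unit, metadata (e.g. a prefetch hint) is tucked into the unused tail.

```python
import zlib

BLOCK_SIZE = 64  # storage allocated to one unit of data (bytes)

def store_block(data: bytes, hint: bytes):
    """Compress the unit of data; if enough of its allocated block
    is freed, place performance-enhancing metadata in the unused
    tail. Returns (block, hint_stored)."""
    compressed = zlib.compress(data)
    if len(compressed) + len(hint) <= BLOCK_SIZE:
        pad = BLOCK_SIZE - len(compressed) - len(hint)
        return compressed + b'\x00' * pad + hint, True
    # Incompressible: store raw, no room for the hint.
    return data, False

# Highly compressible data leaves plenty of room for the hint.
block, has_hint = store_block(b'A' * BLOCK_SIZE, b'HINT')
```

The payoff is that the metadata rides along in storage the unit of data already owns, costing no extra capacity.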