Patents by Inventor Benjamin Thomas Sander

Benjamin Thomas Sander has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240211762
    Abstract: An apparatus and method for efficiently creating less computationally intensive nodes for a neural network. In various implementations, a computing system includes a processor and a memory with circuitry that stores multiple input data values to process during inference of a trained neural network. The processor determines, during inference, which node input values, node intermediate values, and node output values of the trained neural network to represent in a respective one of multiple available floating-point formats with less precision. No retraining is performed, but rather, the updates to the representations occur during inference. The processor uses selection criteria to reduce the amount of computation involved for updating the representations during inference while maintaining accuracy above an accuracy threshold. To do so, the processor uses the selection criteria to reduce the number of layers, the number of nodes within a layer, and the number of weight values per node to inspect.
    Type: Application
    Filed: December 27, 2022
    Publication date: June 27, 2024
    Inventors: Adam H. Li, Eric F. Dellinger, Philip Bryn James-Roxby, Shomy Sanyal, Benjamin Thomas Sander, Ralph Detlef Wittig
  • Publication number: 20220092410
    Abstract: Systems, apparatuses, and methods for implementing an architected library interface for kernel fusion are disclosed. A processor receives a first representation of a neural network and a vendor-supplied library. The vendor-supplied library is associated with a specific hardware target, and the library includes fusing points which allow a kernel to be called within an optimized operation. When a kernel is called using the fusing point within an optimized operation, the kernel performs one or more operations on the data being processed by the optimized operation. This allows multiple kernels to be executed without having to write data back to memory after each individual kernel. The processor generates an optimized version of the neural network by linking to fusing points within the vendor-supplied library. This reduces the number of memory accesses and increases the performance of the optimized version of the neural network when executed on the hardware target.
    Type: Application
    Filed: September 24, 2020
    Publication date: March 24, 2022
    Inventor: Benjamin Thomas Sander
  • Publication number: 20220067508
    Abstract: Systems, apparatuses, and methods for achieving higher cache hit rates for machine learning models are disclosed. When a processor executes a given layer of a machine learning model, the processor generates and stores activation data in a cache subsystem a forward or reverse manner. Typically, the entirety of the activation data does not fit in the cache subsystem. The processor records the order in which activation data is generated for the given layer. Next, when the processor initiates execution of a subsequent layer of the machine learning model, the processor processes the previous layer's activation data in a reverse order from how the activation data was generated. In this way, the processor alternates how the layers of the machine learning model process data by either starting from the front end or starting from the back end of the array.
    Type: Application
    Filed: August 31, 2020
    Publication date: March 3, 2022
    Inventors: Benjamin Thomas Sander, Swapnil Sakharshete, Ashish Panday
  • Patent number: 10255104
    Abstract: Embodiments described herein include a system, a computer-readable medium and a computer-implemented method for processing a system call (SYSCALL) request. The SYSCALL request from an invisible processing device is stored in a queueing mechanism that is accessible to a visible processing device, where the visible processing device is visible to an operating system and the invisible processing device is invisible to the operating system. The SYSCALL request is processed using the visible processing device, and the invisible processing device is notified using a notification mechanism that the SYSCALL request was processed.
    Type: Grant
    Filed: March 29, 2013
    Date of Patent: April 9, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Benjamin Thomas Sander, Michael Clair Houston, Keith Lowery, Newton Cheung
  • Patent number: 10146575
    Abstract: Methods, systems and computer-readable mediums for task scheduling on an accelerated processing device (APD) are provided. In an embodiment, a method comprises: enqueuing one or more tasks in a memory storage module based on the APD; using a software-based enqueuing module; and dequeuing the one or more tasks from the memory storage module using a hardware-based command processor, wherein the command processor forwards the one or more tasks to the shader cote.
    Type: Grant
    Filed: August 29, 2016
    Date of Patent: December 4, 2018
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Benjamin Thomas Sander, Michael Houston, Newton Cheung, Keith Lowery
  • Patent number: 9645854
    Abstract: A method, system and article of manufacture for balancing a workload on heterogeneous processing devices. The method comprising accessing a memory storage of a processor of one type by a dequeuing entity associated with a processor of a different type, identifying a task from a plurality of tasks within the memory that can be processed by the processor of the different type, synchronizing a plurality of dequeuing entities capable of accessing the memory storage, and dequeuing the task form the memory storage.
    Type: Grant
    Filed: November 2, 2011
    Date of Patent: May 9, 2017
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Benjamin Thomas Sander, Michael Houston, Newton Cheung, Keith Lowery
  • Publication number: 20160371116
    Abstract: Methods, systems and computer-readable mediums for task scheduling on an accelerated processing device (APD) are provided. In an embodiment, a method comprises: enqueuing one or more tasks in a memory storage module based on the APD; using a software-based enqueuing module; and dequeuing the one or more tasks from the memory storage module using a hardware-based command processor, wherein the command processor forwards the one or more tasks to the shader cote.
    Type: Application
    Filed: August 29, 2016
    Publication date: December 22, 2016
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Benjamin Thomas Sander, Michael Houston, Newton Cheung, Keith Lowery
  • Patent number: 9430281
    Abstract: Methods, systems and computer-readable mediums for task scheduling on an accelerated processing device (APD) are provided. In an embodiment, a method comprises: enqueuing one or more tasks in a memory storage module based on the APD; using a software-based enqueuing module; and dequeuing the one or more tasks from the memory storage module using a hardware-based command processor, wherein the command processor forwards the one or more tasks to the shader core.
    Type: Grant
    Filed: November 9, 2011
    Date of Patent: August 30, 2016
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Benjamin Thomas Sander, Michael Houston, Newton Cheung, Keith Lowery
  • Patent number: 8752064
    Abstract: Provided herein is a method for optimizing communication for system calls. The method includes storing a system call for each work item in a wavefront and transmitting said stored system calls to a processor for execution. The method also includes receiving a result to each work item in the wavefront responsive to said transmitting.
    Type: Grant
    Filed: November 30, 2011
    Date of Patent: June 10, 2014
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Benjamin Thomas Sander, Michael Houston, Newton Cheung, Keith Lowery
  • Patent number: 8667201
    Abstract: A system, method and article of manufacture for an accelerated processing device (APD) to request a central processing unit (CPU) to process a task, comprising enqueuing a plurality of tasks on a queue using the APD, generating a user-level interrupt and transmitting to the CPU the plurality of tasks in the queue using an interrupt handler associated with a CPU thread.
    Type: Grant
    Filed: November 9, 2011
    Date of Patent: March 4, 2014
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Benjamin Thomas Sander, Michael Houston, Newton Cheung, Keith Lowery
  • Publication number: 20130263144
    Abstract: Embodiments described herein include a system, a computer-readable medium and a computer-implemented method for processing a system call (SYSCALL) request. The SYSCALL request from an invisible processing device is stored in a queueing mechanism that is accessible to a visible processing device, where the visible processing device is visible to an operating system and the invisible processing device is invisible to the operating system. The SYSCALL request is processed using the visible processing device, and the invisible processing device is notified using a notification mechanism that the SYSCALL request was processed.
    Type: Application
    Filed: March 29, 2013
    Publication date: October 3, 2013
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Benjamin Thomas Sander, Michael Clair Houston, Keith Lowery, Newton Cheung
  • Publication number: 20120194526
    Abstract: Systems, methods, and articles of manufacture for optimizing task scheduling on an accelerated processing device (APD) device are provided. In an embodiment, a method comprises: enqueuing, using the APD, one or more tasks in a memory storage; and dequeuing, using the APD, the one or more tasks from the memory storage using a hardware-based command processor, wherein the command processor forwards the one or more tasks to a shader core.
    Type: Application
    Filed: November 30, 2011
    Publication date: August 2, 2012
    Inventors: Benjamin Thomas SANDER, Michael Houston, Newton Cheung, Keith Lowery
  • Publication number: 20120192201
    Abstract: A method, system and article of manufacture for balancing a workload on heterogeneous processing devices.
    Type: Application
    Filed: November 2, 2011
    Publication date: July 26, 2012
    Inventors: Benjamin Thomas SANDER, Michael Houston, Newton Cheung, Keith Lowery
  • Publication number: 20120179851
    Abstract: A system, method and article of manufacture for an accelerated processing device (APD) to request a central processing unit (CPU) to process a task, comprising enqueuing a plurality of tasks on a queue using the APD, generating a user-level interrupt and transmitting to the CPU the plurality of tasks in the queue using an interrupt handler associated with a CPU thread.
    Type: Application
    Filed: November 9, 2011
    Publication date: July 12, 2012
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Benjamin Thomas SANDER, Michael Houston, Newton Cheung, Keith Lowery
  • Publication number: 20120180072
    Abstract: Provided herein is a method for optimizing communication for system calls. The method includes storing a system call for each work item in a wavefront and transmitting said stored system calls to a processor for execution. The method also includes receiving a result to each work item in the wavefront responsive to said transmitting.
    Type: Application
    Filed: November 30, 2011
    Publication date: July 12, 2012
    Inventors: Benjamin Thomas SANDER, Michael Houston, Newton Cheung, Keith Lowery
  • Publication number: 20120180056
    Abstract: Methods, systems and computer-readable mediums for task scheduling on an accelerated processing device (APD) are provided. In an embodiment, a method comprises: enqueuing one or more tasks in a memory storage module based on the APD; using a software-based enqueuing module; and dequeuing the one or more tasks from the memory storage module using a hardware-based command processor, wherein the command processor forwards the one or more tasks to the shader cote.
    Type: Application
    Filed: November 9, 2011
    Publication date: July 12, 2012
    Inventors: Benjamin Thomas SANDER, Michael Houston, Newton Cheung, Keith Lowery
  • Patent number: 7222226
    Abstract: A system may include a dispatch unit, a scheduler, and an execution core. The dispatch unit may be configured to modify a load operation to include a register-to-register move operation in response to an indication that a speculative result of the load operation is linked to a data value identified by a first tag. The scheduler may be coupled to the dispatch unit and configured to issue the register-to-register move operation in response to availability of the data value. The execution core may be configured to execute the register-to-register move operation by outputting the data value and a tag indicating that the data value is the result of the load operation.
    Type: Grant
    Filed: April 30, 2002
    Date of Patent: May 22, 2007
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Kevin Michael Lepak, Benjamin Thomas Sander, James K. Pickett
  • Patent number: 7089400
    Abstract: A processor may include a stack file and an execution core. The stack file may include an entry configured to store an addressing pattern and a tag. The addressing pattern identifies a memory location within the stack area of memory. The stack file may be configured to link a data value identified by the tag stored in the entry to the speculative result of a memory operation if the addressing pattern of the memory operation matches the addressing pattern stored in the entry. The execution core may be configured to access the speculative result when executing another operation that is dependent on the memory operation.
    Type: Grant
    Filed: January 21, 2003
    Date of Patent: August 8, 2006
    Assignee: Advanced Micro Devices, Inc.
    Inventors: James K. Pickett, Benjamin Thomas Sander, Kevin Michael Lepak
  • Patent number: 7024537
    Abstract: A system may include a memory file and an execution core. The memory file may include an entry configured to store an addressing pattern and a tag. If an addressing pattern of a memory operation matches the addressing pattern stored in the entry, the memory file may be configured to link a data value identified by the tag to a speculative result of the memory operation. The addressing pattern of the memory operation includes an identifier of a logical register, and the memory file may be configured to predict whether the logical register is being specified as a general purpose register or a stack frame pointer register in order to determine whether the addressing pattern of the memory operation matches the addressing pattern stored in the entry. The execution core may be configured to access the speculative result when executing another operation that is dependent on the memory operation.
    Type: Grant
    Filed: January 21, 2003
    Date of Patent: April 4, 2006
    Assignee: Advanced Micro Devices, Inc.
    Inventors: James K. Pickett, Benjamin Thomas Sander, Kevin Michael Lepak
  • Patent number: 6981119
    Abstract: A memory system may use the storage space freed by compressing a unit of data to store performance-enhancing data associated with that unit of data. For example, a memory controller may be configured to allocate several of storage locations within a memory to store a unit of data. If the unit of data is compressed, the unit of data may not occupy a portion of the storage locations allocated to it. The memory controller may store performance-enhancing data associated with the unit of data in the portion of the storage locations allocated to but not occupied by the first unit of data.
    Type: Grant
    Filed: August 29, 2002
    Date of Patent: December 27, 2005
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Kevin Michael Lepak, Benjamin Thomas Sander