Patents by Inventor Benjamin Thomas Sander
Benjamin Thomas Sander has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240211762
Abstract: An apparatus and method for efficiently creating less computationally intensive nodes for a neural network. In various implementations, a computing system includes a processor and a memory with circuitry that stores multiple input data values to process during inference of a trained neural network. The processor determines, during inference, which node input values, node intermediate values, and node output values of the trained neural network to represent in a respective one of multiple available floating-point formats with less precision. No retraining is performed; rather, the updates to the representations occur during inference. The processor uses selection criteria to reduce the amount of computation involved in updating the representations during inference while maintaining accuracy above an accuracy threshold. To do so, the processor uses the selection criteria to reduce the number of layers, the number of nodes within a layer, and the number of weight values per node to inspect.
Type: Application
Filed: December 27, 2022
Publication date: June 27, 2024
Inventors: Adam H. Li, Eric F. Dellinger, Philip Bryn James-Roxby, Shomy Sanyal, Benjamin Thomas Sander, Ralph Detlef Wittig
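The per-node format selection described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not the patented method: it picks the least-precise floating-point format whose round-trip representation error for a node's values stays under an accuracy threshold, using Python's `struct` half/single/double formats as stand-ins for the "multiple available floating-point formats". The function names and threshold are assumptions for illustration.

```python
import struct

def roundtrip_error(values, fmt_char):
    """Max absolute error from storing values in the given struct
    float format ('e' = fp16, 'f' = fp32, 'd' = fp64)."""
    return max(abs(v - struct.unpack(fmt_char, struct.pack(fmt_char, v))[0])
               for v in values)

def select_format(values, accuracy_threshold=1e-3):
    # Try the least-precise formats first; keep the first one whose
    # round-trip error stays under the accuracy threshold.
    for fmt in ('e', 'f'):
        if roundtrip_error(values, fmt) <= accuracy_threshold:
            return fmt
    return 'd'

# Values exactly representable in fp16 select the 'e' (half) format.
fmt = select_format([0.125, -0.5, 0.25])
```

In this toy version the "selection criteria" of the abstract would decide *which* nodes' values are ever passed to `select_format`, limiting the inspection cost.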
-
Publication number: 20220092410
Abstract: Systems, apparatuses, and methods for implementing an architected library interface for kernel fusion are disclosed. A processor receives a first representation of a neural network and a vendor-supplied library. The vendor-supplied library is associated with a specific hardware target, and the library includes fusing points which allow a kernel to be called within an optimized operation. When a kernel is called using the fusing point within an optimized operation, the kernel performs one or more operations on the data being processed by the optimized operation. This allows multiple kernels to be executed without having to write data back to memory after each individual kernel. The processor generates an optimized version of the neural network by linking to fusing points within the vendor-supplied library. This reduces the number of memory accesses and increases the performance of the optimized version of the neural network when executed on the hardware target.
Type: Application
Filed: September 24, 2020
Publication date: March 24, 2022
Inventor: Benjamin Thomas Sander
-
Publication number: 20220067508
Abstract: Systems, apparatuses, and methods for achieving higher cache hit rates for machine learning models are disclosed. When a processor executes a given layer of a machine learning model, the processor generates and stores activation data in a cache subsystem in a forward or reverse manner. Typically, the entirety of the activation data does not fit in the cache subsystem. The processor records the order in which activation data is generated for the given layer. Next, when the processor initiates execution of a subsequent layer of the machine learning model, the processor processes the previous layer's activation data in a reverse order from how the activation data was generated. In this way, the processor alternates how the layers of the machine learning model process data by either starting from the front end or starting from the back end of the array.
Type: Application
Filed: August 31, 2020
Publication date: March 3, 2022
Inventors: Benjamin Thomas Sander, Swapnil Sakharshete, Ashish Panday
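The alternating-direction idea above is easy to see in a sketch. This hypothetical toy (not the patented implementation) walks each layer's activation array, flipping the traversal direction per layer so the next layer begins at the end the previous layer just wrote, keeping the most recently produced entries hot in the cache.

```python
def run_layers(layers, activations):
    """Apply each layer elementwise to the activation array,
    alternating forward/reverse traversal so a layer starts from
    the end its predecessor finished at (the cache-friendly order
    the abstract describes)."""
    order = []
    forward = True
    for layer in layers:
        idx = range(len(activations)) if forward \
            else range(len(activations) - 1, -1, -1)
        for i in idx:
            activations[i] = layer(activations[i])
        order.append('forward' if forward else 'reverse')
        forward = not forward
    return order, activations

# Layer 1 runs front-to-back, layer 2 back-to-front.
order, out = run_layers([lambda x: x + 1, lambda x: x * 2], [1, 2, 3])
```

The results are identical to a fixed-direction traversal; only the access order (and thus the cache behavior on real hardware) changes.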
-
Patent number: 10255104
Abstract: Embodiments described herein include a system, a computer-readable medium and a computer-implemented method for processing a system call (SYSCALL) request. The SYSCALL request from an invisible processing device is stored in a queueing mechanism that is accessible to a visible processing device, where the visible processing device is visible to an operating system and the invisible processing device is invisible to the operating system. The SYSCALL request is processed using the visible processing device, and the invisible processing device is notified using a notification mechanism that the SYSCALL request was processed.
Type: Grant
Filed: March 29, 2013
Date of Patent: April 9, 2019
Assignee: Advanced Micro Devices, Inc.
Inventors: Benjamin Thomas Sander, Michael Clair Houston, Keith Lowery, Newton Cheung
-
Patent number: 10146575
Abstract: Methods, systems, and computer-readable mediums for task scheduling on an accelerated processing device (APD) are provided. In an embodiment, a method comprises: enqueuing one or more tasks in a memory storage module based on the APD using a software-based enqueuing module; and dequeuing the one or more tasks from the memory storage module using a hardware-based command processor, wherein the command processor forwards the one or more tasks to the shader core.
Type: Grant
Filed: August 29, 2016
Date of Patent: December 4, 2018
Assignee: Advanced Micro Devices, Inc.
Inventors: Benjamin Thomas Sander, Michael Houston, Newton Cheung, Keith Lowery
-
Patent number: 9645854
Abstract: A method, system and article of manufacture for balancing a workload on heterogeneous processing devices. The method comprises accessing a memory storage of a processor of one type by a dequeuing entity associated with a processor of a different type, identifying a task from a plurality of tasks within the memory that can be processed by the processor of the different type, synchronizing a plurality of dequeuing entities capable of accessing the memory storage, and dequeuing the task from the memory storage.
Type: Grant
Filed: November 2, 2011
Date of Patent: May 9, 2017
Assignee: Advanced Micro Devices, Inc.
Inventors: Benjamin Thomas Sander, Michael Houston, Newton Cheung, Keith Lowery
-
Publication number: 20160371116
Abstract: Methods, systems, and computer-readable mediums for task scheduling on an accelerated processing device (APD) are provided. In an embodiment, a method comprises: enqueuing one or more tasks in a memory storage module based on the APD using a software-based enqueuing module; and dequeuing the one or more tasks from the memory storage module using a hardware-based command processor, wherein the command processor forwards the one or more tasks to the shader core.
Type: Application
Filed: August 29, 2016
Publication date: December 22, 2016
Applicant: Advanced Micro Devices, Inc.
Inventors: Benjamin Thomas Sander, Michael Houston, Newton Cheung, Keith Lowery
-
Patent number: 9430281
Abstract: Methods, systems, and computer-readable mediums for task scheduling on an accelerated processing device (APD) are provided. In an embodiment, a method comprises: enqueuing one or more tasks in a memory storage module based on the APD using a software-based enqueuing module; and dequeuing the one or more tasks from the memory storage module using a hardware-based command processor, wherein the command processor forwards the one or more tasks to the shader core.
Type: Grant
Filed: November 9, 2011
Date of Patent: August 30, 2016
Assignee: Advanced Micro Devices, Inc.
Inventors: Benjamin Thomas Sander, Michael Houston, Newton Cheung, Keith Lowery
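The enqueue/dequeue split in this family of filings can be modeled in a few lines. This is a purely illustrative toy, assuming nothing about the real hardware: software code enqueues tasks into a memory-backed queue, and a loop standing in for the hardware command processor drains them, forwarding each task to a shader-core callback.

```python
from collections import deque

class TaskQueue:
    """Toy model: software enqueues tasks into a memory-backed
    queue; a command-processor loop dequeues them and forwards
    each one to a shader-core callback."""
    def __init__(self):
        self._queue = deque()

    def enqueue(self, task):
        # Stands in for the software-based enqueuing module.
        self._queue.append(task)

    def drain(self, shader_core):
        # Stands in for the hardware-based command processor.
        results = []
        while self._queue:
            results.append(shader_core(self._queue.popleft()))
        return results

q = TaskQueue()
for t in (1, 2, 3):
    q.enqueue(t)
results = q.drain(lambda task: task * 10)
```

The point of the split is that producers only touch memory, while dispatch to execution units is handled by dedicated hardware rather than a software scheduler.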
-
Patent number: 8752064
Abstract: Provided herein is a method for optimizing communication for system calls. The method includes storing a system call for each work item in a wavefront and transmitting said stored system calls to a processor for execution. The method also includes receiving a result to each work item in the wavefront responsive to said transmitting.
Type: Grant
Filed: November 30, 2011
Date of Patent: June 10, 2014
Assignee: Advanced Micro Devices, Inc.
Inventors: Benjamin Thomas Sander, Michael Houston, Newton Cheung, Keith Lowery
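The batching described above amounts to collecting one pending system call per work item and paying for a single host round trip instead of one per work item. A minimal sketch, with hypothetical field names and a stand-in handler (none of which come from the patent):

```python
def batch_syscalls(wavefront, handler):
    """Gather one pending system call from each work item, send the
    whole batch to the host handler in a single transmission, then
    map each result back to its work item."""
    batch = [item['syscall'] for item in wavefront]
    results = handler(batch)  # one host round trip for the wavefront
    return {item['id']: r for item, r in zip(wavefront, results)}

wavefront = [{'id': 0, 'syscall': ('write', 'a')},
             {'id': 1, 'syscall': ('write', 'b')}]
# Toy handler: "execute" each write by reporting bytes written.
results = batch_syscalls(wavefront, lambda calls: [len(c[1]) for c in calls])
```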
-
Patent number: 8667201
Abstract: A system, method and article of manufacture for an accelerated processing device (APD) to request a central processing unit (CPU) to process a task, comprising enqueuing a plurality of tasks on a queue using the APD, generating a user-level interrupt and transmitting to the CPU the plurality of tasks in the queue using an interrupt handler associated with a CPU thread.
Type: Grant
Filed: November 9, 2011
Date of Patent: March 4, 2014
Assignee: Advanced Micro Devices, Inc.
Inventors: Benjamin Thomas Sander, Michael Houston, Newton Cheung, Keith Lowery
-
Publication number: 20130263144
Abstract: Embodiments described herein include a system, a computer-readable medium and a computer-implemented method for processing a system call (SYSCALL) request. The SYSCALL request from an invisible processing device is stored in a queueing mechanism that is accessible to a visible processing device, where the visible processing device is visible to an operating system and the invisible processing device is invisible to the operating system. The SYSCALL request is processed using the visible processing device, and the invisible processing device is notified using a notification mechanism that the SYSCALL request was processed.
Type: Application
Filed: March 29, 2013
Publication date: October 3, 2013
Applicant: Advanced Micro Devices, Inc.
Inventors: Benjamin Thomas Sander, Michael Clair Houston, Keith Lowery, Newton Cheung
-
Publication number: 20120194526
Abstract: Systems, methods, and articles of manufacture for optimizing task scheduling on an accelerated processing device (APD) device are provided. In an embodiment, a method comprises: enqueuing, using the APD, one or more tasks in a memory storage; and dequeuing, using the APD, the one or more tasks from the memory storage using a hardware-based command processor, wherein the command processor forwards the one or more tasks to a shader core.
Type: Application
Filed: November 30, 2011
Publication date: August 2, 2012
Inventors: Benjamin Thomas Sander, Michael Houston, Newton Cheung, Keith Lowery
-
Publication number: 20120192201
Abstract: A method, system and article of manufacture for balancing a workload on heterogeneous processing devices.
Type: Application
Filed: November 2, 2011
Publication date: July 26, 2012
Inventors: Benjamin Thomas Sander, Michael Houston, Newton Cheung, Keith Lowery
-
Publication number: 20120179851
Abstract: A system, method and article of manufacture for an accelerated processing device (APD) to request a central processing unit (CPU) to process a task, comprising enqueuing a plurality of tasks on a queue using the APD, generating a user-level interrupt and transmitting to the CPU the plurality of tasks in the queue using an interrupt handler associated with a CPU thread.
Type: Application
Filed: November 9, 2011
Publication date: July 12, 2012
Applicant: Advanced Micro Devices, Inc.
Inventors: Benjamin Thomas Sander, Michael Houston, Newton Cheung, Keith Lowery
-
Publication number: 20120180072
Abstract: Provided herein is a method for optimizing communication for system calls. The method includes storing a system call for each work item in a wavefront and transmitting said stored system calls to a processor for execution. The method also includes receiving a result to each work item in the wavefront responsive to said transmitting.
Type: Application
Filed: November 30, 2011
Publication date: July 12, 2012
Inventors: Benjamin Thomas Sander, Michael Houston, Newton Cheung, Keith Lowery
-
Publication number: 20120180056
Abstract: Methods, systems, and computer-readable mediums for task scheduling on an accelerated processing device (APD) are provided. In an embodiment, a method comprises: enqueuing one or more tasks in a memory storage module based on the APD using a software-based enqueuing module; and dequeuing the one or more tasks from the memory storage module using a hardware-based command processor, wherein the command processor forwards the one or more tasks to the shader core.
Type: Application
Filed: November 9, 2011
Publication date: July 12, 2012
Inventors: Benjamin Thomas Sander, Michael Houston, Newton Cheung, Keith Lowery
-
Patent number: 7222226
Abstract: A system may include a dispatch unit, a scheduler, and an execution core. The dispatch unit may be configured to modify a load operation to include a register-to-register move operation in response to an indication that a speculative result of the load operation is linked to a data value identified by a first tag. The scheduler may be coupled to the dispatch unit and configured to issue the register-to-register move operation in response to availability of the data value. The execution core may be configured to execute the register-to-register move operation by outputting the data value and a tag indicating that the data value is the result of the load operation.
Type: Grant
Filed: April 30, 2002
Date of Patent: May 22, 2007
Assignee: Advanced Micro Devices, Inc.
Inventors: Kevin Michael Lepak, Benjamin Thomas Sander, James K. Pickett
-
Patent number: 7089400
Abstract: A processor may include a stack file and an execution core. The stack file may include an entry configured to store an addressing pattern and a tag. The addressing pattern identifies a memory location within the stack area of memory. The stack file may be configured to link a data value identified by the tag stored in the entry to the speculative result of a memory operation if the addressing pattern of the memory operation matches the addressing pattern stored in the entry. The execution core may be configured to access the speculative result when executing another operation that is dependent on the memory operation.
Type: Grant
Filed: January 21, 2003
Date of Patent: August 8, 2006
Assignee: Advanced Micro Devices, Inc.
Inventors: James K. Pickett, Benjamin Thomas Sander, Kevin Michael Lepak
-
Patent number: 7024537
Abstract: A system may include a memory file and an execution core. The memory file may include an entry configured to store an addressing pattern and a tag. If an addressing pattern of a memory operation matches the addressing pattern stored in the entry, the memory file may be configured to link a data value identified by the tag to a speculative result of the memory operation. The addressing pattern of the memory operation includes an identifier of a logical register, and the memory file may be configured to predict whether the logical register is being specified as a general purpose register or a stack frame pointer register in order to determine whether the addressing pattern of the memory operation matches the addressing pattern stored in the entry. The execution core may be configured to access the speculative result when executing another operation that is dependent on the memory operation.
Type: Grant
Filed: January 21, 2003
Date of Patent: April 4, 2006
Assignee: Advanced Micro Devices, Inc.
Inventors: James K. Pickett, Benjamin Thomas Sander, Kevin Michael Lepak
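The memory-file mechanism in this and the two preceding entries is hardware, but its core lookup can be caricatured in software. In this hypothetical sketch (register names and method names are invented for illustration), a store records its addressing pattern alongside the tag of the value it wrote; a later load with a matching pattern speculatively links to that tag instead of waiting for the memory access.

```python
class MemoryFile:
    """Toy memory file: maps an addressing pattern (base register,
    displacement) to the tag of the value last stored through it,
    so a matching load can speculatively link to that value."""
    def __init__(self):
        self.entries = {}

    def record_store(self, pattern, tag):
        # A store updates (or creates) the entry for its pattern.
        self.entries[pattern] = tag

    def link_load(self, pattern):
        # A load with a matching pattern gets the speculated source
        # tag; a miss returns None and the load proceeds normally.
        return self.entries.get(pattern)

mf = MemoryFile()
mf.record_store(('rsp', 8), tag='r3')
hit = mf.link_load(('rsp', 8))    # pattern matches -> linked tag
miss = mf.link_load(('rbp', 16))  # no matching entry
```

The real design additionally predicts whether the base register is acting as a general-purpose register or a stack frame pointer before declaring a pattern match; that prediction is omitted here.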
-
Patent number: 6981119
Abstract: A memory system may use the storage space freed by compressing a unit of data to store performance-enhancing data associated with that unit of data. For example, a memory controller may be configured to allocate several storage locations within a memory to store a unit of data. If the unit of data is compressed, the unit of data may not occupy a portion of the storage locations allocated to it. The memory controller may store performance-enhancing data associated with the unit of data in the portion of the storage locations allocated to but not occupied by the unit of data.
Type: Grant
Filed: August 29, 2002
Date of Patent: December 27, 2005
Assignee: Advanced Micro Devices, Inc.
Inventors: Kevin Michael Lepak, Benjamin Thomas Sander
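The idea above can be sketched at the byte level. This is a loose software analogy, not the hardware scheme: `zlib` stands in for the memory system's compressor, the block size and the "hint" payload are invented, and the padding layout is arbitrary. If compression frees enough of the block allocated to a data unit, metadata (e.g. a prefetch hint) is tucked into the unused tail.

```python
import zlib

BLOCK_SIZE = 64  # storage allocated to one unit of data (bytes)

def store_block(data: bytes, hint: bytes):
    """Compress the unit of data; if enough of its allocated block
    is freed, place performance-enhancing metadata in the unused
    tail. Returns (block, hint_stored)."""
    compressed = zlib.compress(data)
    if len(compressed) + len(hint) <= BLOCK_SIZE:
        pad = BLOCK_SIZE - len(compressed) - len(hint)
        return compressed + b'\x00' * pad + hint, True
    # Incompressible: store raw, no room for the hint.
    return data, False

# Highly compressible data leaves plenty of room for the hint.
block, has_hint = store_block(b'A' * BLOCK_SIZE, b'HINT')
```

The payoff is that the metadata rides along in storage the unit of data already owns, costing no extra capacity.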