Patents by Inventor Benjamin T. Sander

Benjamin T. Sander has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20220269620
    Abstract: A processor maintains an access log indicating a stream of cache misses at a cache of the processor. In response to each of at least a subset of cache misses at the cache, the processor records a corresponding entry in the access log, indicating a physical memory address of the memory access request that resulted in the corresponding miss. In addition, the processor maintains an address translation log that indicates a mapping of physical memory addresses to virtual memory addresses. In response to an address translation (e.g., a page walk) that translates a virtual address to a physical address, the processor stores a mapping of the physical address to the corresponding virtual address at an entry of the address translation log. Software executing at the processor can use the two logs for memory management.
    Type: Application
    Filed: February 8, 2022
    Publication date: August 25, 2022
    Inventors: Benjamin T. SANDER, Mark Fowler, Anthony Asaro, Gongxian Jeffrey Cheng, Michael Mantor
  • Patent number: 11288205
    Abstract: A processor maintains an access log indicating a stream of cache misses at a cache of the processor. In response to each of at least a subset of cache misses at the cache, the processor records a corresponding entry in the access log, indicating a physical memory address of the memory access request that resulted in the corresponding miss. In addition, the processor maintains an address translation log that indicates a mapping of physical memory addresses to virtual memory addresses. In response to an address translation (e.g., a page walk) that translates a virtual address to a physical address, the processor stores a mapping of the physical address to the corresponding virtual address at an entry of the address translation log. Software executing at the processor can use the two logs for memory management.
    Type: Grant
    Filed: June 23, 2015
    Date of Patent: March 29, 2022
    Assignees: Advanced Micro Devices, Inc., ATI TECHNOLOGIES ULC
    Inventors: Benjamin T. Sander, Mark Fowler, Anthony Asaro, Gongxian Jeffrey Cheng, Mike Mantor
  • Patent number: 11100004
    Abstract: A processor uses the same virtual address space for heterogeneous processing units of the processor. The processor employs different sets of page tables for different types of processing units, such as a CPU and a GPU, wherein a memory management unit uses each set of page tables to translate virtual addresses of the virtual address space to corresponding physical addresses of memory modules associated with the processor. As data is migrated between memory modules, the physical addresses in the page tables can be updated to reflect the physical location of the data for each processing unit.
    Type: Grant
    Filed: June 23, 2015
    Date of Patent: August 24, 2021
    Assignees: ADVANCED MICRO DEVICES, INC., ATI TECHNOLOGIES ULC
    Inventors: Gongxian Jeffrey Cheng, Mark Fowler, Philip J. Rogers, Benjamin T. Sander, Anthony Asaro, Mike Mantor, Raja Koduri
  • Patent number: 10467138
    Abstract: A processing system includes a first socket, a second socket, and an interface between the first socket and the second socket. A first memory is associated with the first socket and a second memory is associated with the second socket. The processing system also includes a controller for the first memory. The controller is to receive a first request for a first memory transaction with the second memory and perform the first memory transaction along a path that includes the interface and bypasses at least one second cache associated with the second memory.
    Type: Grant
    Filed: December 28, 2015
    Date of Patent: November 5, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Paul Blinzer, Ali Ibrahim, Benjamin T. Sander, Vydhyanathan Kalyanasundharam
  • Patent number: 10423354
    Abstract: A memory manager of a processor identifies a block of data for eviction from a first memory module to a second memory module. In response, the processor copies only those portions of the data block that have been identified as modified portions to the second memory module. The amount of data to be copied is thereby reduced, improving memory management efficiency and reducing processor power consumption.
    Type: Grant
    Filed: September 23, 2015
    Date of Patent: September 24, 2019
    Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC
    Inventors: Philip Rogers, Benjamin T. Sander, Anthony Asaro, Gongxian Jeffrey Cheng
  • Patent number: 9910788
    Abstract: A processor device includes a cache and a memory storing a set of counters. Each counter of the set is associated with a corresponding block of a plurality of blocks of the cache. The processor device further includes a cache access monitor to, for each time quantum for a series of one or more time quanta, increment counter values of the set of counters based on accesses to the corresponding blocks of the cache. The processor device further includes a transfer engine to, after completion of each time quantum, transfer the counter values of the set of counters for the time quantum to a corresponding location in a system memory.
    Type: Grant
    Filed: September 22, 2015
    Date of Patent: March 6, 2018
    Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC
    Inventors: Philip J. Rogers, Benjamin T. Sander, Anthony Asaro
  • Publication number: 20170185514
    Abstract: A processing system includes a first socket, a second socket, and an interface between the first socket and the second socket. A first memory is associated with the first socket and a second memory is associated with the second socket. The processing system also includes a controller for the first memory. The controller is to receive a first request for a first memory transaction with the second memory and perform the first memory transaction along a path that includes the interface and bypasses at least one second cache associated with the second memory.
    Type: Application
    Filed: December 28, 2015
    Publication date: June 29, 2017
    Inventors: Paul Blinzer, Ali Ibrahim, Benjamin T. Sander, Vydhyanathan Kalyanasundharam
  • Publication number: 20170083240
    Abstract: A memory manager of a processor identifies a block of data for eviction from a first memory module to a second memory module. In response, the processor copies only those portions of the data block that have been identified as modified portions to the second memory module. The amount of data to be copied is thereby reduced, improving memory management efficiency and reducing processor power consumption.
    Type: Application
    Filed: September 23, 2015
    Publication date: March 23, 2017
    Inventors: Philip Rogers, Benjamin T. Sander, Anthony Asaro, Gongxian Jeffrey Cheng
  • Publication number: 20170083455
    Abstract: A processor device includes a cache and a memory storing a set of counters. Each counter of the set is associated with a corresponding block of a plurality of blocks of the cache. The processor device further includes a cache access monitor to, for each time quantum for a series of one or more time quanta, increment counter values of the set of counters based on accesses to the corresponding blocks of the cache. The processor device further includes a transfer engine to, after completion of each time quantum, transfer the counter values of the set of counters for the time quantum to a corresponding location in a system memory.
    Type: Application
    Filed: September 22, 2015
    Publication date: March 23, 2017
    Inventors: Philip J. Rogers, Benjamin T. Sander, Anthony Asaro
  • Publication number: 20160378682
    Abstract: A processor maintains an access log indicating a stream of cache misses at a cache of the processor. In response to each of at least a subset of cache misses at the cache, the processor records a corresponding entry in the access log, indicating a physical memory address of the memory access request that resulted in the corresponding miss. In addition, the processor maintains an address translation log that indicates a mapping of physical memory addresses to virtual memory addresses. In response to an address translation (e.g., a page walk) that translates a virtual address to a physical address, the processor stores a mapping of the physical address to the corresponding virtual address at an entry of the address translation log. Software executing at the processor can use the two logs for memory management.
    Type: Application
    Filed: June 23, 2015
    Publication date: December 29, 2016
    Inventors: Benjamin T. Sander, Mark Fowler, Anthony Asaro, Gongxian Jeffrey Cheng, Mike Mantor
  • Publication number: 20160378674
    Abstract: A processor uses the same virtual address space for heterogeneous processing units of the processor. The processor employs different sets of page tables for different types of processing units, such as a CPU and a GPU, wherein a memory management unit uses each set of page tables to translate virtual addresses of the virtual address space to corresponding physical addresses of memory modules associated with the processor. As data is migrated between memory modules, the physical addresses in the page tables can be updated to reflect the physical location of the data for each processing unit.
    Type: Application
    Filed: June 23, 2015
    Publication date: December 29, 2016
    Inventors: Gongxian Jeffrey Cheng, Mark Fowler, Philip J. Rogers, Benjamin T. Sander, Anthony Asaro, Mike Mantor, Raja Koduri
  • Patent number: 9501269
    Abstract: A programming model for a processor accelerator allows accelerated functions to be called from a main program directly without a management API for the accelerator. A compiler automatically generates wrapper source code for each accelerator function called by the application source code. The wrapper code is compiled, together with the accelerator source code, to generate an object file that is linked to an object file for the main program. By automatically generating the wrapper code, a programmer can simply and directly invoke accelerator functions without the use of a complex management API. In addition, because the wrapper code for the accelerator is generated automatically, a standard compiler can be used to compile the main program, using standard linkage conventions.
    Type: Grant
    Filed: September 30, 2014
    Date of Patent: November 22, 2016
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Gregory P. Rodgers, Benjamin T. Sander, Shreyas Ramalingam
  • Publication number: 20160092181
    Abstract: A programming model for a processor accelerator allows accelerated functions to be called from a main program directly without a management API for the accelerator. A compiler automatically generates wrapper source code for each accelerator function called by the application source code. The wrapper code is compiled, together with the accelerator source code, to generate an object file that is linked to an object file for the main program. By automatically generating the wrapper code, a programmer can simply and directly invoke accelerator functions without the use of a complex management API. In addition, because the wrapper code for the accelerator is generated automatically, a standard compiler can be used to compile the main program, using standard linkage conventions.
    Type: Application
    Filed: September 30, 2014
    Publication date: March 31, 2016
    Inventors: Gregory P. Rodgers, Benjamin T. Sander, Shreyas Ramalingam
  • Patent number: 8667225
    Abstract: A system and method for efficient data prefetching. A data stream stored in lower-level memory comprises a contiguous block of data used in a computer program. A prefetch unit in a processor detects a data stream by identifying a sequence of storage accesses referencing a contiguous blocks of data in a monotonically increasing or decreasing manner. After a predetermined training period for a given data stream, the prefetch unit prefetches a portion of the given data stream from memory without write permission, in response to an access that does not request write permission. Also, after the training period, the prefetch unit prefetches a portion of the given data stream from lower-level memory with write permission, in response to determining there has been a prior access to the given data stream that requests write permission subsequent to a number of cache misses reaching a predetermined threshold.
    Type: Grant
    Filed: September 11, 2009
    Date of Patent: March 4, 2014
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Benjamin T. Sander, Bharath Narasimha Swamy, Swamy Punyamurtula
  • Publication number: 20130339978
    Abstract: A method and an apparatus for performing load balancing in a heterogeneous computing system including a plurality of processing elements are presented. A program places tasks into a queue. A task from the queue is distributed to one of the plurality of processing elements, wherein the distributing includes the one processing element sending a task request to the queue and receiving a task to be done from the queue. The task is performed by the one processing element. A result of the task is sent from the one processing element to the program. The load balancing is performed by distributing tasks from the queue to processing elements that complete the tasks faster.
    Type: Application
    Filed: June 13, 2013
    Publication date: December 19, 2013
    Inventor: Benjamin T. Sander
  • Patent number: 8195887
    Abstract: A data processing device is disclosed that includes multiple processing cores, where each core is associated with a corresponding cache. When a processing core is placed into a first sleep mode, the data processing device initiates a first phase. If any cache probes are received at the processing core during the first phase, the cache probes are serviced. At the end of the first phase, the cache corresponding to the processing core is flushed, and subsequent cache probes are not serviced at the cache. Because it does not service the subsequent cache probes, the processing core can therefore enter another sleep mode, allowing the data processing device to conserve additional power.
    Type: Grant
    Filed: January 21, 2009
    Date of Patent: June 5, 2012
    Inventors: William A. Hughes, Kiran K. Bondalapati, Vydhyanathan Kalyanasundharam, Kevin M. Lepak, Benjamin T. Sander
  • Publication number: 20120023314
    Abstract: A method and mechanism for reducing latency of a multi-cycle scheduler within a processor. A processor comprises a front end pipeline that determines data dependencies between instructions prior to a scheduling pipe stage. For each data dependency, a distance value is determined based on a number of instructions a younger dependent instruction is located from a corresponding older (in program order) instruction. When the younger dependent instruction is allocated an entry in a multi-cycle scheduler, this distance value may be used to locate an entry storing the older instruction in the scheduler. When the older instruction is picked for issue, the younger dependent instruction is marked as pre-picked. In an immediately subsequent clock cycle, the younger dependent instruction may be picked for issue, thereby reducing the latency of the multi-cycle scheduler.
    Type: Application
    Filed: July 21, 2010
    Publication date: January 26, 2012
    Inventors: Matthew M. Crum, Michael D. Achenbach, Betty A. McDaniel, Benjamin T. Sander
  • Patent number: 7937569
    Abstract: A system and method for scheduling operations using speculative data operands. In one embodiment, a system may include a scheduler configured to store a speculative source tag and a non-speculative source tag for an operand of an operation and an execution core configured to execute operations issued by the scheduler and to output result tags identifying operands generated by executing the operations. The scheduler may be configured to determine whether the operation is ready to issue by comparing the speculative source tag, but not the non-speculative source tag, to the result tags output by the execution core unless an incorrect speculation has been detected. If an incorrect speculation has been detected, the scheduler may be configured to determine whether the operation is ready to issue by comparing the non-speculative source tag, but not the speculative source tag, to the result tags output by the execution core.
    Type: Grant
    Filed: May 5, 2004
    Date of Patent: May 3, 2011
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Benjamin T. Sander, Brian D. McMinn
  • Publication number: 20110066811
    Abstract: A system and method for efficient data prefetching. A data stream stored in lower-level memory comprises a contiguous block of data used in a computer program. A prefetch unit in a processor detects a data stream by identifying a sequence of storage accesses referencing a contiguous blocks of data in a monotonically increasing or decreasing manner. After a predetermined training period for a given data stream, the prefetch unit prefetches a portion of the given data stream from memory without write permission, in response to an access that does not request write permission. Also, after the training period, the prefetch unit prefetches a portion of the given data stream from lower-level memory with write permission, in response to determining there has been a prior access to the given data stream that requests write permission subsequent to a number of cache misses reaching a predetermined threshold.
    Type: Application
    Filed: September 11, 2009
    Publication date: March 17, 2011
    Inventors: Benjamin T. Sander, Bharath Narasimha Swamy, Swamy Punyamurtula
  • Publication number: 20100185820
    Abstract: A data processing device is disclosed that includes multiple processing cores, where each core is associated with a corresponding cache. When a processing core is placed into a first sleep mode, the data processing device initiates a first phase. If any cache probes are received at the processing core during the first phase, the cache probes are serviced. At the end of the first phase, the cache corresponding to the processing core is flushed, and subsequent cache probes are not serviced at the cache. Because it does not service the subsequent cache probes, the processing core can therefore enter another sleep mode, allowing the data processing device to conserve additional power.
    Type: Application
    Filed: January 21, 2009
    Publication date: July 22, 2010
    Applicant: ADVANCED MICRO DEVICES, INC.
    Inventors: William A. Hughes, Kiran K. Bondalapati, Vydhyanathan Kalyanasundharam, Kevin M. Lepak, Benjamin T. Sander