Patents by Inventor Sooraj Puthoor

Sooraj Puthoor has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10019377
    Abstract: The described embodiments include a computing device with two or more types of processors and a memory that is shared between the two or more types of processors. The computing device performs operations for handling cache coherency between the two or more types of processors. During operation, the computing device sets a cache coherency indicator in metadata in a page table entry in a page table, the page table entry information about a page of data that is stored in the memory. The computing device then uses the cache coherency indicator to determine operations to be performed when accessing data in the page of data in the memory. For example, the computing device can use the coherency indicator to determine whether a coherency operation is to be performed when a processor of a given type accesses data in the page of data in the memory.
    Type: Grant
    Filed: May 23, 2016
    Date of Patent: July 10, 2018
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Arkaprava Basu, Bradford M. Beckmann, Shuai Che, Sooraj Puthoor
  • Patent number: 10019283
    Abstract: A processing device includes a first memory that includes a context buffer. The processing device also includes a processor core to execute threads based on context information stored in registers of the processor core and a memory controller to selectively move a subset of the context information between the context buffer and the registers based on one or more latencies of the threads.
    Type: Grant
    Filed: June 22, 2015
    Date of Patent: July 10, 2018
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Dmitri Yudanov, Sergey Blagodurov, Arkaprava Basu, Sooraj Puthoor, Joseph L. Greathouse
  • Publication number: 20180088858
    Abstract: A processing apparatus is provided that includes NVRAM and one or more processors configured to process a first set and a second set of instructions according to a hierarchical processing scope and process a scoped persistence barrier residing in the program after the first instruction set and before the second instruction set. The barrier includes an instruction to cause first data to persist in the NVRAM before second data persists in the NVRAM. The first data results from execution of each of the first set of instructions processed according to the one hierarchical processing scope. The second data results from execution of each of the second set of instructions processed according to the one hierarchical processing scope. The processing apparatus also includes a controller configured to cause the first data to persist in the NVRAM before the second data persists in the NVRAM based on the scoped persistence barrier.
    Type: Application
    Filed: September 23, 2016
    Publication date: March 29, 2018
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Arkaprava Basu, Mitesh R. Meswani, Dibakar Gope, Sooraj Puthoor
  • Patent number: 9898287
    Abstract: A method, a non-transitory computer readable medium, and a processor for repacking dynamic wavefronts during program code execution on a processing unit, each dynamic wavefront including multiple threads are presented. If a branch instruction is detected, a determination is made whether all wavefronts following a same control path in the program code have reached a compaction point, which is the branch instruction. If no branch instruction is detected in executing the program code, a determination is made whether all wavefronts following the same control path have reached a reconvergence point, which is a beginning of a program code segment to be executed by both a taken branch and a not taken branch from a previous branch instruction. The dynamic wavefronts are repacked with all threads that follow the same control path, if all wavefronts following the same control path have reached the branch instruction or the reconvergence point.
    Type: Grant
    Filed: April 9, 2015
    Date of Patent: February 20, 2018
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Sooraj Puthoor, Bradford M. Beckmann, Dmitri Yudanov
  • Publication number: 20170337136
    Abstract: The described embodiments include a computing device with two or more types of processors and a memory that is shared between the two or more types of processors. The computing device performs operations for handling cache coherency between the two or more types of processors. During operation, the computing device sets a cache coherency indicator in metadata in a page table entry in a page table, the page table entry information about a page of data that is stored in the memory. The computing device then uses the cache coherency indicator to determine operations to be performed when accessing data in the page of data in the memory. For example, the computing device can use the coherency indicator to determine whether a coherency operation is to be performed when a processor of a given type accesses data in the page of data in the memory.
    Type: Application
    Filed: May 23, 2016
    Publication date: November 23, 2017
    Inventors: Arkaprava Basu, Bradford M. Beckmann, Shuai Che, Sooraj Puthoor
  • Publication number: 20170220346
    Abstract: Briefly, methods and apparatus to migrate a software thread from one wavefront executing on one execution unit to another wavefront executing on another execution unit whereby both execution units are associated with a compute unit of a processing device such as, for example, a GPU. The methods and apparatus may execute compiled dynamic thread migration swizzle buffer instructions that when executed allow access to a dynamic thread migration swizzle buffer that allows for the migration of register context information when migrating software threads. The register context information may be located in one or more locations of a register file prior to storing the register context information into the dynamic thread migration swizzle buffer. The method and apparatus may also return the register context information from the dynamic thread migration swizzle buffer to one or more different register file locations of the register file.
    Type: Application
    Filed: January 29, 2016
    Publication date: August 3, 2017
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Bradford Beckmann, Sooraj Puthoor
  • Publication number: 20170004080
    Abstract: Methods, devices, and systems for managing performance of a processor having multiple compute units. An effective number of the multiple compute units may be determined to designate as having priority. On a condition that the effective number is nonzero, the effective number of the multiple compute units may each be designated as a priority compute unit. Priority compute units may have access to a shared cache whereas non-priority compute units may not. Workgroups may be preferentially dispatched to priority compute units. Memory access requests from priority compute units may be served ahead of requests from non-priority compute units.
    Type: Application
    Filed: June 30, 2015
    Publication date: January 5, 2017
    Applicant: ADVANCED MICRO DEVICES, INC.
    Inventors: Zhe Wang, Sooraj Puthoor, Bradford M. Beckmann
  • Publication number: 20160371082
    Abstract: A processing device includes a first memory that includes a context buffer. The processing device also includes a processor core to execute threads based on context information stored in registers of the processor core and a memory controller to selectively move a subset of the context information between the context buffer and the registers based on one or more latencies of the threads.
    Type: Application
    Filed: June 22, 2015
    Publication date: December 22, 2016
    Inventors: Dmitri Yudanov, Sergey Blagodurov, Arkaprava Basu, Sooraj Puthoor, Joseph L. Greathouse
  • Publication number: 20160239302
    Abstract: A method, a non-transitory computer readable medium, and a processor for repacking dynamic wavefronts during program code execution on a processing unit, each dynamic wavefront including multiple threads are presented. If a branch instruction is detected, a determination is made whether all wavefronts following a same control path in the program code have reached a compaction point, which is the branch instruction. If no branch instruction is detected in executing the program code, a determination is made whether all wavefronts following the same control path have reached a reconvergence point, which is a beginning of a program code segment to be executed by both a taken branch and a not taken branch from a previous branch instruction. The dynamic wavefronts are repacked with all threads that follow the same control path, if all wavefronts following the same control path have reached the branch instruction or the reconvergence point.
    Type: Application
    Filed: April 9, 2015
    Publication date: August 18, 2016
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Sooraj Puthoor, Bradford M. Beckmann, Dmitri Yudanov
  • Publication number: 20160085551
    Abstract: A compute unit configured to execute multiple threads in parallel is presented. The compute unit includes one or more single instruction multiple data (SIMD) units and a fetch and decode logic. The SIMD units have differing numbers of arithmetic logic units (ALUs), such that each SIMD unit can execute a different number of threads. The fetch and decode logic is in communication with each of the SIMD units, and is configured to assign the threads to the SIMD units for execution based on such differing numbers of ALUs.
    Type: Application
    Filed: September 18, 2014
    Publication date: March 24, 2016
    Applicant: ADVANCED MICRO DEVICES, INC.
    Inventors: Joseph L. Greathouse, Mitesh R. Meswani, Sooraj Puthoor, Dmitri Yudanov, James M. O'Connor