Patents by Inventor Norman Rubin

Norman Rubin has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230251861
    Abstract: Systems and methods for obtaining a set of instructions for executing a computer program and generating executable code for the computer program based, at least in part, on scheduling operations associated with the executable code according to a polyhedral representation of a directed acyclic graph. The set of instructions may be represented as a domain-specific language. The executable code may be executable code for a specific processor architecture.
    Type: Application
    Filed: April 18, 2023
    Publication date: August 10, 2023
    Inventors: Venmugil Elango, Norman Rubin, Mahesh Ravishankar, Vinod Grover
  • Publication number: 20190278593
    Abstract: Systems and methods for obtaining a set of instructions for executing a computer program and generating executable code for the computer program based, at least in part, on scheduling operations associated with the executable code according to a polyhedral representation of a directed acyclic graph. The set of instructions may be represented as a domain-specific language. The executable code may be executable code for a specific processor architecture.
    Type: Application
    Filed: February 15, 2019
    Publication date: September 12, 2019
    Inventors: Venmugil Elango, Norman Rubin, Mahesh Ravishankar, Vinod K. Grover
  • Patent number: 9170820
    Abstract: Provided is a method for processing system calls from a GPU to a CPU. The method includes a GPU storing a plurality of tasks in a memory, with each task representing a function to be performed on the CPU. The method also includes generating a CPU interrupt, and processing of the stored plurality of tasks by the CPU.
    Type: Grant
    Filed: December 15, 2011
    Date of Patent: October 27, 2015
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Norman Rubin, Michael Mantor
  • Patent number: 8959319
    Abstract: Embodiments of the present invention provide systems, methods, and computer program products for improving divergent conditional branches in code being executed by a processor. For example, in an embodiment, a method comprises detecting a conditional statement of a program being simultaneously executed by a plurality of threads, determining which threads evaluate a condition of the conditional statement as true and which threads evaluate the condition as false, pushing an identifier associated with the larger set of the threads onto a stack, executing code associated with a smaller set of the threads, and executing code associated with the larger set of the threads.
    Type: Grant
    Filed: December 2, 2011
    Date of Patent: February 17, 2015
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Mark Leather, Norman Rubin, Brian D. Emberling, Michael Mantor
  • Patent number: 8935475
    Abstract: Embodiments of the present invention provide for the execution of threads and/or workitems on multiple processors of a heterogeneous computing system in a manner that they can share data correctly and efficiently. Disclosed method, system, and article of manufacture embodiments include, responsive to an instruction from a sequence of instructions of a work-item, determining an ordering of visibility to other work-items of one or more other data items in relation to a particular data item, and performing at least one cache operation upon at least one of the particular data item or the other data items present in any one or more cache memories in accordance with the determined ordering. The semantics of the instruction includes a memory operation upon the particular data item.
    Type: Grant
    Filed: March 30, 2012
    Date of Patent: January 13, 2015
    Assignees: ATI Technologies ULC, Advanced Micro Devices, Inc.
    Inventors: Anthony Asaro, Kevin Normoyle, Mark Hummel, Norman Rubin, Mark Fowler
  • Patent number: 8607247
    Abstract: Method, system, and computer program product embodiments for synchronizing workitems on one or more processors are disclosed. The embodiments include executing a barrier skip instruction by a first workitem from the group, and responsive to the executed barrier skip instruction, reconfiguring a barrier to synchronize other workitems from the group in a plurality of points in a sequence without requiring the first workitem to reach the barrier in any of the plurality of points.
    Type: Grant
    Filed: November 3, 2011
    Date of Patent: December 10, 2013
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Lee W. Howes, Benedict R. Gaster, Michael C. Houston, Michael Mantor, Mark Leather, Norman Rubin, Brian D. Emberling
  • Publication number: 20130262775
    Abstract: Embodiments of the present invention provide for the execution of threads and/or workitems on multiple processors of a heterogeneous computing system in a manner that they can share data correctly and efficiently. Disclosed method, system, and article of manufacture embodiments include, responsive to an instruction from a sequence of instructions of a work-item, determining an ordering of visibility to other work-items of one or more other data items in relation to a particular data item, and performing at least one cache operation upon at least one of the particular data item or the other data items present in any one or more cache memories in accordance with the determined ordering. The semantics of the instruction includes a memory operation upon the particular data item.
    Type: Application
    Filed: March 30, 2012
    Publication date: October 3, 2013
    Applicants: ATI Technologies ULC, Advanced Micro Devices, Inc.
    Inventors: Anthony Asaro, Kevin Normoyle, Mark Hummel, Norman Rubin, Mark Fowler
  • Publication number: 20130159685
    Abstract: A function in source code is processed by a compiler for execution on a graphics processing unit, wherein the function includes an exception handling structure. An exception raising block is converted into a first control flow and an exception handler block is converted into a second control flow. The first control flow includes setting an exception raised indicator and finding an exception handler to process the raised exception. The exception raised indicator remains set until an appropriate exception handler is found. The second control flow includes clearing the exception raised indicator and processing the exception.
    Type: Application
    Filed: December 15, 2011
    Publication date: June 20, 2013
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Dz-ching Ju, Norman Rubin, Gang Chen
  • Publication number: 20130155074
    Abstract: Provided is a method for processing system calls from a GPU to a CPU. The method includes a GPU storing a plurality of tasks in a memory, with each task representing a function to be performed on the CPU. The method also includes generating a CPU interrupt, and processing of the stored plurality of tasks by the CPU.
    Type: Application
    Filed: December 15, 2011
    Publication date: June 20, 2013
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Norman Rubin, Michael Mantor
  • Publication number: 20130117750
    Abstract: Method, system, and computer program product embodiments for synchronizing workitems on one or more processors are disclosed. The embodiments include executing a barrier skip instruction by a first workitem from the group, and responsive to the executed barrier skip instruction, reconfiguring a barrier to synchronize other workitems from the group in a plurality of points in a sequence without requiring the first workitem to reach the barrier in any of the plurality of points.
    Type: Application
    Filed: November 3, 2011
    Publication date: May 9, 2013
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Lee W. Howes, Benedict R. Gaster, Michael C. Houston, Michael Mantor, Mark Leather, Norman Rubin, Brian D. Emberling
  • Publication number: 20120204014
    Abstract: Embodiments of the present invention provide systems, methods, and computer program products for improving divergent conditional branches in code being executed by a processor. For example, in an embodiment, a method comprises detecting a conditional statement of a program being simultaneously executed by a plurality of threads, determining which threads evaluate a condition of the conditional statement as true and which threads evaluate the condition as false, pushing an identifier associated with the larger set of the threads onto a stack, executing code associated with a smaller set of the threads, and executing code associated with the larger set of the threads.
    Type: Application
    Filed: December 2, 2011
    Publication date: August 9, 2012
    Inventors: Mark Leather, Norman Rubin, Brian D. Emberling, Michael Mantor
  • Patent number: 7774765
    Abstract: A method and apparatus for use in compiling data for a program shader identifies within data representing control flow information an area operator definition instruction statement located outside the data dependent control flow structures. The method identifies within one of the data dependent branches at least one area operator use instruction statement that has the resultant of the area operator definition instruction statement as an operand. After identifying the area operator use instruction statement, the area operator definition instruction statement is placed within the data dependent branch.
    Type: Grant
    Filed: February 7, 2006
    Date of Patent: August 10, 2010
    Assignee: ATI Technologies Inc.
    Inventors: Norman Rubin, William L. Licea-Kane
  • Patent number: 7568191
    Abstract: A method and apparatus for superword register value numbering includes hashing an operation code and the value numbers of a plurality of sources to generate a first hash value. The method and apparatus further includes retrieving an operation value number from the first hash table based on the first hash value. The method and apparatus further includes generating a result value number based on a previous bit hash value and the operation value number. The result value number is a combination of the operation value numbers for each component having a live indicator (e.g., a false write mask value) and a previous value numbers for the components without the live indicator (e.g., a true write mask value). Thereupon, the method and apparatus includes searching a second hash table using the result value number. As such, the method and apparatus provides using two separate hash tables for value numbering with superword instructions.
    Type: Grant
    Filed: January 30, 2004
    Date of Patent: July 28, 2009
    Assignee: ATI Technologies, Inc.
    Inventors: Norman Rubin, Richard Bagley
  • Patent number: 7568193
    Abstract: A method and apparatus for SSA dead code elimination includes examining a first instruction off a worklist, wherein the first instruction includes previous link and a write mask and the first instruction is an SSA instruction. The method and apparatus further includes examining at least one second instruction of the machine code, wherein the at least one second instructions are sources of the first instruction and the at least one second instructions are SSA instruction. In the method and apparatus, each of the at least one second instructions include a previous link and a write mask. The method and apparatus further includes determining if any components within a particular field of the at least one second instruction are live. If none of the components are live, the method and apparatus provides for deleting the second instruction from the machine code as it is determined that this instruction is extraneous, dead code.
    Type: Grant
    Filed: January 28, 2004
    Date of Patent: July 28, 2009
    Assignee: ATI Technologies, Inc.
    Inventors: Norman Rubin, Myron King
  • Patent number: 7281122
    Abstract: A method and apparatus for nested control flow includes a processor having at least one context bit. The processor includes a plurality of arithmetic logic units for performing single instruction multiple data (SIMD) operations. The method and apparatus further includes a first memory device storing a plurality of instructions wherein each of the plurality of instructions includes a plurality of extra bits. The processor is operative to execute the instructions based on the extra bits and in conjunction with a context bit. The method and apparatus further includes a second memory device, such as a general purpose register operably coupled to the processor, the second memory device receiving an incrementing counter instruction upon the execution of one of the plurality of instructions. As such, the method and apparatus allows for nested control flow through a single context bit in conjunction with instructions having a plurality of extra bits.
    Type: Grant
    Filed: January 14, 2004
    Date of Patent: October 9, 2007
    Assignee: ATI Technologies Inc.
    Inventors: Norman Rubin, Andrew Gruber
  • Publication number: 20070180437
    Abstract: A method and apparatus for use in compiling data for a program shader identifies within data representing control flow information an area operator definition instruction statement located outside the data dependent control flow structures. The method identifies within one of the data dependent branches at least one area operator use instruction statement that has the resultant of the area operator definition instruction statement as an operand. After identifying the area operator use instruction statement, the area operator definition instruction statement is placed within the data dependent branch.
    Type: Application
    Filed: February 7, 2006
    Publication date: August 2, 2007
    Applicant: ATI Technologies Inc.
    Inventors: Norman Rubin, William Licea-Kane
  • Patent number: 6968542
    Abstract: A method of identifying pseudo-invariant instructions in computer program hot paths, comprising the steps of creating an intermediate representation of a hot path in a software buffer, executing instructions in the program image for the computer program until a hot path is detected, copying computer machine state and computer processor register contents to a context in memory, and using this context to compute an output a plurality of times for each instruction in the hot path using an interpreter that emulates the computer processor. Results of the interpreter computations are stored with the frequency count for each unique output in a table that is readable by a program optimizer. Frequency counts for each instruction are compared with a pseudo-invariant threshold to classify an instruction as pseudo-invariant.
    Type: Grant
    Filed: February 23, 2001
    Date of Patent: November 22, 2005
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Richard J. Bagley, Dean M. Deaver, Chris L. Reeve, Norman Rubin
  • Publication number: 20050198468
    Abstract: A method and apparatus for superword register value numbering includes hashing an operation code and the value numbers of a plurality of sources to generate a first hash value. The method and apparatus further includes retrieving an operation value number from the first hash table based on the first hash value. The method and apparatus further includes generating a result value number based on a previous bit hash value and the operation value number. The result value number is a combination of the operation value numbers for each component having a live indicator and a previous value numbers for the components without the live indicator. Thereupon, the method and apparatus includes searching a second hash table using the result value number. As such, the method and apparatus provides using two separate hash tables for value numbering with superword instructions.
    Type: Application
    Filed: January 30, 2004
    Publication date: September 8, 2005
    Applicant: ATI Technologies, Inc.
    Inventors: Norman Rubin, Richard Bagley
  • Publication number: 20050166194
    Abstract: A method and apparatus for SSA dead code elimination includes examining a first instruction off a worklist, wherein the first instruction includes previous link and a write mask and the first instruction is an SSA instruction. The method and apparatus further includes examining at least one second instruction of the machine code, wherein the at least one second instructions are sources of the first instruction and the at least one second instructions are SSA instruction. In the method and apparatus, each of the at least one second instructions include a previous link and a write mask. The method and apparatus further includes determining if any elements within a particular field are live for the at least one second instruction. If none of the elements are live, the method and apparatus provides for deleting the first instruction from the machine code as it is determined that this instruction is extraneous, dead code.
    Type: Application
    Filed: January 28, 2004
    Publication date: July 28, 2005
    Applicant: ATI Technologies, Inc.
    Inventors: Norman Rubin, Myron King
  • Publication number: 20050154864
    Abstract: A method and apparatus for nested control flow includes a processor having at least one context bit. The processor includes a plurality of arithmetic logic units for performing single instruction multiple data (SIMD) operations. The method and apparatus further includes a first memory device storing a plurality of instructions wherein each of the plurality of instructions includes a plurality of extra bits. The processor is operative to execute the instructions based on the extra bits and in conjunction with a context bit. The method and apparatus further includes a second memory device, such as a general purpose register operably coupled to the processor, the second memory device receiving an incrementing counter instruction upon the execution of one of the plurality of instructions. As such, the method and apparatus allows for nested control flow through a single context bit in conjunction with instructions having a plurality of extra bits.
    Type: Application
    Filed: January 14, 2004
    Publication date: July 14, 2005
    Applicant: ATI Technologies, Inc.
    Inventors: Norman Rubin, Andrew Gruber