Patents by Inventor Norman Rubin
Norman Rubin has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20230251861
Abstract: Systems and methods for obtaining a set of instructions for executing a computer program and generating executable code for the computer program based, at least in part, on scheduling operations associated with the executable code according to a polyhedral representation of a directed acyclic graph. The set of instructions may be represented as a domain-specific language. The executable code may be executable code for a specific processor architecture.
Type: Application
Filed: April 18, 2023
Publication date: August 10, 2023
Inventors: Venmugil Elango, Norman Rubin, Mahesh Ravishankar, Vinod Grover
-
Publication number: 20190278593
Abstract: Systems and methods for obtaining a set of instructions for executing a computer program and generating executable code for the computer program based, at least in part, on scheduling operations associated with the executable code according to a polyhedral representation of a directed acyclic graph. The set of instructions may be represented as a domain-specific language. The executable code may be executable code for a specific processor architecture.
Type: Application
Filed: February 15, 2019
Publication date: September 12, 2019
Inventors: Venmugil Elango, Norman Rubin, Mahesh Ravishankar, Vinod K. Grover
-
Patent number: 9170820
Abstract: Provided is a method for processing system calls from a GPU to a CPU. The method includes a GPU storing a plurality of tasks in a memory, with each task representing a function to be performed on the CPU. The method also includes generating a CPU interrupt, and processing of the stored plurality of tasks by the CPU.
Type: Grant
Filed: December 15, 2011
Date of Patent: October 27, 2015
Assignee: Advanced Micro Devices, Inc.
Inventors: Norman Rubin, Michael Mantor
-
Patent number: 8959319
Abstract: Embodiments of the present invention provide systems, methods, and computer program products for improving divergent conditional branches in code being executed by a processor. For example, in an embodiment, a method comprises detecting a conditional statement of a program being simultaneously executed by a plurality of threads, determining which threads evaluate a condition of the conditional statement as true and which threads evaluate the condition as false, pushing an identifier associated with the larger set of the threads onto a stack, executing code associated with a smaller set of the threads, and executing code associated with the larger set of the threads.
Type: Grant
Filed: December 2, 2011
Date of Patent: February 17, 2015
Assignee: Advanced Micro Devices, Inc.
Inventors: Mark Leather, Norman Rubin, Brian D. Emberling, Michael Mantor
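The ordering described in this abstract (defer the larger thread set, run the smaller one first) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the patented implementation: the function name `run_divergent_branch`, the use of Python lists as thread sets, and the choice to break ties in favor of the true set are all hypothetical.

```python
def run_divergent_branch(values, cond, then_fn, else_fn):
    """Simulate one conditional executed by len(values) lockstep threads.

    Returns (results, order), where order lists the thread sets in the
    sequence in which their code ran. All names here are illustrative.
    """
    thread_ids = range(len(values))
    true_set = [t for t in thread_ids if cond(values[t])]
    false_set = [t for t in thread_ids if not cond(values[t])]

    # Pair each thread set with the code it must execute, smaller set first.
    # sorted() is stable, so on a tie the true set runs first (an assumption).
    smaller, larger = sorted(
        [(true_set, then_fn), (false_set, else_fn)], key=lambda p: len(p[0])
    )

    stack = [larger]          # push the identifier of the larger set
    results = list(values)
    order = []

    for threads, fn in (smaller, stack.pop()):  # smaller set, then popped larger set
        for t in threads:
            results[t] = fn(values[t])
        order.append(threads)
    return results, order


results, order = run_divergent_branch(
    [1, 2, 3, 4, 5], lambda x: x > 1, lambda x: x * 10, lambda x: -x
)
print(results, order)
```

Here only thread 0 takes the false path, so it runs first while the four-thread set waits on the stack.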
-
Patent number: 8935475
Abstract: Embodiments of the present invention provide for the execution of threads and/or workitems on multiple processors of a heterogeneous computing system in a manner that they can share data correctly and efficiently. Disclosed method, system, and article of manufacture embodiments include, responsive to an instruction from a sequence of instructions of a work-item, determining an ordering of visibility to other work-items of one or more other data items in relation to a particular data item, and performing at least one cache operation upon at least one of the particular data item or the other data items present in any one or more cache memories in accordance with the determined ordering. The semantics of the instruction includes a memory operation upon the particular data item.
Type: Grant
Filed: March 30, 2012
Date of Patent: January 13, 2015
Assignees: ATI Technologies ULC, Advanced Micro Devices, Inc.
Inventors: Anthony Asaro, Kevin Normoyle, Mark Hummel, Norman Rubin, Mark Fowler
-
Patent number: 8607247
Abstract: Method, system, and computer program product embodiments for synchronizing workitems on one or more processors are disclosed. The embodiments include executing a barrier skip instruction by a first workitem from the group, and responsive to the executed barrier skip instruction, reconfiguring a barrier to synchronize other workitems from the group in a plurality of points in a sequence without requiring the first workitem to reach the barrier in any of the plurality of points.
Type: Grant
Filed: November 3, 2011
Date of Patent: December 10, 2013
Assignee: Advanced Micro Devices, Inc.
Inventors: Lee W. Howes, Benedict R. Gaster, Michael C. Houston, Michael Mantor, Mark Leather, Norman Rubin, Brian D. Emberling
-
Publication number: 20130262775
Abstract: Embodiments of the present invention provide for the execution of threads and/or workitems on multiple processors of a heterogeneous computing system in a manner that they can share data correctly and efficiently. Disclosed method, system, and article of manufacture embodiments include, responsive to an instruction from a sequence of instructions of a work-item, determining an ordering of visibility to other work-items of one or more other data items in relation to a particular data item, and performing at least one cache operation upon at least one of the particular data item or the other data items present in any one or more cache memories in accordance with the determined ordering. The semantics of the instruction includes a memory operation upon the particular data item.
Type: Application
Filed: March 30, 2012
Publication date: October 3, 2013
Applicants: ATI Technologies ULC, Advanced Micro Devices, Inc.
Inventors: Anthony Asaro, Kevin Normoyle, Mark Hummel, Norman Rubin, Mark Fowler
-
Publication number: 20130159685
Abstract: A function in source code is processed by a compiler for execution on a graphics processing unit, wherein the function includes an exception handling structure. An exception raising block is converted into a first control flow and an exception handler block is converted into a second control flow. The first control flow includes setting an exception raised indicator and finding an exception handler to process the raised exception. The exception raised indicator remains set until an appropriate exception handler is found. The second control flow includes clearing the exception raised indicator and processing the exception.
Type: Application
Filed: December 15, 2011
Publication date: June 20, 2013
Applicant: Advanced Micro Devices, Inc.
Inventors: Dz-ching Ju, Norman Rubin, Gang Chen
-
Publication number: 20130155074
Abstract: Provided is a method for processing system calls from a GPU to a CPU. The method includes a GPU storing a plurality of tasks in a memory, with each task representing a function to be performed on the CPU. The method also includes generating a CPU interrupt, and processing of the stored plurality of tasks by the CPU.
Type: Application
Filed: December 15, 2011
Publication date: June 20, 2013
Applicant: Advanced Micro Devices, Inc.
Inventors: Norman Rubin, Michael Mantor
-
Publication number: 20130117750
Abstract: Method, system, and computer program product embodiments for synchronizing workitems on one or more processors are disclosed. The embodiments include executing a barrier skip instruction by a first workitem from the group, and responsive to the executed barrier skip instruction, reconfiguring a barrier to synchronize other workitems from the group in a plurality of points in a sequence without requiring the first workitem to reach the barrier in any of the plurality of points.
Type: Application
Filed: November 3, 2011
Publication date: May 9, 2013
Applicant: Advanced Micro Devices, Inc.
Inventors: Lee W. Howes, Benedict R. Gaster, Michael C. Houston, Michael Mantor, Mark Leather, Norman Rubin, Brian D. Emberling
-
Publication number: 20120204014
Abstract: Embodiments of the present invention provide systems, methods, and computer program products for improving divergent conditional branches in code being executed by a processor. For example, in an embodiment, a method comprises detecting a conditional statement of a program being simultaneously executed by a plurality of threads, determining which threads evaluate a condition of the conditional statement as true and which threads evaluate the condition as false, pushing an identifier associated with the larger set of the threads onto a stack, executing code associated with a smaller set of the threads, and executing code associated with the larger set of the threads.
Type: Application
Filed: December 2, 2011
Publication date: August 9, 2012
Inventors: Mark Leather, Norman Rubin, Brian D. Emberling, Michael Mantor
-
Patent number: 7774765
Abstract: A method and apparatus for use in compiling data for a program shader identifies within data representing control flow information an area operator definition instruction statement located outside the data dependent control flow structures. The method identifies within one of the data dependent branches at least one area operator use instruction statement that has the resultant of the area operator definition instruction statement as an operand. After identifying the area operator use instruction statement, the area operator definition instruction statement is placed within the data dependent branch.
Type: Grant
Filed: February 7, 2006
Date of Patent: August 10, 2010
Assignee: ATI Technologies Inc.
Inventors: Norman Rubin, William L. Licea-Kane
-
Patent number: 7568191
Abstract: A method and apparatus for superword register value numbering includes hashing an operation code and the value numbers of a plurality of sources to generate a first hash value. The method and apparatus further includes retrieving an operation value number from the first hash table based on the first hash value. The method and apparatus further includes generating a result value number based on a previous bit hash value and the operation value number. The result value number is a combination of the operation value numbers for each component having a live indicator (e.g., a false write mask value) and a previous value numbers for the components without the live indicator (e.g., a true write mask value). Thereupon, the method and apparatus includes searching a second hash table using the result value number. As such, the method and apparatus provides using two separate hash tables for value numbering with superword instructions.
Type: Grant
Filed: January 30, 2004
Date of Patent: July 28, 2009
Assignee: ATI Technologies, Inc.
Inventors: Norman Rubin, Richard Bagley
-
Patent number: 7568193
Abstract: A method and apparatus for SSA dead code elimination includes examining a first instruction off a worklist, wherein the first instruction includes a previous link and a write mask and the first instruction is an SSA instruction. The method and apparatus further includes examining at least one second instruction of the machine code, wherein the at least one second instructions are sources of the first instruction and the at least one second instructions are SSA instructions. In the method and apparatus, each of the at least one second instructions include a previous link and a write mask. The method and apparatus further includes determining if any components within a particular field of the at least one second instruction are live. If none of the components are live, the method and apparatus provides for deleting the second instruction from the machine code as it is determined that this instruction is extraneous, dead code.
Type: Grant
Filed: January 28, 2004
Date of Patent: July 28, 2009
Assignee: ATI Technologies, Inc.
Inventors: Norman Rubin, Myron King
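The core idea of treating an instruction as dead when none of the components it writes are live can be sketched as follows. This is a deliberately simplified illustration, not the method claimed in the patent: it uses a single backward pass over straight-line code instead of a worklist, it does not model the "previous link", and the tuple layout and the name `eliminate_dead_writes` are hypothetical.

```python
def eliminate_dead_writes(instructions, live_out):
    """Drop instructions whose written components are all dead.

    instructions: list of (dest, write_mask, sources), where write_mask is a
        set of component letters and sources is a list of
        (register, component) pairs read by the instruction.
    live_out: set of (register, component) pairs needed after the last
        instruction.
    """
    live = set(live_out)
    kept = []
    for dest, mask, sources in reversed(instructions):
        written = {(dest, c) for c in mask}
        if written & live:
            # At least one written component is live: keep the instruction.
            live -= written        # these components are defined here
            live |= set(sources)   # conservatively, all of its reads become live
            kept.append((dest, mask, sources))
        # Otherwise no written component is live, so the instruction is dropped.
    kept.reverse()
    return kept


program = [
    ("r0", {"x", "y"}, []),
    ("r1", {"z"}, [("r0", "x")]),
    ("r2", {"x"}, [("r0", "y")]),  # r2.x is never needed, so this is dead
]
print(eliminate_dead_writes(program, {("r1", "z")}))
```

A finer-grained version could also narrow the write mask of a kept instruction to its live components; that refinement is omitted here.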
-
Patent number: 7281122
Abstract: A method and apparatus for nested control flow includes a processor having at least one context bit. The processor includes a plurality of arithmetic logic units for performing single instruction multiple data (SIMD) operations. The method and apparatus further includes a first memory device storing a plurality of instructions wherein each of the plurality of instructions includes a plurality of extra bits. The processor is operative to execute the instructions based on the extra bits and in conjunction with a context bit. The method and apparatus further includes a second memory device, such as a general purpose register operably coupled to the processor, the second memory device receiving an incrementing counter instruction upon the execution of one of the plurality of instructions. As such, the method and apparatus allows for nested control flow through a single context bit in conjunction with instructions having a plurality of extra bits.
Type: Grant
Filed: January 14, 2004
Date of Patent: October 9, 2007
Assignee: ATI Technologies Inc.
Inventors: Norman Rubin, Andrew Gruber
-
Publication number: 20070180437
Abstract: A method and apparatus for use in compiling data for a program shader identifies within data representing control flow information an area operator definition instruction statement located outside the data dependent control flow structures. The method identifies within one of the data dependent branches at least one area operator use instruction statement that has the resultant of the area operator definition instruction statement as an operand. After identifying the area operator use instruction statement, the area operator definition instruction statement is placed within the data dependent branch.
Type: Application
Filed: February 7, 2006
Publication date: August 2, 2007
Applicant: ATI Technologies Inc.
Inventors: Norman Rubin, William Licea-Kane
-
Patent number: 6968542
Abstract: A method of identifying pseudo-invariant instructions in computer program hot paths, comprising the steps of creating an intermediate representation of a hot path in a software buffer, executing instructions in the program image for the computer program until a hot path is detected, copying computer machine state and computer processor register contents to a context in memory, and using this context to compute an output a plurality of times for each instruction in the hot path using an interpreter that emulates the computer processor. Results of the interpreter computations are stored with the frequency count for each unique output in a table that is readable by a program optimizer. Frequency counts for each instruction are compared with a pseudo-invariant threshold to classify an instruction as pseudo-invariant.
Type: Grant
Filed: February 23, 2001
Date of Patent: November 22, 2005
Assignee: Hewlett-Packard Development Company, L.P.
Inventors: Richard J. Bagley, Dean M. Deaver, Chris L. Reeve, Norman Rubin
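The final classification step in this abstract (a frequency table per instruction, compared with a threshold) might look roughly like the sketch below. The abstract does not say how the comparison is made, so this assumes the threshold is a minimum share of observations taken by the most frequent output; the function name `classify_pseudo_invariant` and the dictionary input format are likewise hypothetical.

```python
from collections import Counter


def classify_pseudo_invariant(observed_outputs, threshold):
    """Return {instruction: dominant_output} for pseudo-invariant instructions.

    observed_outputs: maps an instruction label to the list of outputs the
        interpreter computed for it across repeated executions.
    threshold: assumed to be the minimum fraction of observations that the
        most frequent output must account for.
    """
    pseudo_invariant = {}
    for instr, outputs in observed_outputs.items():
        if not outputs:
            continue  # nothing observed, so nothing to classify
        counts = Counter(outputs)  # frequency count per unique output
        value, freq = counts.most_common(1)[0]
        if freq / len(outputs) >= threshold:
            pseudo_invariant[instr] = value
    return pseudo_invariant


table = {
    "load r1": [8, 8, 8, 8, 3],   # almost always produces 8
    "add r2": [1, 2, 3, 4, 5],    # output varies on every execution
}
print(classify_pseudo_invariant(table, threshold=0.75))
```

An optimizer could then specialize the hot path for the dominant value and guard it with a check for the rare case.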
-
Publication number: 20050198468
Abstract: A method and apparatus for superword register value numbering includes hashing an operation code and the value numbers of a plurality of sources to generate a first hash value. The method and apparatus further includes retrieving an operation value number from the first hash table based on the first hash value. The method and apparatus further includes generating a result value number based on a previous bit hash value and the operation value number. The result value number is a combination of the operation value numbers for each component having a live indicator and a previous value numbers for the components without the live indicator. Thereupon, the method and apparatus includes searching a second hash table using the result value number. As such, the method and apparatus provides using two separate hash tables for value numbering with superword instructions.
Type: Application
Filed: January 30, 2004
Publication date: September 8, 2005
Applicant: ATI Technologies, Inc.
Inventors: Norman Rubin, Richard Bagley
-
Publication number: 20050166194
Abstract: A method and apparatus for SSA dead code elimination includes examining a first instruction off a worklist, wherein the first instruction includes a previous link and a write mask and the first instruction is an SSA instruction. The method and apparatus further includes examining at least one second instruction of the machine code, wherein the at least one second instructions are sources of the first instruction and the at least one second instructions are SSA instructions. In the method and apparatus, each of the at least one second instructions include a previous link and a write mask. The method and apparatus further includes determining if any elements within a particular field are live for the at least one second instruction. If none of the elements are live, the method and apparatus provides for deleting the first instruction from the machine code as it is determined that this instruction is extraneous, dead code.
Type: Application
Filed: January 28, 2004
Publication date: July 28, 2005
Applicant: ATI Technologies, Inc.
Inventors: Norman Rubin, Myron King
-
Publication number: 20050154864
Abstract: A method and apparatus for nested control flow includes a processor having at least one context bit. The processor includes a plurality of arithmetic logic units for performing single instruction multiple data (SIMD) operations. The method and apparatus further includes a first memory device storing a plurality of instructions wherein each of the plurality of instructions includes a plurality of extra bits. The processor is operative to execute the instructions based on the extra bits and in conjunction with a context bit. The method and apparatus further includes a second memory device, such as a general purpose register operably coupled to the processor, the second memory device receiving an incrementing counter instruction upon the execution of one of the plurality of instructions. As such, the method and apparatus allows for nested control flow through a single context bit in conjunction with instructions having a plurality of extra bits.
Type: Application
Filed: January 14, 2004
Publication date: July 14, 2005
Applicant: ATI Technologies, Inc.
Inventors: Norman Rubin, Andrew Gruber