Patents by Inventor Sumesh Udayakumaran
Sumesh Udayakumaran has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11243752
Abstract: Described herein are techniques for generating a stitched shader program. The techniques include identifying a set of shader programs to include in the stitched shader program, wherein the set includes at least one multiversion shader program that includes a first version of instructions and a second version of instructions, wherein the first version of instructions uses a first number of resources that is different than a second number of resources used by the second version of instructions. The techniques also include combining the set of shader programs to form the stitched shader program. The techniques further include determining a number of resources for the stitched shader program. The techniques also include based on the determined number of resources, modifying the instructions corresponding to the multiversion shader program to, when executed, execute either the first version of instructions, or the second version of instructions.
Type: Grant
Filed: July 11, 2019
Date of Patent: February 8, 2022
Assignee: Advanced Micro Devices, Inc.
Inventor: Sumesh Udayakumaran
-
Publication number: 20210011697
Abstract: Described herein are techniques for generating a stitched shader program. The techniques include identifying a set of shader programs to include in the stitched shader program, wherein the set includes at least one multiversion shader program that includes a first version of instructions and a second version of instructions, wherein the first version of instructions uses a first number of resources that is different than a second number of resources used by the second version of instructions. The techniques also include combining the set of shader programs to form the stitched shader program. The techniques further include determining a number of resources for the stitched shader program. The techniques also include based on the determined number of resources, modifying the instructions corresponding to the multiversion shader program to, when executed, execute either the first version of instructions, or the second version of instructions.
Type: Application
Filed: July 11, 2019
Publication date: January 14, 2021
Applicant: Advanced Micro Devices, Inc.
Inventor: Sumesh Udayakumaran
-
Patent number: 10026145
Abstract: Techniques for allowing for concurrent execution of multiple different tasks and preempted prioritized execution of tasks on a shader processor. In an example operation, a driver executed by a central processing unit (CPU) configures GPU resources based on needs of a first “host” shader to allow the first shader to execute “normally” on the GPU. The GPU may observe two sets of tasks, “guest” tasks. Based on, for example, detecting an availability of resources, the GPU may determine a “guest” task may be run while the “host” task is running. A second “guest” shader executes on a GPU by using resources that were configured for the first “host” shader if there are available resources and, in some examples, additional resources are obtained through software-programmable means.
Type: Grant
Filed: December 13, 2016
Date of Patent: July 17, 2018
Assignee: QUALCOMM Incorporated
Inventors: Alexei Vladimirovich Bourd, Maxim Kazakov, Chunhui Mei, Sumesh Udayakumaran
-
Publication number: 20180165786
Abstract: Techniques for allowing for concurrent execution of multiple different tasks and preempted prioritized execution of tasks on a shader processor. In an example operation, a driver executed by a central processing unit (CPU) configures GPU resources based on needs of a first “host” shader to allow the first shader to execute “normally” on the GPU. The GPU may observe two sets of tasks, “guest” tasks. Based on, for example, detecting an availability of resources, the GPU may determine a “guest” task may be run while the “host” task is running. A second “guest” shader executes on a GPU by using resources that were configured for the first “host” shader if there are available resources and, in some examples, additional resources are obtained through software-programmable means.
Type: Application
Filed: December 13, 2016
Publication date: June 14, 2018
Inventors: Alexei Vladimirovich Bourd, Maxim Kazakov, Chunhui Mei, Sumesh Udayakumaran
-
Patent number: 9747104
Abstract: In one example, a method includes responsive to receiving, by a processing unit, one or more instructions requesting that a first value be moved from a first general purpose register (GPR) to a third GPR and that a second value be moved from a second GPR to a fourth GPR, copying, by an initial logic unit and during a first clock cycle, the first value to an initial pipeline register, copying, by the initial logic and during a second clock cycle, the second value to the initial pipeline register, copying, by a final logic unit and during a third clock cycle, the first value from a final pipeline register to the third GPR, and copying, by the final logic unit and during a fourth clock cycle, the second value from the final pipeline register to the fourth GPR.
Type: Grant
Filed: May 12, 2014
Date of Patent: August 29, 2017
Assignee: QUALCOMM Incorporated
Inventors: Lin Chen, Yun Du, Sumesh Udayakumaran, Chihong Zhang, Andrew Evan Gruber
-
Patent number: 9329867
Abstract: This disclosure describes techniques for allocating registers in a computing system that supports vector physical registers. The techniques for allocating registers may allocate physical registers to vector virtual registers based on priority information that is indicative of a relative importance of allocating respective vector virtual registers as vectors rather than scalars. The techniques for allocating registers may involve allocating physical registers to the vector virtual registers in an order that is determined based on the priority information.
Type: Grant
Filed: September 23, 2014
Date of Patent: May 3, 2016
Assignee: QUALCOMM Incorporated
Inventors: Sumesh Udayakumaran, Se Jong Oh
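The priority-ordered allocation described in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the patented method: the function name `allocate_by_priority`, the integer priority scores, and the fallback of splitting a register into scalars once no vector physical register remains.

```python
def allocate_by_priority(virtual_regs, num_vector_regs):
    """Assign vector physical registers to the highest-priority virtual registers.

    virtual_regs:    {name: priority}; a higher value means it matters more
                     that the register is kept as a vector (hypothetical scale).
    num_vector_regs: number of vector physical registers available.
    Returns {name: ("vector", index)} or {name: ("scalars", None)}.
    """
    free = list(range(num_vector_regs))
    allocation = {}
    # Visit virtual registers in descending priority order.
    for name in sorted(virtual_regs, key=lambda n: -virtual_regs[n]):
        if free:
            allocation[name] = ("vector", free.pop(0))
        else:
            # Assumed fallback: no vector register left, so use scalars.
            allocation[name] = ("scalars", None)
    return allocation


result = allocate_by_priority({"a": 1, "b": 5, "c": 3}, num_vector_regs=2)
```

With two vector registers, the two highest-priority virtual registers (`b`, then `c`) are kept as vectors and `a` falls back to scalars.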
-
Publication number: 20150324196
Abstract: In one example, a method includes responsive to receiving, by a processing unit, one or more instructions requesting that a first value be moved from a first general purpose register (GPR) to a third GPR and that a second value be moved from a second GPR to a fourth GPR, copying, by an initial logic unit and during a first clock cycle, the first value to an initial pipeline register, copying, by the initial logic and during a second clock cycle, the second value to the initial pipeline register, copying, by a final logic unit and during a third clock cycle, the first value from a final pipeline register to the third GPR, and copying, by the final logic unit and during a fourth clock cycle, the second value from the final pipeline register to the fourth GPR.
Type: Application
Filed: May 12, 2014
Publication date: November 12, 2015
Applicant: QUALCOMM Incorporated
Inventors: Lin Chen, Yun Du, Sumesh Udayakumaran, Chihong Zhang, Andrew Evan Gruber
-
Publication number: 20150193234
Abstract: This disclosure describes techniques for allocating registers in a computing system that supports vector physical registers. The techniques for allocating registers may allocate physical registers to vector virtual registers based on priority information that is indicative of a relative importance of allocating respective vector virtual registers as vectors rather than scalars. The techniques for allocating registers may involve allocating physical registers to the vector virtual registers in an order that is determined based on the priority information.
Type: Application
Filed: September 23, 2014
Publication date: July 9, 2015
Inventors: Sumesh Udayakumaran, Se Jong Oh
-
Patent number: 8933954
Abstract: In general, aspects of this disclosure describe a compiler for allocation of physical registers for storing constituent scalar values of a non-scalar value. In some examples, the compiler, executing on a processor, may receive an instruction for operation on a non-scalar value. The compiler may divide the instruction into a plurality of instructions for operation on constituent scalar values of the non-scalar value. The compiler may allocate a plurality of physical registers to store the constituent scalar values.
Type: Grant
Filed: March 23, 2011
Date of Patent: January 13, 2015
Assignee: QUALCOMM Incorporated
Inventor: Sumesh Udayakumaran
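As a rough illustration of dividing an instruction on a non-scalar value into per-component scalar instructions and then giving each scalar its own physical register, consider the sketch below. The tuple-based instruction format, the `.x/.y/.z/.w` component names, and the first-come register numbering are all assumptions for illustration; the abstract does not specify them.

```python
def scalarize(op, dst, srcs, width):
    """Split one vector instruction into `width` scalar instructions.

    Each instruction is a hypothetical (op, destination, sources) tuple.
    """
    components = "xyzw"[:width]
    return [(op, f"{dst}.{c}", [f"{s}.{c}" for s in srcs]) for c in components]


def assign_registers(scalar_instrs):
    """Give every distinct scalar value its own physical register name."""
    regs = {}
    for _, dst, srcs in scalar_instrs:
        for name in [*srcs, dst]:
            # First time a scalar is seen, it takes the next free register.
            regs.setdefault(name, f"r{len(regs)}")
    return regs


instrs = scalarize("add", "v0", ["v1", "v2"], width=2)
registers = assign_registers(instrs)
```

Here a two-component vector add becomes two scalar adds, and the six scalar operands are mapped to six distinct registers.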
-
Patent number: 8732679
Abstract: A new computer-compiler architecture includes code analysis processes in which loops present in an intermediate instruction set are transformed into more efficient loops prior to fully executing the intermediate instruction set. The compiler architecture starts by generating the equivalent intermediate instructions for the original high level source code. For each loop in the intermediate instructions, a total cycle cost is calculated using a cycle cost table associated with the compiler. The compiler then generates intermediate code for replacement loops in which all conversion instructions are removed. The cycle costs for these new transformed loops are then compared against the total cycle cost for the original loops. If the total cycle costs exceed the new cycle costs, the compiler will replace the original loops in the intermediate instructions with the new transformed loops prior to generation of final code using the instruction set of the processor.
Type: Grant
Filed: March 16, 2010
Date of Patent: May 20, 2014
Assignee: QUALCOMM Incorporated
Inventors: Sumesh Udayakumaran, Chihong Zhang
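The cost comparison in this abstract can be sketched as follows. The cycle cost table values, the opcode names (including `cvt` for a conversion instruction), and the representation of a loop body as a list of opcodes are hypothetical choices made for this example only.

```python
# Hypothetical cycle cost table; real values depend on the target processor.
CYCLE_COST = {"load": 3, "store": 3, "add": 1, "mul": 2, "cvt": 4}


def loop_cost(loop_body):
    """Total cycle cost of one loop iteration, looked up from the cost table."""
    return sum(CYCLE_COST[op] for op in loop_body)


def choose_loop(original, transformed):
    """Keep the transformed loop only if the original costs strictly more."""
    if loop_cost(original) > loop_cost(transformed):
        return transformed
    return original


original = ["load", "cvt", "add", "cvt", "store"]  # cost 15
transformed = ["load", "add", "store"]             # cost 7, conversions removed
selected = choose_loop(original, transformed)
```

Because the original loop costs more cycles than the conversion-free version, the transformed loop is selected; had the transformed loop been more expensive, the original would be kept.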
-
Patent number: 8494770
Abstract: A method and system for calculating savings routes for display on a portable computing device (PCD) are described. The method includes receiving at least one of a product category and a service category from an operator of a PCD. The PCD may also receive a destination address. With this information, circle of influence data based on an offer for at least one product or service corresponding to the product category or service category may be generated and provided to the PCD. The circle of influence data may impact edge weights of a graph search algorithm. The graph search algorithm solves a single-source shortest path problem for a graph with non-negative edge path costs. The circles of influence in combination with the graph search algorithm allow a PCD to calculate one or more savings routes based on a start point and the desired destination address provided by the operator of the PCD.
Type: Grant
Filed: March 15, 2011
Date of Patent: July 23, 2013
Assignee: QUALCOMM Incorporated
Inventors: Babak Forutanpour, Wolfgang G. Frank, Sumesh Udayakumaran
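A single-source shortest path search over non-negative edge costs is commonly done with Dijkstra's algorithm, so the sketch below uses it. How a circle of influence changes an edge weight is not stated in the abstract; the rule assumed here (an edge leading into a node inside a circle is multiplied by `1 - discount`) and the names `savings_route`, `coords`, and `circles` are illustrative only.

```python
import heapq
import math


def savings_route(graph, coords, start, goal, circles):
    """Dijkstra search where circles of influence discount edge weights.

    graph:   {node: [(neighbor, base_cost), ...]} with non-negative costs
    coords:  {node: (x, y)}
    circles: [(cx, cy, radius, discount)] with 0 <= discount <= 1 (assumed form)
    """
    def weight(node, base_cost):
        x, y = coords[node]
        factor = 1.0
        for cx, cy, radius, discount in circles:
            if math.hypot(x - cx, y - cy) <= radius:
                factor = min(factor, 1.0 - discount)
        return base_cost * factor  # remains non-negative

    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:
            break
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, base in graph.get(u, []):
            nd = d + weight(v, base)
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if goal not in dist:
        return None, math.inf
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1], dist[goal]


graph = {"A": [("B", 4), ("C", 5)], "B": [("D", 4)], "C": [("D", 5)]}
coords = {"A": (0, 0), "B": (5, 0), "C": (0, 5), "D": (5, 5)}
plain = savings_route(graph, coords, "A", "D", circles=[])
with_offer = savings_route(graph, coords, "A", "D", circles=[(0, 5, 1.0, 0.5)])
```

Without any offer the route goes through `B`; placing a circle of influence around `C` makes the detour through `C` cheaper, so the search returns it as the savings route.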
-
Publication number: 20120242673
Abstract: In general, aspects of this disclosure describe a compiler for allocation of physical registers for storing constituent scalar values of a non-scalar value. In some examples, the compiler, executing on a processor, may receive an instruction for operation on a non-scalar value. The compiler may divide the instruction into a plurality of instructions for operation on constituent scalar values of the non-scalar value. The compiler may allocate a plurality of physical registers to store the constituent scalar values.
Type: Application
Filed: March 23, 2011
Publication date: September 27, 2012
Applicant: QUALCOMM Incorporated
Inventor: Sumesh Udayakumaran
-
Publication number: 20120239288
Abstract: A method and system for calculating savings routes for display on a portable computing device (PCD) are described. The method includes receiving at least one of a product category and a service category from an operator of a PCD. The PCD may also receive a destination address. With this information, circle of influence data based on an offer for at least one product or service corresponding to the product category or service category may be generated and provided to the PCD. The circle of influence data may impact edge weights of a graph search algorithm. The graph search algorithm solves a single-source shortest path problem for a graph with non-negative edge path costs. The circles of influence in combination with the graph search algorithm allow a PCD to calculate one or more savings routes based on a start point and the desired destination address provided by the operator of the PCD.
Type: Application
Filed: March 15, 2011
Publication date: September 20, 2012
Inventors: Babak Forutanpour, Wolfgang G. Frank, Sumesh Udayakumaran
-
Publication number: 20110231830
Abstract: A new computer-compiler architecture includes code analysis processes in which loops present in an intermediate instruction set are transformed into more efficient loops prior to fully executing the intermediate instruction set. The compiler architecture starts by generating the equivalent intermediate instructions for the original high level source code. For each loop in the intermediate instructions, a total cycle cost is calculated using a cycle cost table associated with the compiler. The compiler then generates intermediate code for replacement loops in which all conversion instructions are removed. The cycle costs for these new transformed loops are then compared against the total cycle cost for the original loops. If the total cycle costs exceed the new cycle costs, the compiler will replace the original loops in the intermediate instructions with the new transformed loops prior to generation of final code using the instruction set of the processor.
Type: Application
Filed: March 16, 2010
Publication date: September 22, 2011
Applicant: QUALCOMM Incorporated
Inventors: Sumesh Udayakumaran, Chihong Zhang
-
Patent number: 7367024
Abstract: A highly predictable, low overhead and yet dynamic, memory allocation methodology for embedded systems with scratch-pad memory is presented. The dynamic memory allocation methodology for global and stack data (i) accounts for changing program requirements at runtime; (ii) has no software-caching tags; (iii) requires no run-time checks; (iv) has extremely low overheads; and (v) yields 100% predictable memory access times. The methodology provides that data that is about to be accessed frequently is copied into the SRAM using compiler-inserted code at fixed and infrequent points in the program. Earlier data is evicted if necessary.
Type: Grant
Filed: September 21, 2004
Date of Patent: April 29, 2008
Assignee: University of Maryland
Inventors: Rajeev Kumar Barua, Sumesh Udayakumaran
-
Publication number: 20060080372
Abstract: A highly predictable, low overhead and yet dynamic, memory allocation methodology for embedded systems with scratch-pad memory is presented. The dynamic memory allocation methodology for global and stack data (i) accounts for changing program requirements at runtime; (ii) has no software-caching tags; (iii) requires no run-time checks; (iv) has extremely low overheads; and (v) yields 100% predictable memory access times. The methodology provides that data that is about to be accessed frequently is copied into the SRAM using compiler-inserted code at fixed and infrequent points in the program. Earlier data is evicted if necessary.
Type: Application
Filed: September 21, 2004
Publication date: April 13, 2006
Inventors: Rajeev Barua, Sumesh Udayakumaran