Patents by Inventor Alexei Vladimirovich Bourd
Alexei Vladimirovich Bourd has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240104824
Abstract: Systems and techniques are provided for accelerated ray tracing. For instance, a process can include obtaining a hierarchical acceleration data structure that includes a plurality of primitives of a scene object and obtaining a respective information value associated with each primitive included in the plurality of primitives. A sort order can be determined for two or more nodes included in a same level of the hierarchical acceleration data structure at least in part by sorting the two or more nodes based on a respective sorting parameter value determined for each respective node of the two or more nodes. Each respective sorting parameter value can be determined based on at least one information value associated with one or more primitives included in a sub-tree of each respective node of the two or more nodes. The hierarchical acceleration data structure can be traversed using the sort order.
Type: Application
Filed: September 23, 2022
Publication date: March 28, 2024
Inventors: Piyush GUPTA, Pavan Kumar AKKARAJU, Alexei Vladimirovich BOURD, Andrew Evan GRUBER
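The sorted traversal described in this abstract can be illustrated with a small Python sketch. It is not the claimed method: the `Node` class, the `primitive_values` field, and the choice of "largest information value in the sub-tree" as the sorting parameter are all assumptions made for illustration, since the abstract does not say what the information value is or how it is aggregated.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical acceleration-structure node (names are illustrative)."""
    name: str
    children: List["Node"] = field(default_factory=list)
    # Per-primitive information values; only leaves carry primitives here.
    primitive_values: List[float] = field(default_factory=list)


def sorting_parameter(node: Node) -> float:
    """Assumed aggregation: the largest information value in the sub-tree."""
    values = list(node.primitive_values)
    values.extend(sorting_parameter(child) for child in node.children)
    return max(values, default=0.0)


def traversal_order(root: Node) -> List[str]:
    """Depth-first traversal that visits siblings in sorted order."""
    order = []
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node.name)
        # Rank siblings by descending sorting parameter, then push them in
        # reverse so the highest-ranked sibling is popped first.
        ranked = sorted(node.children, key=sorting_parameter, reverse=True)
        stack.extend(reversed(ranked))
    return order
```

For a root with a leaf `A` (value 0.2) and an internal node `B` whose leaves carry 0.9 and 0.5, the traversal visits `B` and its leaves before `A`.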
-
Publication number: 20240062453
Abstract: Systems and techniques are provided for determining bounding regions for a hierarchical structure for ray tracing. For instance, a process can include obtaining an acceleration data structure, the acceleration data structure including one or more primitives of a scene object. A graph cut can be applied to the acceleration data structure. A set of nodes of the acceleration data structure can be determined based on the graph cut, wherein the determined set of nodes is located adjacent to the graph cut. A world-space bounding box can be generated for the scene object, using the set of nodes determined based on the graph cut.
Type: Application
Filed: November 1, 2023
Publication date: February 22, 2024
Inventors: David Kirk MCALLISTER, Francois Mathias Robert DEMOULLIN, Alexei Vladimirovich BOURD
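A rough Python sketch of the idea follows. It is one possible reading of the abstract, not the patented procedure. It assumes that each node stores an object-space axis-aligned box, that the cut is taken at a fixed depth, and that the world-space box is the union of the transformed boxes of the nodes just below the cut. `Node`, `nodes_at_cut`, `world_bounds`, and `rotate45_scaled` are hypothetical names.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Callable, List, Tuple

Point = Tuple[float, float, float]


@dataclass
class Node:
    """Hypothetical node with an object-space axis-aligned bounding box."""
    lo: Point
    hi: Point
    children: List["Node"] = field(default_factory=list)


def nodes_at_cut(root: Node, depth: int) -> List[Node]:
    """Nodes just below a cut at `depth`; leaves above the cut are kept."""
    frontier = [root]
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            next_frontier.extend(node.children if node.children else [node])
        frontier = next_frontier
    return frontier


def world_bounds(nodes: List[Node], to_world: Callable[[Point], Point]):
    """Union of the world-space boxes of `nodes`, via their 8 corners each."""
    corners = [
        to_world(corner)
        for node in nodes
        for corner in product(*zip(node.lo, node.hi))
    ]
    lo = tuple(min(p[i] for p in corners) for i in range(3))
    hi = tuple(max(p[i] for p in corners) for i in range(3))
    return lo, hi


def rotate45_scaled(p: Point) -> Point:
    """Example transform: 45-degree rotation about z, scaled by sqrt(2)."""
    x, y, z = p
    return (x - y, x + y, z)
```

In this sketch, transforming only the root box of a diagonal object gives a loose world-space box, while transforming the boxes below the cut gives a tighter one at the cost of visiting more nodes.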
-
Publication number: 20240037183
Abstract: The present disclosure relates to methods and apparatus for compute processing. For example, disclosed techniques facilitate improving performance of matrix multiplication in streaming processor. Aspects of the present disclosure can execute, with a load control unit, a first load instruction to load a set of input data of an input matrix from a first memory to a second memory. Aspects of the present disclosure can also execute, with the load control unit, a second load instruction to load a set of weight data of a weight matrix from the first memory to the second memory. Additionally, aspects of the present disclosure can perform, with an ALU component, a matrix multiplication operation using the set of input data and the set of weight data to generate an output matrix. Further, aspects of the present disclosure can store the output matrix at a general purpose register accessible to the ALU component.
Type: Application
Filed: October 16, 2023
Publication date: February 1, 2024
Inventors: Yun DU, Gang ZHONG, Fei WEI, Yibin ZHANG, Jing HAN, Hongjiang SHANG, Elina KAMENETSKAYA, Minjie HUANG, Alexei Vladimirovich BOURD, Chun YU, Andrew Evan GRUBER, Eric DEMERS
-
Patent number: 11861785
Abstract: Systems and techniques are provided for determining bounding regions for a hierarchical structure for ray tracing. For instance, a process can include obtaining an acceleration data structure, the acceleration data structure including one or more primitives of a scene object. A graph cut can be applied to the acceleration data structure. A set of nodes of the acceleration data structure can be determined based on the graph cut, wherein the determined set of nodes is located adjacent to the graph cut. A world-space bounding box can be generated for the scene object, using the set of nodes determined based on the graph cut.
Type: Grant
Filed: February 4, 2022
Date of Patent: January 2, 2024
Assignee: QUALCOMM Incorporated
Inventors: David Kirk McAllister, Francois Mathias Robert Demoullin, Alexei Vladimirovich Bourd
-
Publication number: 20230394738
Abstract: The present disclosure relates to methods and apparatus for graphics processing, e.g., a GPU. The apparatus may receive an image including a plurality of pixels associated with one or more workgroups and one or more pixel tiles, each of the workgroups and the pixel tiles including one or more pixels of the plurality of pixels. The apparatus may determine whether the one or more workgroups are misaligned with the one or more pixel tiles. The apparatus may determine a conversion order of the one or more workgroups when the one or more workgroups are misaligned with the one or more pixel tiles, the conversion order corresponding to a common multiple of one of the one or more workgroups and one of the one or more pixel tiles. The apparatus may convert each of the one or more workgroups based on the conversion order of the one or more workgroups.
Type: Application
Filed: November 9, 2020
Publication date: December 7, 2023
Inventors: Yibin ZHANG, Zilin YING, Yun DU, Heng QI, Jiexia YU, Yang YU, Andrew Evan GRUBER, Jian LIANG, Tao WANG, Alexei Vladimirovich BOURD, Gang ZHONG, Minjie HUANG
-
Patent number: 11829439
Abstract: The present disclosure relates to methods and apparatus for compute processing. For example, disclosed techniques facilitate improving performance of matrix multiplication in streaming processor. Aspects of the present disclosure can execute, with a load control unit, a first load instruction to load a set of input data of an input matrix from a first memory to a second memory. Aspects of the present disclosure can also execute, with the load control unit, a second load instruction to load a set of weight data of a weight matrix from the first memory to the second memory. Additionally, aspects of the present disclosure can perform, with an ALU component, a matrix multiplication operation using the set of input data and the set of weight data to generate an output matrix. Further, aspects of the present disclosure can store the output matrix at a general purpose register accessible to the ALU component.
Type: Grant
Filed: December 29, 2020
Date of Patent: November 28, 2023
Assignee: QUALCOMM Incorporated
Inventors: Yun Du, Gang Zhong, Fei Wei, Yibin Zhang, Jing Han, Hongjiang Shang, Elina Kamenetskaya, Minjie Huang, Alexei Vladimirovich Bourd, Chun Yu, Andrew Evan Gruber, Eric Demers
-
Publication number: 20230252717
Abstract: Systems and techniques are provided for enhancing operations of a ray tracing processor. For instance, a process can include obtaining one or more nodes of an acceleration data structure. Each node of the one or more nodes includes the same number of bytes. The node(s) can be stored in a cache associated with a ray tracing processor. Each of the stored node(s) are cache line-aligned with the cache associated with the ray tracing processor. A first stored node of the stored node(s) can be provided to the ray tracing processor and processed by the ray tracing processor during a first clock cycle of the ray tracing processor. A second stored node of the stored node(s) can be provided to the ray tracing processor and processed by the ray tracing processor during a second clock cycle of the ray tracing processor.
Type: Application
Filed: February 4, 2022
Publication date: August 10, 2023
Inventors: David Kirk MCALLISTER, Fei WEI, Alexei Vladimirovich BOURD
-
Publication number: 20230252716
Abstract: Systems and techniques are provided for determining bounding regions for a hierarchical structure for ray tracing. For instance, a process can include obtaining an acceleration data structure, the acceleration data structure including one or more primitives of a scene object. A graph cut can be applied to the acceleration data structure. A set of nodes of the acceleration data structure can be determined based on the graph cut, wherein the determined set of nodes is located adjacent to the graph cut. A world-space bounding box can be generated for the scene object, using the set of nodes determined based on the graph cut.
Type: Application
Filed: February 4, 2022
Publication date: August 10, 2023
Inventors: David Kirk MCALLISTER, Francois Mathias Robert DEMOULLIN, Alexei Vladimirovich BOURD
-
Patent number: 11508109
Abstract: The present disclosure relates to methods and apparatus for graphics processing. The apparatus can obtain at least one input image including a plurality of pixels. Additionally, the apparatus can determine shading information for each of the plurality of pixels in the at least one input image. The apparatus can also determine a shading map based on the determined shading information for each of the plurality of pixels in the at least one input image. In some aspects, the apparatus can generate at least one output image based on the at least one input image and the determined shading map. The apparatus can also enhance a quality of the at least one output image. In some aspects, the quality of the at least one output image can be enhanced based on machine learning. Further, the apparatus can generate the at least one input image including the plurality of pixels.
Type: Grant
Filed: March 31, 2020
Date of Patent: November 22, 2022
Assignee: QUALCOMM Incorporated
Inventors: Alexei Vladimirovich Bourd, Reza Pourreza Shahri, Dam Backer, Brian Ellis, Roman Larionov, Li He, Vaibhav Rajesh Gandhi, Shuaib Arshad
-
Publication number: 20210200836
Abstract: The present disclosure relates to methods and apparatus for compute processing. For example, disclosed techniques facilitate improving performance of matrix multiplication in streaming processor. Aspects of the present disclosure can execute, with a load control unit, a first load instruction to load a set of input data of an input matrix from a first memory to a second memory. Aspects of the present disclosure can also execute, with the load control unit, a second load instruction to load a set of weight data of a weight matrix from the first memory to the second memory. Additionally, aspects of the present disclosure can perform, with an ALU component, a matrix multiplication operation using the set of input data and the set of weight data to generate an output matrix. Further, aspects of the present disclosure can store the output matrix at a general purpose register accessible to the ALU component.
Type: Application
Filed: December 29, 2020
Publication date: July 1, 2021
Inventors: Yun DU, Gang ZHONG, Fei WEI, Yibin ZHANG, Jing HAN, Hongjiang SHANG, Elina KAMENETSKAYA, Minjie HUANG, Alexei Vladimirovich BOURD, Chun YU, Andrew Evan GRUBER, Eric DEMERS
-
Publication number: 20210103467
Abstract: A graphics processing unit (GPU) may execute a shader program that may include instructions for prioritization and scheduling of waves processed in parallel. According to some aspects of the described techniques, instruction variants (e.g., set-lowest-priority, set-highest-priority, set-priority-to-N, etc.) may be executed by hardware during processing of a wave to control (e.g., modify) processing priority for that wave. As such, the described techniques for shader controlled wave scheduling priority may allow waves to be processed while avoiding interference with lagging waves, while avoiding taking resources from lagging waves, etc. In one example, when a set-lowest-priority instruction is executed by hardware during execution of a first loop of a first wave, the instruction may push the current wave's priority to be lowest on the list. Such may result in pending loops from other waves being processed prior to the processing returning to a second loop of the first wave.
Type: Application
Filed: October 2, 2019
Publication date: April 8, 2021
Inventors: Elina Kamenetskaya, Andrew Evan Gruber, Alexei Vladimirovich Bourd
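The set-lowest-priority example in this abstract can be mimicked with a toy Python scheduler. This is a software analogy only, with assumed names: waves are modelled as generators, the priority list as a `deque` whose leftmost entry has the highest priority, and `SET_LOWEST_PRIORITY` as a sentinel standing in for the instruction.

```python
from collections import deque

# Sentinel standing in for a set-lowest-priority instruction (illustrative).
SET_LOWEST_PRIORITY = object()


def wave(name, loops):
    """Toy wave: runs `loops` loop bodies, lowering its priority after each."""
    for i in range(loops):
        yield f"{name}:loop{i}"
        yield SET_LOWEST_PRIORITY


def run(waves):
    """Always step the highest-priority wave; return the loops in run order."""
    queue = deque(waves)  # leftmost entry = highest priority
    trace = []
    while queue:
        current = queue[0]
        try:
            step = next(current)
        except StopIteration:
            queue.popleft()  # wave finished
            continue
        if step is SET_LOWEST_PRIORITY:
            queue.rotate(-1)  # move the current wave to the end of the list
        else:
            trace.append(step)
    return trace
```

With two waves of two loops each, the first loop of every wave runs before any wave starts its second loop, which matches the behaviour the abstract describes for lagging waves.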
-
Publication number: 20200388022
Abstract: The present disclosure relates to methods and apparatus for graphics processing. The apparatus can obtain at least one input image including a plurality of pixels. Additionally, the apparatus can determine shading information for each of the plurality of pixels in the at least one input image. The apparatus can also determine a shading map based on the determined shading information for each of the plurality of pixels in the at least one input image. In some aspects, the apparatus can generate at least one output image based on the at least one input image and the determined shading map. The apparatus can also enhance a quality of the at least one output image. In some aspects, the quality of the at least one output image can be enhanced based on machine learning. Further, the apparatus can generate the at least one input image including the plurality of pixels.
Type: Application
Filed: March 31, 2020
Publication date: December 10, 2020
Inventors: Alexei Vladimirovich BOURD, Reza POURREZA SHAHRI, Dam BACKER, Brian ELLIS, Roman LARIONOV, Li HE, Vaibhav Rajesh GANDHI, Shuaib ARSHAD
-
Patent number: 10592468
Abstract: Techniques are described to perform a shuffle operation. Rather than using an all-lane to all-lane cross bar, a shuffler circuit having a smaller cross bar is described. The shuffler circuit performs the shuffle operation piecewise by reordering data received from processing lanes and outputting the reordered data.
Type: Grant
Filed: July 13, 2016
Date of Patent: March 17, 2020
Assignee: QUALCOMM Incorporated
Inventors: Liang Han, Xiangdong Jin, Lin Chen, Yun Du, Alexei Vladimirovich Bourd
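As a loose software analogy for a piecewise shuffle, the Python sketch below processes the source lanes in windows of `width` lanes instead of routing every lane to every other lane at once. The windowing scheme, the function name `piecewise_shuffle`, and its parameters are assumptions for illustration; the abstract does not specify how the circuit partitions the work.

```python
def piecewise_shuffle(values, src, width):
    """Compute out[i] = values[src[i]] using windows of `width` source lanes.

    Each pass models a small crossbar that only sees `width` source lanes;
    destination lanes whose source falls in that window pick up their value.
    """
    n = len(values)
    out = [None] * n
    for base in range(0, n, width):
        window = values[base:base + width]  # data received in this pass
        for dst, source_lane in enumerate(src):
            if base <= source_lane < base + width:
                out[dst] = window[source_lane - base]
    return out
```

For eight lanes and a four-lane window, the shuffle completes in two passes and produces the same result as a full all-lane to all-lane reorder.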
-
Patent number: 10223436
Abstract: In an example, a method of transferring data may include synchronizing work-items corresponding to a first subgroup and work-items corresponding to a second subgroup with a barrier. The method may include performing an inter-subgroup data transfer between the first subgroup and the second subgroup.
Type: Grant
Filed: September 7, 2016
Date of Patent: March 5, 2019
Assignee: QUALCOMM Incorporated
Inventors: Alexei Vladimirovich Bourd, Vladislav Shimanskiy, Maxim Kazakov, Yun Du
-
Patent number: 10210593
Abstract: A graphics processing unit (GPU) may dispatch a first set of commands for execution on one or more processing units of the GPU. The GPU may receive notification from a host device indicating that a second set of commands are ready to execute on the GPU. In response, the GPU may issue a first preemption command at a first preemption granularity to the one or more processing units. In response to the GPU failing to preempt execution of the first set of commands within an elapsed time period after issuing the first preemption command, the GPU may issue a second preemption command at a second preemption granularity to the one or more processing units, where the second preemption granularity is finer-grained than the first preemption granularity.
Type: Grant
Filed: January 28, 2016
Date of Patent: February 19, 2019
Assignee: QUALCOMM Incorporated
Inventors: Anirudh Rajendra Acharya, Alexei Vladimirovich Bourd, David Rigel Garcia Garcia, Milind Nilkanth Nemlekar, Vineet Goel
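The escalation from a coarse to a finer preemption granularity can be sketched in Python as follows. This is a simulation under stated assumptions, not driver or hardware code: the granularity names, the `simulated_gpu` helper, and the generalisation from two levels to an ordered list are all hypothetical.

```python
def preempt_with_escalation(issue_preempt, granularities, timeout):
    """Try each granularity, coarse to fine, until one preempts in time.

    `issue_preempt(level, timeout)` is a caller-supplied callable returning
    True if preemption at `level` completed within `timeout`.
    Returns (level_that_succeeded_or_None, levels_issued).
    """
    issued = []
    for level in granularities:
        issued.append(level)
        if issue_preempt(level, timeout):
            return level, issued
    return None, issued


def simulated_gpu(time_to_boundary):
    """Toy model: preemption succeeds if the next boundary arrives in time."""
    def issue_preempt(level, timeout):
        return time_to_boundary[level] <= timeout
    return issue_preempt
```

In the toy model, a workload that needs 5.0 time units to reach the next coarse boundary but 0.5 to reach a finer one is preempted by the second command when the timeout is 1.0.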
-
Patent number: 10133572
Abstract: A SIMD processor may be configured to determine one or more active threads from a plurality of threads, select one active thread from the one or more active threads, and perform a divergent operation on the selected active thread. The divergent operation may be a serial operation.
Type: Grant
Filed: May 2, 2014
Date of Patent: November 20, 2018
Assignee: QUALCOMM Incorporated
Inventors: Andrew Evan Gruber, Lin Chen, Yun Du, Alexei Vladimirovich Bourd
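A minimal Python sketch of serialising an operation over the active threads is given below. It assumes the active threads are encoded as a bit mask and that the lowest-numbered active thread is selected first. Both choices are illustrative, as the abstract does not state a selection rule, and `run_serially` is a hypothetical name.

```python
def run_serially(active_mask, op):
    """Apply `op` to one active lane at a time, lowest lane index first.

    `active_mask` has bit i set when lane i is active. Returns a dict that
    maps each active lane to the result of `op(lane)`.
    """
    results = {}
    mask = active_mask
    while mask:
        lowest_bit = mask & -mask          # isolate the lowest set bit
        lane = lowest_bit.bit_length() - 1  # convert it to a lane index
        results[lane] = op(lane)
        mask &= mask - 1                    # clear that bit, lane is done
    return results
```

For example, with lanes 1, 2, and 4 active, the operation runs three times, once per active lane, and inactive lanes are skipped.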
-
Patent number: 10055342
Abstract: This disclosure describes techniques for supporting inter-task communication in a parallel computing system. The techniques for supporting inter-task communication may use hardware-based atomic operations to maintain the state of a pipe. A pipe may refer to a First-In, First-Out (FIFO)-organized buffer that allows various tasks to interact with the buffer as data producers or data consumers. Various pipe implementations may use multiple state parameters to define the state of a pipe. The hardware-based atomic operations described in this disclosure may modify multiple pipe state parameters in an atomic fashion. Modifying multiple pipe state parameters in an atomic fashion may avoid race conditions that would otherwise occur when multiple producers and/or multiple consumers attempt to modify the state of a pipe at the same time. In this way, pipe-based inter-task communication may be supported in a parallel computing system.
Type: Grant
Filed: March 19, 2014
Date of Patent: August 21, 2018
Assignee: QUALCOMM Incorporated
Inventors: Alexei Vladimirovich Bourd, Swapnil Pradipkumar Sakharshete, Fei Xu
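The value of updating several pipe state parameters in one indivisible step can be shown with a small Python model. It is a software stand-in only: a `threading.Lock` plays the role of the hardware atomic operation, and the state parameters `head` and `count`, the `Pipe` class, and its methods are assumptions rather than the parameters used in the patent. The sketch also omits the separate commit step a real pipe would need between reserving and reading a slot.

```python
import threading


class Pipe:
    """Toy FIFO whose state is defined by two parameters: head and count."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = [None] * capacity
        self.head = 0    # index of the next slot to read
        self.count = 0   # number of reserved or occupied slots
        self._atomic = threading.Lock()  # stand-in for a hardware atomic

    def reserve_write(self):
        """Atomically check for free space and claim the next write slot."""
        with self._atomic:
            if self.count == self.capacity:
                return None  # pipe is full
            slot = (self.head + self.count) % self.capacity
            self.count += 1
            return slot

    def reserve_read(self):
        """Atomically claim the oldest slot, updating head and count together."""
        with self._atomic:
            if self.count == 0:
                return None  # pipe is empty
            slot = self.head
            self.head = (self.head + 1) % self.capacity
            self.count -= 1
            return slot
```

Because each reservation reads and updates the state in one step, concurrent producers never receive the same slot, which is the race condition the abstract refers to.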
-
Patent number: 10026145
Abstract: Techniques for allowing for concurrent execution of multiple different tasks and preempted prioritized execution of tasks on a shader processor. In an example operation, a driver executed by a central processing unit (CPU) configures GPU resources based on needs of a first “host” shader to allow the first shader to execute “normally” on the GPU. The GPU may observe two sets of tasks, “guest” tasks. Based on, for example, detecting an availability of resources, the GPU may determine a “guest” task may be run while the “host” task is running. A second “guest” shader executes on a GPU by using resources that were configured for the first “host” shader if there are available resources and, in some examples, additional resources are obtained through software-programmable means.
Type: Grant
Filed: December 13, 2016
Date of Patent: July 17, 2018
Assignee: QUALCOMM Incorporated
Inventors: Alexei Vladimirovich Bourd, Maxim Kazakov, Chunhui Mei, Sumesh Udayakumaran
-
Publication number: 20180165786
Abstract: Techniques for allowing for concurrent execution of multiple different tasks and preempted prioritized execution of tasks on a shader processor. In an example operation, a driver executed by a central processing unit (CPU) configures GPU resources based on needs of a first “host” shader to allow the first shader to execute “normally” on the GPU. The GPU may observe two sets of tasks, “guest” tasks. Based on, for example, detecting an availability of resources, the GPU may determine a “guest” task may be run while the “host” task is running. A second “guest” shader executes on a GPU by using resources that were configured for the first “host” shader if there are available resources and, in some examples, additional resources are obtained through software-programmable means.
Type: Application
Filed: December 13, 2016
Publication date: June 14, 2018
Inventors: Alexei Vladimirovich Bourd, Maxim Kazakov, Chunhui Mei, Sumesh Udayakumaran
-
Publication number: 20180018299
Abstract: Techniques are described to perform a shuffle operation. Rather than using an all-lane to all-lane cross bar, a shuffler circuit having a smaller cross bar is described. The shuffler circuit performs the shuffle operation piecewise by reordering data received from processing lanes and outputting the reordered data.
Type: Application
Filed: July 13, 2016
Publication date: January 18, 2018
Inventors: Liang Han, Xiangdong Jin, Lin Chen, Yun Du, Alexei Vladimirovich Bourd