Patents by Inventor Alexei Vladimirovich Bourd

Alexei Vladimirovich Bourd has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240104824
    Abstract: Systems and techniques are provided for accelerated ray tracing. For instance, a process can include obtaining a hierarchical acceleration data structure that includes a plurality of primitives of a scene object and obtaining a respective information value associated with each primitive included in the plurality of primitives. A sort order can be determined for two or more nodes included in a same level of the hierarchical acceleration data structure at least in part by sorting the two or more nodes based on a respective sorting parameter value determined for each respective node of the two or more nodes. Each respective sorting parameter value can be determined based on at least one information value associated with one or more primitives included in a sub-tree of each respective node of the two or more nodes. The hierarchical acceleration data structure can be traversed using the sort order.
    Type: Application
    Filed: September 23, 2022
    Publication date: March 28, 2024
    Inventors: Piyush GUPTA, Pavan Kumar AKKARAJU, Alexei Vladimirovich BOURD, Andrew Evan GRUBER
  • Publication number: 20240062453
    Abstract: Systems and techniques are provided for determining bounding regions for a hierarchical structure for ray tracing. For instance, a process can include obtaining an acceleration data structure, the acceleration data structure including one or more primitives of a scene object. A graph cut can be applied to the acceleration data structure. A set of nodes of the acceleration data structure can be determined based on the graph cut, wherein the determined set of nodes is located adjacent to the graph cut. A world-space bounding box can be generated for the scene object, using the set of nodes determined based on the graph cut.
    Type: Application
    Filed: November 1, 2023
    Publication date: February 22, 2024
    Inventors: David Kirk MCALLISTER, Francois Mathias Robert DEMOULLIN, Alexei Vladimirovich BOURD
  • Publication number: 20240037183
    Abstract: The present disclosure relates to methods and apparatus for compute processing. For example, disclosed techniques facilitate improving performance of matrix multiplication in a streaming processor. Aspects of the present disclosure can execute, with a load control unit, a first load instruction to load a set of input data of an input matrix from a first memory to a second memory. Aspects of the present disclosure can also execute, with the load control unit, a second load instruction to load a set of weight data of a weight matrix from the first memory to the second memory. Additionally, aspects of the present disclosure can perform, with an ALU component, a matrix multiplication operation using the set of input data and the set of weight data to generate an output matrix. Further, aspects of the present disclosure can store the output matrix at a general purpose register accessible to the ALU component.
    Type: Application
    Filed: October 16, 2023
    Publication date: February 1, 2024
    Inventors: Yun DU, Gang ZHONG, Fei WEI, Yibin ZHANG, Jing HAN, Hongjiang SHANG, Elina KAMENETSKAYA, Minjie HUANG, Alexei Vladimirovich BOURD, Chun YU, Andrew Evan GRUBER, Eric DEMERS
  • Patent number: 11861785
    Abstract: Systems and techniques are provided for determining bounding regions for a hierarchical structure for ray tracing. For instance, a process can include obtaining an acceleration data structure, the acceleration data structure including one or more primitives of a scene object. A graph cut can be applied to the acceleration data structure. A set of nodes of the acceleration data structure can be determined based on the graph cut, wherein the determined set of nodes is located adjacent to the graph cut. A world-space bounding box can be generated for the scene object, using the set of nodes determined based on the graph cut.
    Type: Grant
    Filed: February 4, 2022
    Date of Patent: January 2, 2024
    Assignee: QUALCOMM Incorporated
    Inventors: David Kirk McAllister, Francois Mathias Robert Demoullin, Alexei Vladimirovich Bourd
  • Publication number: 20230394738
    Abstract: The present disclosure relates to methods and apparatus for graphics processing, e.g., a GPU. The apparatus may receive an image including a plurality of pixels associated with one or more workgroups and one or more pixel tiles, each of the workgroups and the pixel tiles including one or more pixels of the plurality of pixels. The apparatus may determine whether the one or more workgroups are misaligned with the one or more pixel tiles. The apparatus may determine a conversion order of the one or more workgroups when the one or more workgroups are misaligned with the one or more pixel tiles, the conversion order corresponding to a common multiple of one of the one or more workgroups and one of the one or more pixel tiles. The apparatus may convert each of the one or more workgroups based on the conversion order of the one or more workgroups.
    Type: Application
    Filed: November 9, 2020
    Publication date: December 7, 2023
    Inventors: Yibin ZHANG, Zilin YING, Yun DU, Heng QI, Jiexia YU, Yang YU, Andrew Evan GRUBER, Jian LIANG, Tao WANG, Alexei Vladimirovich BOURD, Gang ZHONG, Minjie HUANG
  • Patent number: 11829439
    Abstract: The present disclosure relates to methods and apparatus for compute processing. For example, disclosed techniques facilitate improving performance of matrix multiplication in a streaming processor. Aspects of the present disclosure can execute, with a load control unit, a first load instruction to load a set of input data of an input matrix from a first memory to a second memory. Aspects of the present disclosure can also execute, with the load control unit, a second load instruction to load a set of weight data of a weight matrix from the first memory to the second memory. Additionally, aspects of the present disclosure can perform, with an ALU component, a matrix multiplication operation using the set of input data and the set of weight data to generate an output matrix. Further, aspects of the present disclosure can store the output matrix at a general purpose register accessible to the ALU component.
    Type: Grant
    Filed: December 29, 2020
    Date of Patent: November 28, 2023
    Assignee: QUALCOMM Incorporated
    Inventors: Yun Du, Gang Zhong, Fei Wei, Yibin Zhang, Jing Han, Hongjiang Shang, Elina Kamenetskaya, Minjie Huang, Alexei Vladimirovich Bourd, Chun Yu, Andrew Evan Gruber, Eric Demers
  • Publication number: 20230252717
    Abstract: Systems and techniques are provided for enhancing operations of a ray tracing processor. For instance, a process can include obtaining one or more nodes of an acceleration data structure. Each node of the one or more nodes includes the same number of bytes. The node(s) can be stored in a cache associated with a ray tracing processor. Each of the stored node(s) is cache line-aligned with the cache associated with the ray tracing processor. A first stored node of the stored node(s) can be provided to the ray tracing processor and processed by the ray tracing processor during a first clock cycle of the ray tracing processor. A second stored node of the stored node(s) can be provided to the ray tracing processor and processed by the ray tracing processor during a second clock cycle of the ray tracing processor.
    Type: Application
    Filed: February 4, 2022
    Publication date: August 10, 2023
    Inventors: David Kirk MCALLISTER, Fei WEI, Alexei Vladimirovich BOURD
  • Publication number: 20230252716
    Abstract: Systems and techniques are provided for determining bounding regions for a hierarchical structure for ray tracing. For instance, a process can include obtaining an acceleration data structure, the acceleration data structure including one or more primitives of a scene object. A graph cut can be applied to the acceleration data structure. A set of nodes of the acceleration data structure can be determined based on the graph cut, wherein the determined set of nodes is located adjacent to the graph cut. A world-space bounding box can be generated for the scene object, using the set of nodes determined based on the graph cut.
    Type: Application
    Filed: February 4, 2022
    Publication date: August 10, 2023
    Inventors: David Kirk MCALLISTER, Francois Mathias Robert DEMOULLIN, Alexei Vladimirovich BOURD
  • Patent number: 11508109
    Abstract: The present disclosure relates to methods and apparatus for graphics processing. The apparatus can obtain at least one input image including a plurality of pixels. Additionally, the apparatus can determine shading information for each of the plurality of pixels in the at least one input image. The apparatus can also determine a shading map based on the determined shading information for each of the plurality of pixels in the at least one input image. In some aspects, the apparatus can generate at least one output image based on the at least one input image and the determined shading map. The apparatus can also enhance a quality of the at least one output image. In some aspects, the quality of the at least one output image can be enhanced based on machine learning. Further, the apparatus can generate the at least one input image including the plurality of pixels.
    Type: Grant
    Filed: March 31, 2020
    Date of Patent: November 22, 2022
    Assignee: QUALCOMM Incorporated
    Inventors: Alexei Vladimirovich Bourd, Reza Pourreza Shahri, Dam Backer, Brian Ellis, Roman Larionov, Li He, Vaibhav Rajesh Gandhi, Shuaib Arshad
  • Publication number: 20210200836
    Abstract: The present disclosure relates to methods and apparatus for compute processing. For example, disclosed techniques facilitate improving performance of matrix multiplication in a streaming processor. Aspects of the present disclosure can execute, with a load control unit, a first load instruction to load a set of input data of an input matrix from a first memory to a second memory. Aspects of the present disclosure can also execute, with the load control unit, a second load instruction to load a set of weight data of a weight matrix from the first memory to the second memory. Additionally, aspects of the present disclosure can perform, with an ALU component, a matrix multiplication operation using the set of input data and the set of weight data to generate an output matrix. Further, aspects of the present disclosure can store the output matrix at a general purpose register accessible to the ALU component.
    Type: Application
    Filed: December 29, 2020
    Publication date: July 1, 2021
    Inventors: Yun DU, Gang ZHONG, Fei WEI, Yibin ZHANG, Jing HAN, Hongjiang SHANG, Elina KAMENETSKAYA, Minjie HUANG, Alexei Vladimirovich BOURD, Chun YU, Andrew Evan GRUBER, Eric DEMERS
  • Publication number: 20210103467
    Abstract: A graphics processing unit (GPU) may execute a shader program that may include instructions for prioritization and scheduling of waves processed in parallel. According to some aspects of the described techniques, instruction variants (e.g., set-lowest-priority, set-highest-priority, set-priority-to-N, etc.) may be executed by hardware during processing of a wave to control (e.g., modify) processing priority for that wave. As such, the described techniques for shader controlled wave scheduling priority may allow waves to be processed while avoiding interference with lagging waves, while avoiding taking resources from lagging waves, etc. In one example, when a set-lowest-priority instruction is executed by hardware during execution of a first loop of a first wave, the instruction may push the current wave's priority to be lowest on the list. Such may result in pending loops from other waves being processed prior to the processing returning to a second loop of the first wave.
    Type: Application
    Filed: October 2, 2019
    Publication date: April 8, 2021
    Inventors: Elina Kamenetskaya, Andrew Evan Gruber, Alexei Vladimirovich Bourd
  • Publication number: 20200388022
    Abstract: The present disclosure relates to methods and apparatus for graphics processing. The apparatus can obtain at least one input image including a plurality of pixels. Additionally, the apparatus can determine shading information for each of the plurality of pixels in the at least one input image. The apparatus can also determine a shading map based on the determined shading information for each of the plurality of pixels in the at least one input image. In some aspects, the apparatus can generate at least one output image based on the at least one input image and the determined shading map. The apparatus can also enhance a quality of the at least one output image. In some aspects, the quality of the at least one output image can be enhanced based on machine learning. Further, the apparatus can generate the at least one input image including the plurality of pixels.
    Type: Application
    Filed: March 31, 2020
    Publication date: December 10, 2020
    Inventors: Alexei Vladimirovich BOURD, Reza POURREZA SHAHRI, Dam BACKER, Brian ELLIS, Roman LARIONOV, Li HE, Vaibhav Rajesh GANDHI, Shuaib ARSHAD
  • Patent number: 10592468
    Abstract: Techniques are described to perform a shuffle operation. Rather than using an all-lane to all-lane cross bar, a shuffler circuit having a smaller cross bar is described. The shuffler circuit performs the shuffle operation piecewise by reordering data received from processing lanes and outputting the reordered data.
    Type: Grant
    Filed: July 13, 2016
    Date of Patent: March 17, 2020
    Assignee: QUALCOMM Incorporated
    Inventors: Liang Han, Xiangdong Jin, Lin Chen, Yun Du, Alexei Vladimirovich Bourd
  • Patent number: 10223436
    Abstract: In an example, a method of transferring data may include synchronizing work-items corresponding to a first subgroup and work-items corresponding to a second subgroup with a barrier. The method may include performing an inter-subgroup data transfer between the first subgroup and the second subgroup.
    Type: Grant
    Filed: September 7, 2016
    Date of Patent: March 5, 2019
    Assignee: QUALCOMM Incorporated
    Inventors: Alexei Vladimirovich Bourd, Vladislav Shimanskiy, Maxim Kazakov, Yun Du
  • Patent number: 10210593
    Abstract: A graphics processing unit (GPU) may dispatch a first set of commands for execution on one or more processing units of the GPU. The GPU may receive notification from a host device indicating that a second set of commands are ready to execute on the GPU. In response, the GPU may issue a first preemption command at a first preemption granularity to the one or more processing units. In response to the GPU failing to preempt execution of the first set of commands within an elapsed time period after issuing the first preemption command, the GPU may issue a second preemption command at a second preemption granularity to the one or more processing units, where the second preemption granularity is finer-grained than the first preemption granularity.
    Type: Grant
    Filed: January 28, 2016
    Date of Patent: February 19, 2019
    Assignee: QUALCOMM Incorporated
    Inventors: Anirudh Rajendra Acharya, Alexei Vladimirovich Bourd, David Rigel Garcia Garcia, Milind Nilkanth Nemlekar, Vineet Goel
  • Patent number: 10133572
    Abstract: A SIMD processor may be configured to determine one or more active threads from a plurality of threads, select one active thread from the one or more active threads, and perform a divergent operation on the selected active thread. The divergent operation may be a serial operation.
    Type: Grant
    Filed: May 2, 2014
    Date of Patent: November 20, 2018
    Assignee: QUALCOMM Incorporated
    Inventors: Andrew Evan Gruber, Lin Chen, Yun Du, Alexei Vladimirovich Bourd
  • Patent number: 10055342
    Abstract: This disclosure describes techniques for supporting inter-task communication in a parallel computing system. The techniques for supporting inter-task communication may use hardware-based atomic operations to maintain the state of a pipe. A pipe may refer to a First-In, First-Out (FIFO)-organized buffer that allows various tasks to interact with the buffer as data producers or data consumers. Various pipe implementations may use multiple state parameters to define the state of a pipe. The hardware-based atomic operations described in this disclosure may modify multiple pipe state parameters in an atomic fashion. Modifying multiple pipe state parameters in an atomic fashion may avoid race conditions that would otherwise occur when multiple producers and/or multiple consumers attempt to modify the state of a pipe at the same time. In this way, pipe-based inter-task communication may be supported in a parallel computing system.
    Type: Grant
    Filed: March 19, 2014
    Date of Patent: August 21, 2018
    Assignee: QUALCOMM Incorporated
    Inventors: Alexei Vladimirovich Bourd, Swapnil Pradipkumar Sakharshete, Fei Xu
  • Patent number: 10026145
    Abstract: Techniques are described for allowing concurrent execution of multiple different tasks and preempted prioritized execution of tasks on a shader processor. In an example operation, a driver executed by a central processing unit (CPU) configures GPU resources based on needs of a first “host” shader to allow the first shader to execute “normally” on the GPU. The GPU may observe two sets of tasks, “guest” tasks. Based on, for example, detecting an availability of resources, the GPU may determine a “guest” task may be run while the “host” task is running. A second “guest” shader executes on a GPU by using resources that were configured for the first “host” shader if there are available resources and, in some examples, additional resources are obtained through software-programmable means.
    Type: Grant
    Filed: December 13, 2016
    Date of Patent: July 17, 2018
    Assignee: QUALCOMM Incorporated
    Inventors: Alexei Vladimirovich Bourd, Maxim Kazakov, Chunhui Mei, Sumesh Udayakumaran
  • Publication number: 20180165786
    Abstract: Techniques are described for allowing concurrent execution of multiple different tasks and preempted prioritized execution of tasks on a shader processor. In an example operation, a driver executed by a central processing unit (CPU) configures GPU resources based on needs of a first “host” shader to allow the first shader to execute “normally” on the GPU. The GPU may observe two sets of tasks, “guest” tasks. Based on, for example, detecting an availability of resources, the GPU may determine a “guest” task may be run while the “host” task is running. A second “guest” shader executes on a GPU by using resources that were configured for the first “host” shader if there are available resources and, in some examples, additional resources are obtained through software-programmable means.
    Type: Application
    Filed: December 13, 2016
    Publication date: June 14, 2018
    Inventors: Alexei Vladimirovich Bourd, Maxim Kazakov, Chunhui Mei, Sumesh Udayakumaran
  • Publication number: 20180018299
    Abstract: Techniques are described to perform a shuffle operation. Rather than using an all-lane to all-lane cross bar, a shuffler circuit having a smaller cross bar is described. The shuffler circuit performs the shuffle operation piecewise by reordering data received from processing lanes and outputting the reordered data.
    Type: Application
    Filed: July 13, 2016
    Publication date: January 18, 2018
    Inventors: Liang Han, Xiangdong Jin, Lin Chen, Yun Du, Alexei Vladimirovich Bourd
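
The piecewise shuffle idea in the last abstract above (reordering lane data through a crossbar smaller than all-lane to all-lane) can be illustrated with a short software sketch. This is only an illustration under stated assumptions, not the circuit described in the patent: the function name `piecewise_shuffle`, the `width` parameter, and the rule that each destination lane keeps a value only when its source lane falls in the piece currently being processed are all hypothetical choices made for the example.

```python
def piecewise_shuffle(data, src, width):
    """Return out[i] == data[src[i]], handling only `width` source lanes per pass.

    Hypothetical software model of a "small crossbar": each pass sees one
    contiguous piece of the source lanes instead of all lanes at once.
    """
    n = len(data)
    if len(src) != n:
        raise ValueError("src must name one source lane per destination lane")
    if width <= 0 or n % width != 0:
        raise ValueError("width must be positive and divide the lane count")
    if any(s < 0 or s >= n for s in src):
        raise ValueError("source lane index out of range")

    out = [None] * n
    # One pass per piece: only `width` lanes enter the small crossbar.
    for base in range(0, n, width):
        piece = data[base:base + width]
        # Each destination lane keeps the value only if its requested
        # source lane belongs to the piece handled in this pass.
        for dst, s in enumerate(src):
            if base <= s < base + width:
                out[dst] = piece[s - base]
    return out
```

Calling `piecewise_shuffle(values, order, 4)` on eight lanes takes two passes and produces the same result as the direct gather `[values[s] for s in order]`; the trade-off modelled here is more passes in exchange for a narrower reordering step.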