Patents by Inventor Alexei Vladimirovich Bourd

Alexei Vladimirovich Bourd has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

ACCELERATED BOUNDING VOLUME HIERARCHY (BVH) TRAVERSAL FOR RAY TRACING

Publication number: 20240104824

Abstract: Systems and techniques are provided for accelerated ray tracing. For instance, a process can include obtaining a hierarchical acceleration data structure that includes a plurality of primitives of a scene object and obtaining a respective information value associated with each primitive included in the plurality of primitives. A sort order can be determined for two or more nodes included in a same level of the hierarchical acceleration data structure at least in part by sorting the two or more nodes based on a respective sorting parameter value determined for each respective node of the two or more nodes. Each respective sorting parameter value can be determined based on at least one information value associated with one or more primitives included in a sub-tree of each respective node of the two or more nodes. The hierarchical acceleration data structure can be traversed using the sort order.

Type: Application

Filed: September 23, 2022

Publication date: March 28, 2024

Inventors: Piyush GUPTA, Pavan Kumar AKKARAJU, Alexei Vladimirovich BOURD, Andrew Evan GRUBER
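The sorting step in the abstract above can be illustrated with a minimal sketch. It assumes the sorting parameter is the plain sum of the information values in a node's sub-tree and that siblings are visited in descending order; the abstract does not specify either choice, and the `Node` class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)
    # Information values of the primitives stored directly in this node.
    primitive_info: List[float] = field(default_factory=list)


def sorting_parameter(node):
    """Assumed aggregate: sum of information values over the node's sub-tree."""
    return sum(node.primitive_info) + sum(sorting_parameter(c) for c in node.children)


def sort_children(node):
    """Order siblings at every level by descending sorting parameter."""
    node.children.sort(key=sorting_parameter, reverse=True)
    for child in node.children:
        sort_children(child)


def traversal_order(node):
    """Depth-first traversal that follows the stored sibling order."""
    order = [node.name]
    for child in node.children:
        order.extend(traversal_order(child))
    return order


root = Node("root", [
    Node("a", primitive_info=[0.1, 0.2]),
    Node("b", [Node("b0", primitive_info=[0.9]), Node("b1", primitive_info=[0.5])]),
])
sort_children(root)
print(traversal_order(root))
```

In this toy tree, node `b` is visited before node `a` because its sub-tree carries the larger aggregate value.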
GENERATION OF TIGHT WORLD SPACE BOUNDING REGIONS

Publication number: 20240062453

Abstract: Systems and techniques are provided for determining bounding regions for a hierarchical structure for ray tracing. For instance, a process can include obtaining an acceleration data structure, the acceleration data structure including one or more primitives of a scene object. A graph cut can be applied to the acceleration data structure. A set of nodes of the acceleration data structure can be determined based on the graph cut, wherein the determined set of nodes is located adjacent to the graph cut. A world-space bounding box can be generated for the scene object, using the set of nodes determined based on the graph cut.

Type: Application

Filed: November 1, 2023

Publication date: February 22, 2024

Inventors: David Kirk MCALLISTER, Francois Mathias Robert DEMOULLIN, Alexei Vladimirovich BOURD
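As a rough illustration of why a cut through the acceleration structure can give a tighter world-space box, the sketch below transforms the object-space boxes of the nodes at a cut and takes their union. It assumes the cut is a simple depth limit and that nodes are plain dictionaries; both are illustrative choices, not details taken from the abstract.

```python
import math
from itertools import product


def nodes_at_cut(node, depth):
    """Return the nodes at the given depth, or leaves reached earlier (assumed cut)."""
    if depth == 0 or not node["children"]:
        return [node]
    found = []
    for child in node["children"]:
        found.extend(nodes_at_cut(child, depth - 1))
    return found


def transform_point(m, p):
    """Apply a 3x4 affine object-to-world matrix to a 3D point."""
    return tuple(sum(m[r][c] * p[c] for c in range(3)) + m[r][3] for r in range(3))


def world_aabb(root, object_to_world, depth):
    """Union of the world-space boxes of all nodes at the cut."""
    lo = [float("inf")] * 3
    hi = [float("-inf")] * 3
    for node in nodes_at_cut(root, depth):
        bmin, bmax = node["aabb"]
        # Enumerate the eight corners of the object-space box.
        for corner in product(*zip(bmin, bmax)):
            w = transform_point(object_to_world, corner)
            lo = [min(a, b) for a, b in zip(lo, w)]
            hi = [max(a, b) for a, b in zip(hi, w)]
    return tuple(lo), tuple(hi)


# A thin diagonal object: two leaf boxes inside one root box.
root = {
    "aabb": ((0, 0, 0), (2, 2, 0)),
    "children": [
        {"aabb": ((0, 0, 0), (1, 1, 0)), "children": []},
        {"aabb": ((1, 1, 0), (2, 2, 0)), "children": []},
    ],
}
c = s = math.sqrt(0.5)
rotate_z_45 = [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0]]

loose = world_aabb(root, rotate_z_45, depth=0)  # transform the root box only
tight = world_aabb(root, rotate_z_45, depth=1)  # transform the nodes at the cut
print(loose[1][0] - loose[0][0], tight[1][0] - tight[0][0])
```

For this rotated object, the box built from the cut nodes is half as wide along x as the box built from the root alone.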
PERFORMING MATRIX MULTIPLICATION IN A STREAMING PROCESSOR

Publication number: 20240037183

Abstract: The present disclosure relates to methods and apparatus for compute processing. For example, disclosed techniques facilitate improving performance of matrix multiplication in a streaming processor. Aspects of the present disclosure can execute, with a load control unit, a first load instruction to load a set of input data of an input matrix from a first memory to a second memory. Aspects of the present disclosure can also execute, with the load control unit, a second load instruction to load a set of weight data of a weight matrix from the first memory to the second memory. Additionally, aspects of the present disclosure can perform, with an ALU component, a matrix multiplication operation using the set of input data and the set of weight data to generate an output matrix. Further, aspects of the present disclosure can store the output matrix at a general purpose register accessible to the ALU component.

Type: Application

Filed: October 16, 2023

Publication date: February 1, 2024

Inventors: Yun DU, Gang ZHONG, Fei WEI, Yibin ZHANG, Jing HAN, Hongjiang SHANG, Elina KAMENETSKAYA, Minjie HUANG, Alexei Vladimirovich BOURD, Chun YU, Andrew Evan GRUBER, Eric DEMERS
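The data flow in the abstract above (two loads, one multiply, one store) can be modeled in a few lines. This is only a software analogy: the dictionaries standing in for the first memory, the second memory, and the general purpose register are hypothetical, and nothing here reflects actual hardware instructions.

```python
def load(first_memory, key, second_memory):
    """Model one load instruction: copy a matrix from first memory to second memory."""
    second_memory[key] = [row[:] for row in first_memory[key]]


def matmul(a, b):
    """Plain matrix multiplication, standing in for the ALU operation."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


first_memory = {"input": [[1, 2], [3, 4]], "weight": [[5, 6], [7, 8]]}
second_memory = {}  # assumed to be the memory closer to the ALU
gpr = {}            # stands in for the general purpose register

load(first_memory, "input", second_memory)   # first load instruction
load(first_memory, "weight", second_memory)  # second load instruction
gpr["output"] = matmul(second_memory["input"], second_memory["weight"])
print(gpr["output"])
```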
Generation of tight world space bounding regions

Patent number: 11861785

Abstract: Systems and techniques are provided for determining bounding regions for a hierarchical structure for ray tracing. For instance, a process can include obtaining an acceleration data structure, the acceleration data structure including one or more primitives of a scene object. A graph cut can be applied to the acceleration data structure. A set of nodes of the acceleration data structure can be determined based on the graph cut, wherein the determined set of nodes is located adjacent to the graph cut. A world-space bounding box can be generated for the scene object, using the set of nodes determined based on the graph cut.

Type: Grant

Filed: February 4, 2022

Date of Patent: January 2, 2024

Assignee: QUALCOMM Incorporated

Inventors: David Kirk McAllister, Francois Mathias Robert Demoullin, Alexei Vladimirovich Bourd
RASTERIZATION OF COMPUTE WORKLOADS

Publication number: 20230394738

Abstract: The present disclosure relates to methods and apparatus for graphics processing, e.g., a GPU. The apparatus may receive an image including a plurality of pixels associated with one or more workgroups and one or more pixel tiles, each of the workgroups and the pixel tiles including one or more pixels of the plurality of pixels. The apparatus may determine whether the one or more workgroups are misaligned with the one or more pixel tiles. The apparatus may determine a conversion order of the one or more workgroups when the one or more workgroups are misaligned with the one or more pixel tiles, the conversion order corresponding to a common multiple of one of the one or more workgroups and one of the one or more pixel tiles. The apparatus may convert each of the one or more workgroups based on the conversion order of the one or more workgroups.

Type: Application

Filed: November 9, 2020

Publication date: December 7, 2023

Inventors: Yibin ZHANG, Zilin YING, Yun DU, Heng QI, Jiexia YU, Yang YU, Andrew Evan GRUBER, Jian LIANG, Tao WANG, Alexei Vladimirovich BOURD, Gang ZHONG, Minjie HUANG
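One way to read the "common multiple" idea in the abstract above is sketched below. It assumes square workgroups and pixel tiles, treats them as misaligned when neither edge length divides the other, and visits workgroups in blocks whose edge is the least common multiple of the two sizes, so that each block covers whole tiles. These are assumptions made for illustration; the abstract does not define the conversion order in this detail.

```python
from math import gcd


def lcm(a, b):
    return a * b // gcd(a, b)


def misaligned(wg, tile):
    """Assumed test: neither edge length divides the other."""
    return wg % tile != 0 and tile % wg != 0


def conversion_order(width, height, wg, tile):
    """Return workgroup coordinates in the order they would be converted."""
    cols, rows = width // wg, height // wg
    if not misaligned(wg, tile):
        return [(x, y) for y in range(rows) for x in range(cols)]
    span = lcm(wg, tile) // wg  # workgroups per block edge
    order = []
    for by in range(0, rows, span):
        for bx in range(0, cols, span):
            for y in range(by, min(by + span, rows)):
                for x in range(bx, min(bx + span, cols)):
                    order.append((x, y))
    return order


# 24x12 image, 6x6 workgroups, 4x4 pixel tiles: blocks are 12x12 pixels.
print(conversion_order(24, 12, wg=6, tile=4))
```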
Methods and apparatus to perform matrix multiplication in a streaming processor

Patent number: 11829439

Abstract: The present disclosure relates to methods and apparatus for compute processing. For example, disclosed techniques facilitate improving performance of matrix multiplication in a streaming processor. Aspects of the present disclosure can execute, with a load control unit, a first load instruction to load a set of input data of an input matrix from a first memory to a second memory. Aspects of the present disclosure can also execute, with the load control unit, a second load instruction to load a set of weight data of a weight matrix from the first memory to the second memory. Additionally, aspects of the present disclosure can perform, with an ALU component, a matrix multiplication operation using the set of input data and the set of weight data to generate an output matrix. Further, aspects of the present disclosure can store the output matrix at a general purpose register accessible to the ALU component.

Type: Grant

Filed: December 29, 2020

Date of Patent: November 28, 2023

Assignee: QUALCOMM Incorporated

Inventors: Yun Du, Gang Zhong, Fei Wei, Yibin Zhang, Jing Han, Hongjiang Shang, Elina Kamenetskaya, Minjie Huang, Alexei Vladimirovich Bourd, Chun Yu, Andrew Evan Gruber, Eric Demers
RAY TRACING PROCESSOR

Publication number: 20230252717

Abstract: Systems and techniques are provided for enhancing operations of a ray tracing processor. For instance, a process can include obtaining one or more nodes of an acceleration data structure. Each node of the one or more nodes includes the same number of bytes. The node(s) can be stored in a cache associated with a ray tracing processor. Each of the stored node(s) is cache line-aligned with the cache associated with the ray tracing processor. A first stored node of the stored node(s) can be provided to the ray tracing processor and processed by the ray tracing processor during a first clock cycle of the ray tracing processor. A second stored node of the stored node(s) can be provided to the ray tracing processor and processed by the ray tracing processor during a second clock cycle of the ray tracing processor.

Type: Application

Filed: February 4, 2022

Publication date: August 10, 2023

Inventors: David Kirk MCALLISTER, Fei WEI, Alexei Vladimirovich BOURD
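The fixed-size, cache line-aligned node layout described above can be sketched with Python's `struct` module. The 64-byte line size and the node fields (a bounding box plus two child indices) are assumptions chosen for the example, not values given in the abstract.

```python
import struct

CACHE_LINE = 64      # assumed cache line size in bytes
NODE_FMT = "<6f2I"   # assumed payload: box min/max (6 floats) + 2 child indices


def pack_node(bmin, bmax, left, right):
    """Serialize one node and pad it so every node occupies exactly one cache line."""
    payload = struct.pack(NODE_FMT, *bmin, *bmax, left, right)
    return payload.ljust(CACHE_LINE, b"\x00")


def node_offset(index):
    """Equal-sized nodes make every offset a multiple of the cache line size."""
    return index * CACHE_LINE


def fetch(cache, index):
    """Read one whole node from a single aligned cache line."""
    return struct.unpack_from(NODE_FMT, cache, node_offset(index))


cache = bytearray()
cache += pack_node((0.0, 0.0, 0.0), (1.0, 1.0, 1.0), 1, 2)
cache += pack_node((0.0, 0.0, 0.0), (0.5, 1.0, 1.0), 3, 4)

# One node is handed over per simulated clock cycle.
for cycle, index in enumerate(range(len(cache) // CACHE_LINE)):
    print(cycle, fetch(cache, index))
```

Because every node has the same size and starts on a line boundary, a node never straddles two cache lines, which is what allows one node per cycle in this toy model.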
GENERATION OF TIGHT WORLD SPACE BOUNDING REGIONS

Publication number: 20230252716

Abstract: Systems and techniques are provided for determining bounding regions for a hierarchical structure for ray tracing. For instance, a process can include obtaining an acceleration data structure, the acceleration data structure including one or more primitives of a scene object. A graph cut can be applied to the acceleration data structure. A set of nodes of the acceleration data structure can be determined based on the graph cut, wherein the determined set of nodes is located adjacent to the graph cut. A world-space bounding box can be generated for the scene object, using the set of nodes determined based on the graph cut.

Type: Application

Filed: February 4, 2022

Publication date: August 10, 2023

Inventors: David Kirk MCALLISTER, Francois Mathias Robert DEMOULLIN, Alexei Vladimirovich BOURD
Methods and apparatus for machine learning rendering

Patent number: 11508109

Abstract: The present disclosure relates to methods and apparatus for graphics processing. The apparatus can obtain at least one input image including a plurality of pixels. Additionally, the apparatus can determine shading information for each of the plurality of pixels in the at least one input image. The apparatus can also determine a shading map based on the determined shading information for each of the plurality of pixels in the at least one input image. In some aspects, the apparatus can generate at least one output image based on the at least one input image and the determined shading map. The apparatus can also enhance a quality of the at least one output image. In some aspects, the quality of the at least one output image can be enhanced based on machine learning. Further, the apparatus can generate the at least one input image including the plurality of pixels.

Type: Grant

Filed: March 31, 2020

Date of Patent: November 22, 2022

Assignee: QUALCOMM Incorporated

Inventors: Alexei Vladimirovich Bourd, Reza Pourreza Shahri, Dam Backer, Brian Ellis, Roman Larionov, Li He, Vaibhav Rajesh Gandhi, Shuaib Arshad
METHODS AND APPARATUS TO PERFORM MATRIX MULTIPLICATION IN A STREAMING PROCESSOR

Publication number: 20210200836

Abstract: The present disclosure relates to methods and apparatus for compute processing. For example, disclosed techniques facilitate improving performance of matrix multiplication in a streaming processor. Aspects of the present disclosure can execute, with a load control unit, a first load instruction to load a set of input data of an input matrix from a first memory to a second memory. Aspects of the present disclosure can also execute, with the load control unit, a second load instruction to load a set of weight data of a weight matrix from the first memory to the second memory. Additionally, aspects of the present disclosure can perform, with an ALU component, a matrix multiplication operation using the set of input data and the set of weight data to generate an output matrix. Further, aspects of the present disclosure can store the output matrix at a general purpose register accessible to the ALU component.

Type: Application

Filed: December 29, 2020

Publication date: July 1, 2021

Inventors: Yun DU, Gang ZHONG, Fei WEI, Yibin ZHANG, Jing HAN, Hongjiang SHANG, Elina KAMENETSKAYA, Minjie HUANG, Alexei Vladimirovich BOURD, Chun YU, Andrew Evan GRUBER, Eric DEMERS
SHADER CONTROLLED WAVE SCHEDULING PRIORITY

Publication number: 20210103467

Abstract: A graphics processing unit (GPU) may execute a shader program that may include instructions for prioritization and scheduling of waves processed in parallel. According to some aspects of the described techniques, instruction variants (e.g., set-lowest-priority, set-highest-priority, set-priority-to-N, etc.) may be executed by hardware during processing of a wave to control (e.g., modify) processing priority for that wave. As such, the described techniques for shader controlled wave scheduling priority may allow waves to be processed while avoiding interference with lagging waves, while avoiding taking resources from lagging waves, etc. In one example, when a set-lowest-priority instruction is executed by hardware during execution of a first loop of a first wave, the instruction may push the current wave's priority to be lowest on the list. Such may result in pending loops from other waves being processed prior to the processing returning to a second loop of the first wave.

Type: Application

Filed: October 2, 2019

Publication date: April 8, 2021

Inventors: Elina Kamenetskaya, Andrew Evan Gruber, Alexei Vladimirovich Bourd
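The effect of a set-lowest-priority instruction, as described in the abstract above, can be shown with a small scheduler simulation. It assumes a single priority list in which the front entry runs next and each wave executes the instruction at the end of every loop; the function and wave names are hypothetical.

```python
from collections import deque


def run(waves, loops_per_wave, demote_after_loop):
    """Simulate which (wave, loop) pair runs at each step."""
    queue = deque(waves)            # front of the queue = highest priority
    done = {w: 0 for w in waves}    # loops completed per wave
    trace = []
    while queue:
        wave = queue[0]
        trace.append((wave, done[wave]))
        done[wave] += 1
        if done[wave] == loops_per_wave:
            queue.popleft()         # wave finished
        elif demote_after_loop:
            queue.rotate(-1)        # set-lowest-priority: front moves to the back
    return trace


print(run(["w0", "w1"], 2, demote_after_loop=True))
print(run(["w0", "w1"], 2, demote_after_loop=False))
```

With the demotion enabled, the pending loop of `w1` runs before `w0` returns to its second loop, which matches the behavior the abstract describes.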
METHODS AND APPARATUS FOR MACHINE LEARNING RENDERING

Publication number: 20200388022

Abstract: The present disclosure relates to methods and apparatus for graphics processing. The apparatus can obtain at least one input image including a plurality of pixels. Additionally, the apparatus can determine shading information for each of the plurality of pixels in the at least one input image. The apparatus can also determine a shading map based on the determined shading information for each of the plurality of pixels in the at least one input image. In some aspects, the apparatus can generate at least one output image based on the at least one input image and the determined shading map. The apparatus can also enhance a quality of the at least one output image. In some aspects, the quality of the at least one output image can be enhanced based on machine learning. Further, the apparatus can generate the at least one input image including the plurality of pixels.

Type: Application

Filed: March 31, 2020

Publication date: December 10, 2020

Inventors: Alexei Vladimirovich BOURD, Reza POURREZA SHAHRI, Dam BACKER, Brian ELLIS, Roman LARIONOV, Li HE, Vaibhav Rajesh GANDHI, Shuaib ARSHAD
Shuffler circuit for lane shuffle in SIMD architecture

Patent number: 10592468

Abstract: Techniques are described to perform a shuffle operation. Rather than using an all-lane to all-lane cross bar, a shuffler circuit having a smaller cross bar is described. The shuffler circuit performs the shuffle operation piecewise by reordering data received from processing lanes and outputting the reordered data.

Type: Grant

Filed: July 13, 2016

Date of Patent: March 17, 2020

Assignee: QUALCOMM Incorporated

Inventors: Liang Han, Xiangdong Jin, Lin Chen, Yun Du, Alexei Vladimirovich Bourd
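The piecewise idea can be sketched in software: instead of routing all lanes at once, the shuffle is done in passes, each pass handling only as many source lanes as a small crossbar could accept. The pass structure below is an assumption made for illustration and is not a description of the patented circuit.

```python
def piecewise_shuffle(data, src_index, crossbar_width):
    """Compute out[dst] = data[src_index[dst]] in passes of crossbar_width lanes."""
    out = [None] * len(data)
    for base in range(0, len(data), crossbar_width):
        # Lanes fed to the small crossbar during this pass.
        chunk = data[base:base + crossbar_width]
        for dst, src in enumerate(src_index):
            if base <= src < base + len(chunk):
                out[dst] = chunk[src - base]
    return out


data = list("abcdefgh")
reverse = [7, 6, 5, 4, 3, 2, 1, 0]
print(piecewise_shuffle(data, reverse, crossbar_width=4))
```

The result matches a full all-lane shuffle, but each pass only ever reads four of the eight lanes.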
Inter-subgroup data sharing

Patent number: 10223436

Abstract: In an example, a method of transferring data may include synchronizing work-items corresponding to a first subgroup and work-items corresponding to a second subgroup with a barrier. The method may include performing an inter-subgroup data transfer between the first subgroup and the second subgroup.

Type: Grant

Filed: September 7, 2016

Date of Patent: March 5, 2019

Assignee: QUALCOMM Incorporated

Inventors: Alexei Vladimirovich Bourd, Vladislav Shimanskiy, Maxim Kazakov, Yun Du
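The synchronize-then-transfer pattern in the abstract above can be imitated with threads and a barrier. Here each thread plays one work-item, the shared list stands in for memory visible to both subgroups, and the partner mapping is an arbitrary choice for the example.

```python
import threading

SUBGROUP_SIZE = 4
NUM_ITEMS = 2 * SUBGROUP_SIZE           # two subgroups
shared = [None] * NUM_ITEMS             # stands in for memory both subgroups can see
received = [None] * NUM_ITEMS
barrier = threading.Barrier(NUM_ITEMS)  # synchronizes work-items of both subgroups


def work_item(i):
    shared[i] = i * 10                  # publish this work-item's value
    barrier.wait()                      # no one reads until everyone has written
    partner = (i + SUBGROUP_SIZE) % NUM_ITEMS
    received[i] = shared[partner]       # read from the matching lane of the other subgroup


threads = [threading.Thread(target=work_item, args=(i,)) for i in range(NUM_ITEMS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(received)
```

Without the barrier, a work-item could read its partner's slot before the partner has written it.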
Adaptive context switching

Patent number: 10210593

Abstract: A graphics processing unit (GPU) may dispatch a first set of commands for execution on one or more processing units of the GPU. The GPU may receive notification from a host device indicating that a second set of commands are ready to execute on the GPU. In response, the GPU may issue a first preemption command at a first preemption granularity to the one or more processing units. In response to the GPU failing to preempt execution of the first set of commands within an elapsed time period after issuing the first preemption command, the GPU may issue a second preemption command at a second preemption granularity to the one or more processing units, where the second preemption granularity is finer-grained than the first preemption granularity.

Type: Grant

Filed: January 28, 2016

Date of Patent: February 19, 2019

Assignee: QUALCOMM Incorporated

Inventors: Anirudh Rajendra Acharya, Alexei Vladimirovich Bourd, David Rigel Garcia Garcia, Milind Nilkanth Nemlekar, Vineet Goel
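The escalation logic described above can be sketched as a loop over preemption granularities ordered from coarse to fine. The granularity names and latencies are made up for the example, and the simulation compares a fixed latency with the timeout instead of measuring real elapsed time.

```python
def preempt(latency_by_granularity, granularities, timeout):
    """Issue preemption commands from coarse to fine until one completes in time.

    Returns the granularity that succeeded (or None) and the commands issued.
    """
    issued = []
    for granularity in granularities:
        issued.append(granularity)
        if latency_by_granularity[granularity] <= timeout:
            return granularity, issued
        # Timed out: fall through and retry at the next, finer granularity.
    return None, issued


# Hypothetical latencies (in milliseconds) for the currently running workload.
latencies = {"draw_call": 8.0, "primitive": 0.5, "instruction": 0.1}
order = ["draw_call", "primitive", "instruction"]
print(preempt(latencies, order, timeout=1.0))
```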
Techniques for serialized execution in a SIMD processing system

Patent number: 10133572

Abstract: A SIMD processor may be configured to determine one or more active threads from a plurality of threads, select one active thread from the one or more active threads, and perform a divergent operation on the selected active thread. The divergent operation may be a serial operation.

Type: Grant

Filed: May 2, 2014

Date of Patent: November 20, 2018

Assignee: QUALCOMM Incorporated

Inventors: Andrew Evan Gruber, Lin Chen, Yun Du, Alexei Vladimirovich Bourd
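A minimal sketch of serialized execution follows. It assumes the active threads are tracked in a bit mask, that the lowest-numbered active lane is selected each time, and that the selection repeats until no active lanes remain; the abstract only states that one active thread is selected and the operation is performed on it.

```python
def serialize(active_mask, operation):
    """Run operation once per active lane, one lane at a time."""
    results = []
    while active_mask:
        # Isolate the lowest set bit to pick one active lane.
        lane = (active_mask & -active_mask).bit_length() - 1
        results.append(operation(lane))
        active_mask &= ~(1 << lane)  # mark the lane as handled
    return results


# Lanes 1, 2 and 4 are active.
print(serialize(0b10110, lambda lane: lane * lane))
```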
Hardware-based atomic operations for supporting inter-task communication

Patent number: 10055342

Abstract: This disclosure describes techniques for supporting inter-task communication in a parallel computing system. The techniques for supporting inter-task communication may use hardware-based atomic operations to maintain the state of a pipe. A pipe may refer to a First-In, First-Out (FIFO)-organized buffer that allows various tasks to interact with the buffer as data producers or data consumers. Various pipe implementations may use multiple state parameters to define the state of a pipe. The hardware-based atomic operations described in this disclosure may modify multiple pipe state parameters in an atomic fashion. Modifying multiple pipe state parameters in an atomic fashion may avoid race conditions that would otherwise occur when multiple producers and/or multiple consumers attempt to modify the state of a pipe at the same time. In this way, pipe-based inter-task communication may be supported in a parallel computing system.

Type: Grant

Filed: March 19, 2014

Date of Patent: August 21, 2018

Assignee: QUALCOMM Incorporated

Inventors: Alexei Vladimirovich Bourd, Swapnil Pradipkumar Sakharshete, Fei Xu
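The race that the abstract describes arises because a pipe has several state parameters that must change together. The sketch below uses a lock to stand in for a hardware atomic operation that updates the write index and the element count in one step; the `Pipe` class and its fields are hypothetical.

```python
import threading


class Pipe:
    """FIFO buffer whose state parameters are always updated together."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = [None] * capacity
        self.read_index = 0
        self.write_index = 0
        self.count = 0
        self._atomic = threading.Lock()  # stands in for a hardware atomic operation

    def push(self, value):
        with self._atomic:
            if self.count == self.capacity:
                return False
            self.buffer[self.write_index] = value
            self.write_index = (self.write_index + 1) % self.capacity
            self.count += 1
            return True

    def pop(self):
        with self._atomic:
            if self.count == 0:
                return None
            value = self.buffer[self.read_index]
            self.read_index = (self.read_index + 1) % self.capacity
            self.count -= 1
            return value


pipe = Pipe(capacity=16)


def producer(pid):
    for n in range(3):
        pipe.push((pid, n))


threads = [threading.Thread(target=producer, args=(p,)) for p in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

items = [pipe.pop() for _ in range(12)]
print(sorted(items))
```

Every pushed item comes out exactly once, regardless of how the four producers interleave.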
Resource sharing on shader processor of GPU

Patent number: 10026145

Abstract: Techniques for allowing for concurrent execution of multiple different tasks and preempted prioritized execution of tasks on a shader processor. In an example operation, a driver executed by a central processing unit (CPU) configures GPU resources based on needs of a first “host” shader to allow the first shader to execute “normally” on the GPU. The GPU may observe two sets of tasks, “guest” tasks. Based on, for example, detecting an availability of resources, the GPU may determine a “guest” task may be run while the “host” task is running. A second “guest” shader executes on a GPU by using resources that were configured for the first “host” shader if there are available resources and, in some examples, additional resources are obtained through software-programmable means.

Type: Grant

Filed: December 13, 2016

Date of Patent: July 17, 2018

Assignee: QUALCOMM Incorporated

Inventors: Alexei Vladimirovich Bourd, Maxim Kazakov, Chunhui Mei, Sumesh Udayakumaran
RESOURCE SHARING ON SHADER PROCESSOR OF GPU

Publication number: 20180165786

Abstract: Techniques for allowing for concurrent execution of multiple different tasks and preempted prioritized execution of tasks on a shader processor. In an example operation, a driver executed by a central processing unit (CPU) configures GPU resources based on needs of a first “host” shader to allow the first shader to execute “normally” on the GPU. The GPU may observe two sets of tasks, “guest” tasks. Based on, for example, detecting an availability of resources, the GPU may determine a “guest” task may be run while the “host” task is running. A second “guest” shader executes on a GPU by using resources that were configured for the first “host” shader if there are available resources and, in some examples, additional resources are obtained through software-programmable means.

Type: Application

Filed: December 13, 2016

Publication date: June 14, 2018

Inventors: Alexei Vladimirovich Bourd, Maxim Kazakov, Chunhui Mei, Sumesh Udayakumaran
SHUFFLER CIRCUIT FOR LANE SHUFFLE IN SIMD ARCHITECTURE

Publication number: 20180018299

Abstract: Techniques are described to perform a shuffle operation. Rather than using an all-lane to all-lane cross bar, a shuffler circuit having a smaller cross bar is described. The shuffler circuit performs the shuffle operation piecewise by reordering data received from processing lanes and outputting the reordered data.

Type: Application

Filed: July 13, 2016

Publication date: January 18, 2018

Inventors: Liang Han, Xiangdong Jin, Lin Chen, Yun Du, Alexei Vladimirovich Bourd
