Patents by Inventor Yun Du

Yun Du has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Graph spatial split

Patent number: 12380060

Abstract: A method for reducing latency and increasing throughput in a reconfigurable computing system includes receiving a compute graph for execution on a reconfigurable dataflow processor comprising a grid of compute units and grid of memory units interconnected with a switching array. The compute graph includes a node specifying an operation on a tensor. The node may be split into multiple nodes that each specify the operation on a distinctive portion of the tensor to produce a first modified compute graph. The first modified compute graph may be executed. In addition, the multiple nodes may be within a single meta-pipeline stage and may be processed in parallel. Furthermore, the compute graph may further comprise a separate node for gathering the distinctive portions of the tensor into a complete tensor, to produce a second modified compute graph.

Type: Grant

Filed: May 25, 2023

Date of Patent: August 5, 2025

Assignee: SambaNova Systems, Inc.

Inventors: Yun Du, Gao Deng, Jianding Luo, Zhengyu Chen
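
The node-splitting idea in the abstract above can be illustrated with a minimal sketch. It is not the patented compiler pass: the function names `split_node` and `run_split` are hypothetical, the tensor is a plain 1-D Python list, and the operation is assumed to be elementwise so that each portion can be processed independently before a gather step reassembles the result.

```python
from typing import Callable, List


def split_node(tensor: List[float], parts: int) -> List[List[float]]:
    """Split a 1-D tensor into up to `parts` contiguous, non-overlapping chunks."""
    if parts <= 0:
        raise ValueError("parts must be positive")
    if not tensor:
        return []
    size = -(-len(tensor) // parts)  # ceiling division
    return [tensor[i:i + size] for i in range(0, len(tensor), size)]


def run_split(op: Callable[[float], float], tensor: List[float], parts: int) -> List[float]:
    """Apply an elementwise op per chunk, then gather the chunks back together."""
    chunks = split_node(tensor, parts)
    # Each chunk stands in for one of the split nodes; on real hardware
    # these could be processed in parallel (assumption for illustration).
    partial = [[op(x) for x in chunk] for chunk in chunks]
    # Gather node: concatenate the distinct portions into a complete tensor.
    return [y for chunk in partial for y in chunk]


result = run_split(lambda x: 2 * x, [1, 2, 3, 4, 5], 2)
```

With two parts, the five-element input is split into chunks of three and two elements, and the gathered result matches what the unsplit operation would produce.
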
Bandwidth-aware computational graph mapping

Patent number: 12299424

Abstract: A computer-implemented method of transforming a high-level program for mapping onto a coarse-grained reconfigurable (CGR) processor with an array of CGR units, including sectioning a dataflow graph into a plurality of sections; extracting performance information for each of the plurality of sections; on a CGR unit: assigning to a section at least two computations dependent on a first data element; scheduling an additional load of the first data element in response to available memory bandwidth for that section; eliminating a buffer between the additional load of the first data element and one of the two computations, for that section; generating configuration data for the placed positions and the routed data and communication channels, wherein the configuration data, when loaded onto an instance of the array of CGR units, causes the array of CGR units to implement the dataflow graph; and storing the configuration data in a non-transitory computer-readable storage medium.

Type: Grant

Filed: March 15, 2023

Date of Patent: May 13, 2025

Assignee: SambaNova Systems, Inc.

Inventors: Gao Deng, Weihang Fan, Fei Wang, Yun Du
COMPUTATIONAL NODES FUSION IN A RECONFIGURABLE DATA PROCESSOR

Publication number: 20250103550

Abstract: A system includes an array of reconfigurable units further including a plurality of configurable elements such as pattern memory units (PMUs), pattern compute units (PCUs), and communication agents. The system further includes a configuration module to provide configuration data to configure the PMUs and PCUs. The system further includes a compiler configured to generate a pipeline of a plurality of PCUs related to a dataflow graph, interleaved between a plurality of PMUs. Each PCU is coupled to perform calculations based on data received from a preceding PMU and store results of the calculations into a following PMU of the plurality of PMUs after a latency. The compiler is further configured to remove a PMU from the pipeline based on a comparison of the latencies of the PCUs. A corresponding method is also disclosed herein.

Type: Application

Filed: December 9, 2024

Publication date: March 27, 2025

Applicant: SambaNova Systems, Inc.

Inventors: Yun DU, Jianding LUO
Performing matrix multiplication in a streaming processor

Patent number: 12229215

Abstract: The present disclosure relates to methods and apparatus for compute processing. For example, disclosed techniques facilitate improving performance of matrix multiplication in a streaming processor. Aspects of the present disclosure can execute, with a load control unit, a first load instruction to load a set of input data of an input matrix from a first memory to a second memory. Aspects of the present disclosure can also execute, with the load control unit, a second load instruction to load a set of weight data of a weight matrix from the first memory to the second memory. Additionally, aspects of the present disclosure can perform, with an ALU component, a matrix multiplication operation using the set of input data and the set of weight data to generate an output matrix. Further, aspects of the present disclosure can store the output matrix at a general purpose register accessible to the ALU component.

Type: Grant

Filed: October 16, 2023

Date of Patent: February 18, 2025

Assignee: QUALCOMM Incorporated

Inventors: Yun Du, Gang Zhong, Fei Wei, Yibin Zhang, Jing Han, Hongjiang Shang, Elina Kamenetskaya, Minjie Huang, Alexei Vladimirovich Bourd, Chun Yu, Andrew Evan Gruber, Eric Demers
Runtime mechanism to optimize shader execution flow

Patent number: 12229864

Abstract: This disclosure provides systems, devices, apparatus, and methods, including computer programs encoded on storage media, for runtime optimization of the shader execution flow. A graphics processor may obtain instruction execution data associated with a graphics workload, the instruction execution data including graphics data for a set of shader operations. The graphics processor may configure, at a first iteration, at least one predication value based on the instruction execution data including the graphics data for the set of shader operations. The graphics processor may adjust, at a second iteration, an execution flow of the graphics workload based on the configured at least one predication value, the execution flow of the graphics workload including the set of shader operations. The graphics processor may execute or refrain from executing, at the second iteration, each of the set of shader operations based on the adjusted execution flow of the graphics workload.

Type: Grant

Filed: August 5, 2022

Date of Patent: February 18, 2025

Assignee: QUALCOMM Incorporated

Inventors: Yun Du, Eric Demers, Andrew Evan Gruber, Chun Yu, Baoguang Yang, Chihong Zhang, Yuehai Du, Avinash Seetharamaiah, Jonnala Gadda Nagendra Kumar, Gang Zhong, Zilin Ying, Fei Wei
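
The two-iteration flow in the abstract above (configure predication values first, then execute or skip shader operations) might look roughly like the following sketch. Everything concrete here is an assumption made for illustration: the uniform names `fog_density` and `tint`, the two toy shader operations, and the rule that an operation is skipped when its uniform makes it a no-op. The abstract does not specify how predication values are derived.

```python
from typing import Dict, List, Tuple


def configure_predication(uniforms: Dict[str, float]) -> Dict[str, bool]:
    """First iteration: derive one predication flag per shader operation.

    Hypothetical rule: an operation is enabled only if its uniform
    would actually change the shaded value.
    """
    return {
        "fog": uniforms["fog_density"] > 0.0,
        "tint": uniforms["tint"] != 1.0,
    }


def shade(value: float, uniforms: Dict[str, float],
          predication: Dict[str, bool]) -> Tuple[float, List[str]]:
    """Second iteration: execute or refrain from executing each operation."""
    executed: List[str] = []
    if predication["fog"]:
        value *= 1.0 - uniforms["fog_density"]
        executed.append("fog")
    if predication["tint"]:
        value *= uniforms["tint"]
        executed.append("tint")
    return value, executed


uniforms = {"fog_density": 0.0, "tint": 0.5}
flags = configure_predication(uniforms)
shaded, executed_ops = shade(0.8, uniforms, flags)
```

In this example fog is disabled by its uniform, so only the tint operation runs in the second pass.
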
Low latency nodes fusion in a reconfigurable data processor

Patent number: 12189570

Abstract: A data processing system includes an array of reconfigurable units and a compiler configured to generate a pipeline of n computational nodes related to a dataflow graph, interleaved between n+1 buffers on the array of reconfigurable units. Each computational node is coupled to perform calculations based on data received from an immediately preceding buffer of the n+1 buffers and store results of the calculations into an immediately following buffer of the n+1 buffers after a latency. The compiler is further configured to remove a buffer of the n+1 buffers from the pipeline based on a comparison of the latencies of the computational nodes. A corresponding method is also disclosed herein.

Type: Grant

Filed: May 19, 2023

Date of Patent: January 7, 2025

Assignee: SambaNova Systems, Inc.

Inventors: Yun Du, Jianding Luo
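
The abstract above says a buffer is removed "based on a comparison of the latencies" but does not state the comparison. The sketch below uses one plausible rule, chosen purely as an assumption: an interior buffer may be removed when the combined latency of the two nodes it separates does not exceed the latency of the slowest node, so the fused stage does not become the new bottleneck. The function name `pick_buffer_to_remove` and the indexing scheme are hypothetical.

```python
from typing import List, Optional


def pick_buffer_to_remove(latencies: List[int]) -> Optional[int]:
    """Pick one interior buffer to remove from a pipeline, or None.

    Nodes are numbered 0..n-1 and buffers 0..n, with buffer k sitting
    between node k-1 and node k for 1 <= k <= n-1.

    Assumed rule (not taken from the patent): buffer k is a candidate
    if latencies[k-1] + latencies[k] <= max(latencies). Among the
    candidates, the pair with the smallest combined latency wins.
    """
    if len(latencies) < 2:
        return None  # no interior buffers to remove
    bottleneck = max(latencies)
    best_index: Optional[int] = None
    best_fused = 0
    for k in range(1, len(latencies)):
        fused = latencies[k - 1] + latencies[k]
        if fused <= bottleneck and (best_index is None or fused < best_fused):
            best_index, best_fused = k, fused
    return best_index


choice = pick_buffer_to_remove([4, 1, 2, 10])
```

For latencies of 4, 1, 2 and 10, the buffer between the second and third nodes is selected, because fusing them yields a stage of latency 3, well under the bottleneck of 10.
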
Grating adjusting apparatus and 3D display apparatus

Patent number: 12130452

Abstract: A grating adjustment apparatus includes a first electrode layer, a second electrode layer and a first substrate and a second substrate that are opposite to each other; the grating adjustment apparatus further includes a plurality of first driving lines, a plurality of second driving lines and a plurality of grating units arranged in the first direction, and is configured as: when the grating adjustment apparatus is powered on, the grating unit is capable of forming a light transmission unit and a shading unit, and opening positions and/or opening ratios of the grating unit are adjustable; and the plurality of grating units are divided into at least one group; for the grating units in the same group, at least two of the first sub-electrodes are electrically connected to different first driving lines, and at least two of the second sub-electrodes are electrically connected to different second driving lines.

Type: Grant

Filed: January 3, 2023

Date of Patent: October 29, 2024

Assignees: HEFEI BOE OPTOELECTRONICS TECHNOLOGY CO., LTD., BOE TECHNOLOGY GROUP CO., LTD.

Inventors: Zhao Dong, Ru Zhou, Xiaoqing Peng, Yun Du, Hu Li, Donghui Wang, Ran An, Douqing Zhang
SYSTEMS AND METHODS FOR A LEARNING AND DEVELOPMENT OPERATIONS MANAGEMENT PLATFORM

Publication number: 20240346396

Abstract: The present invention comprises learning and development content and programs operations management systems and methods. The present systems offer a comprehensive data and requests management system to optimize the handling of learning requests across single or multiple decentralized learning and development teams or groups, integrate and improve the planning, management and assessment of learning and development projects, and synergistically improve and integrate the creation, implementation, practice, and refinement of content design and content design processes of learning and development team professionals and users.

Type: Application

Filed: December 9, 2023

Publication date: October 17, 2024

Inventors: Ryan Austin, Matthew Ryan Ball, Jason Primeau, Darren Card, Yun Du, Pratik Bidkar, Erick Alejandro Montanez Soda, Alejandro Ariztegui Abimerhi, Shruti Bhagwat
Run-time mechanism for optimal shader

Patent number: 12067666

Abstract: Aspects presented herein relate to methods and devices for graphics processing including an apparatus, e.g., a GPU. The apparatus may receive a set of draw call instructions corresponding to a graphics workload, where the set of draw call instructions is associated with at least one run-time parameter. The apparatus may also obtain a first shader program associated with storing data in a system memory and at least one second shader program associated with storing data in a constant memory. Further, the apparatus may execute the first shader program or the at least one second shader program based on whether the at least one run-time parameter is less than or equal to a size of the constant memory. The apparatus may also update or maintain a configuration of a shader processor or a streaming processor based on executing the first shader program or the at least one second shader program.

Type: Grant

Filed: May 18, 2022

Date of Patent: August 20, 2024

Assignee: QUALCOMM Incorporated

Inventors: Yun Du, Eric Demers, Andrew Evan Gruber, Chun Yu, Chihong Zhang, Baoguang Yang, Yuehai Du, Gang Zhong, Avinash Seetharamaiah, Jonnala Gadda Nagendra Kumar
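
The selection step in the abstract above reduces to a single comparison, sketched below. The function name, the byte-based units, and the returned labels are assumptions for illustration. The direction of the choice (use the constant-memory variant when the parameter fits) is the natural reading of the abstract rather than something it states outright.

```python
def select_shader(runtime_param_bytes: int, constant_memory_bytes: int) -> str:
    """Choose a shader variant from a run-time parameter.

    Assumption: if the run-time parameter fits in constant memory,
    the constant-memory variant is used; otherwise the variant that
    stores its data in system memory is used.
    """
    if runtime_param_bytes <= constant_memory_bytes:
        return "constant_memory_shader"
    return "system_memory_shader"


small = select_shader(runtime_param_bytes=2048, constant_memory_bytes=4096)
large = select_shader(runtime_param_bytes=8192, constant_memory_bytes=4096)
```
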
Methods and apparatus to facilitate a dedicated bindless state processor

Patent number: 12056790

Abstract: The present disclosure relates to methods and apparatus for graphics processing. For example, disclosed techniques facilitate improving bindless state processing at a graphics processor. Aspects of the present disclosure can receive, at a graphics processor, a shader program including a preamble section and a main instructions section. Aspects of the present disclosure can also execute, with a scalar processor dedicated to processing preamble sections, instructions of the preamble section to implement a bindless mechanism for loading constant data associated with the shader program. Additionally, aspects of the present disclosure can distribute the main instructions section and the constant data to a streaming processor for executing the shader program.

Type: Grant

Filed: January 31, 2020

Date of Patent: August 6, 2024

Assignee: QUALCOMM Incorporated

Inventors: Yun Du, Andrew Evan Gruber, Chun Yu, Chihong Zhang, Thomas Edwin Frisinger, Richard Hammerstone, Zilin Ying, Heng Qi, Quanquan Xu, Sheng Gu
Fast incremental shared constants

Patent number: 12056804

Abstract: This disclosure provides systems, devices, apparatus, and methods, including computer programs encoded on storage media, for fast incremental shared constants. In aspects, a CPU may determine/update shared constant data for a first draw call of a plurality of draw calls. The shared constant data, which may correspond to at least one shader, may be updated based on a draw call update for the first draw call. The CPU may communicate the updated shared constant data for the first draw call to a GPU. The GPU may receive, in at least one register, the updated shared constant data from the CPU and configure the at least one register based on the updated shared constant data corresponding to the draw call update of the first draw call of the plurality of draw calls.

Type: Grant

Filed: May 15, 2023

Date of Patent: August 6, 2024

Assignee: QUALCOMM Incorporated

Inventors: Thomas Edwin Frisinger, Richard Hammerstone, Andrew Evan Gruber, Gang Zhong, Yun Du, Jonnala Gadda Nagendra Kumar
Touch control display substrate, touch control display device, and touch control signal line distribution method

Patent number: 12014006

Abstract: A touch display substrate is provided, including a central touch area and a routing area located around the central touch area, where the routing area is provided with isolation lines and a plurality of touch signal lines led out from the central touch area, the extension direction of the isolation lines is parallel to the extension direction of the touch signal lines, the touch signal lines include first touch signal lines arranged close to the isolation lines and second touch signal lines arranged far from the isolation lines, and the width of the first touch signal lines is greater than the width of the second touch signal lines. A touch display device and a touch control signal line distribution method are provided.

Type: Grant

Filed: April 28, 2021

Date of Patent: June 18, 2024

Assignees: Hefei Xinsheng Optoelectronics Technology Co., Ltd., BOE Technology Group Co., Ltd.

Inventors: Jiawei Xu, Yun Du, Zhao Dong, Wenjin Fan
Graph Spatial Split

Publication number: 20240168915

Abstract: A method for reducing latency and increasing throughput in a reconfigurable computing system includes receiving a compute graph for execution on a reconfigurable dataflow processor comprising a grid of compute units and grid of memory units interconnected with a switching array. The compute graph includes a node specifying an operation on a tensor. The node may be split into multiple nodes that each specify the operation on a distinctive portion of the tensor to produce a first modified compute graph. The first modified compute graph may be executed. In addition, the multiple nodes may be within a single meta-pipeline stage and may be processed in parallel. Furthermore, the compute graph may further comprise a separate node for gathering the distinctive portions of the tensor into a complete tensor, to produce a second modified compute graph.

Type: Application

Filed: May 25, 2023

Publication date: May 23, 2024

Applicant: SambaNova Systems, Inc.

Inventors: Yun DU, Gao DENG, Jianding LUO, Zhengyu CHEN
GRATING ADJUSTING APPARATUS AND 3D DISPLAY APPARATUS

Publication number: 20240160034

Abstract: A grating adjustment apparatus includes a first electrode layer, a second electrode layer and a first substrate and a second substrate that are opposite to each other; the grating adjustment apparatus further includes a plurality of first driving lines, a plurality of second driving lines and a plurality of grating units arranged in the first direction, and is configured as: when the grating adjustment apparatus is powered on, the grating unit is capable of forming a light transmission unit and a shading unit, and opening positions and/or opening ratios of the grating unit are adjustable; and the plurality of grating units are divided into at least one group; for the grating units in the same group, at least two of the first sub-electrodes are electrically connected to different first driving lines, and at least two of the second sub-electrodes are electrically connected to different second driving lines.

Type: Application

Filed: January 3, 2023

Publication date: May 16, 2024

Applicants: HEFEI BOE OPTOELECTRONICS TECHNOLOGY CO., LTD., BOE Technology Group Co., Ltd.

Inventors: Zhao Dong, Ru Zhou, Xiaoqing Peng, Yun Du, Hu Li, Donghui Wang, Ran An, Douqing Zhang
Dynamic wave pairing

Patent number: 11954758

Abstract: This disclosure provides systems, devices, apparatus, and methods, including computer programs encoded on storage media, for dynamic wave pairing. A graphics processor may allocate one or more GPU workloads to one or more wave slots of a plurality of wave slots. The graphics processor may select a first execution slot of a plurality of execution slots for executing the one or more GPU workloads. The selection may be based on one of a plurality of granularities. The graphics processor may execute, at the selected first execution slot, the one or more GPU workloads at the one of the plurality of granularities.

Type: Grant

Filed: February 24, 2022

Date of Patent: April 9, 2024

Assignee: QUALCOMM Incorporated

Inventors: Yun Du, Andrew Evan Gruber, Zilin Ying, Chunling Hu, Baoguang Yang, Yang Xia, Gang Zhong, Chun Yu, Eric Demers
RUNTIME MECHANISM TO OPTIMIZE SHADER EXECUTION FLOW

Publication number: 20240046543

Abstract: This disclosure provides systems, devices, apparatus, and methods, including computer programs encoded on storage media, for runtime optimization of the shader execution flow. A graphics processor may obtain instruction execution data associated with a graphics workload, the instruction execution data including graphics data for a set of shader operations. The graphics processor may configure, at a first iteration, at least one predication value based on the instruction execution data including the graphics data for the set of shader operations. The graphics processor may adjust, at a second iteration, an execution flow of the graphics workload based on the configured at least one predication value, the execution flow of the graphics workload including the set of shader operations. The graphics processor may execute or refrain from executing, at the second iteration, each of the set of shader operations based on the adjusted execution flow of the graphics workload.

Type: Application

Filed: August 5, 2022

Publication date: February 8, 2024

Inventors: Yun DU, Eric DEMERS, Andrew Evan GRUBER, Chun YU, Baoguang YANG, Chihong ZHANG, Yuehai DU, Avinash SEETHARAMAIAH, Jonnala Gadda NAGENDRA KUMAR, Gang ZHONG, Zilin YING, Fei WEI
TOUCH CONTROL DISPLAY SUBSTRATE, TOUCH CONTROL DISPLAY DEVICE, AND TOUCH CONTROL SIGNAL LINE DISTRIBUTION METHOD

Publication number: 20240045546

Abstract: A touch display substrate is provided, including a central touch area and a routing area located around the central touch area, where the routing area is provided with isolation lines and a plurality of touch signal lines led out from the central touch area, the extension direction of the isolation lines is parallel to the extension direction of the touch signal lines, the touch signal lines include first touch signal lines arranged close to the isolation lines and second touch signal lines arranged far from the isolation lines, and the width of the first touch signal lines is greater than the width of the second touch signal lines. A touch display device and a touch control signal line distribution method are provided.

Type: Application

Filed: April 28, 2021

Publication date: February 8, 2024

Inventors: Jiawei XU, Yun DU, Zhao DONG, Wenjin FAN
PERFORMING MATRIX MULTIPLICATION IN A STREAMING PROCESSOR

Publication number: 20240037183

Abstract: The present disclosure relates to methods and apparatus for compute processing. For example, disclosed techniques facilitate improving performance of matrix multiplication in a streaming processor. Aspects of the present disclosure can execute, with a load control unit, a first load instruction to load a set of input data of an input matrix from a first memory to a second memory. Aspects of the present disclosure can also execute, with the load control unit, a second load instruction to load a set of weight data of a weight matrix from the first memory to the second memory. Additionally, aspects of the present disclosure can perform, with an ALU component, a matrix multiplication operation using the set of input data and the set of weight data to generate an output matrix. Further, aspects of the present disclosure can store the output matrix at a general purpose register accessible to the ALU component.

Type: Application

Filed: October 16, 2023

Publication date: February 1, 2024

Inventors: Yun DU, Gang ZHONG, Fei WEI, Yibin ZHANG, Jing HAN, Hongjiang SHANG, Elina KAMENETSKAYA, Minjie HUANG, Alexei Vladimirovich BOURD, Chun YU, Andrew Evan GRUBER, Eric DEMERS
RASTERIZATION OF COMPUTE WORKLOADS

Publication number: 20230394738

Abstract: The present disclosure relates to methods and apparatus for graphics processing, e.g., a GPU. The apparatus may receive an image including a plurality of pixels associated with one or more workgroups and one or more pixel tiles, each of the workgroups and the pixel tiles including one or more pixels of the plurality of pixels. The apparatus may determine whether the one or more workgroups are misaligned with the one or more pixel tiles. The apparatus may determine a conversion order of the one or more workgroups when the one or more workgroups are misaligned with the one or more pixel tiles, the conversion order corresponding to a common multiple of one of the one or more workgroups and one of the one or more pixel tiles. The apparatus may convert each of the one or more workgroups based on the conversion order of the one or more workgroups.

Type: Application

Filed: November 9, 2020

Publication date: December 7, 2023

Inventors: Yibin ZHANG, Zilin YING, Yun DU, Heng QI, Jiexia YU, Yang YU, Andrew Evan GRUBER, Jian LIANG, Tao WANG, Alexei Vladimirovich BOURD, Gang ZHONG, Minjie HUANG
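
The abstract above ties the conversion order to "a common multiple" of a workgroup and a pixel tile. As a small worked example, assuming one-dimensional widths in pixels and taking the least common multiple (both assumptions, since the abstract gives no dimensions or formula), the distance after which workgroup and tile boundaries coincide again can be computed as follows. The name `alignment_period` is hypothetical.

```python
from math import gcd
from typing import Tuple


def alignment_period(workgroup_width: int, tile_width: int) -> Tuple[int, int, int]:
    """Return (period, workgroups_per_period, tiles_per_period).

    The period is the least common multiple of the two widths: the
    smallest span that holds a whole number of workgroups and a whole
    number of pixel tiles.
    """
    if workgroup_width <= 0 or tile_width <= 0:
        raise ValueError("widths must be positive")
    period = workgroup_width * tile_width // gcd(workgroup_width, tile_width)
    return period, period // workgroup_width, period // tile_width


period, groups, tiles = alignment_period(workgroup_width=6, tile_width=4)
```

With 6-pixel workgroups and 4-pixel tiles, boundaries line up every 12 pixels, a span covering two workgroups and three tiles.
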
Low Latency Nodes Fusion in a Reconfigurable Data Processor

Publication number: 20230385231

Abstract: A data processing system includes an array of reconfigurable units and a compiler configured to generate a pipeline of n computational nodes related to a dataflow graph, interleaved between n+1 buffers on the array of reconfigurable units. Each computational node is coupled to perform calculations based on data received from an immediately preceding buffer of the n+1 buffers and store results of the calculations into an immediately following buffer of the n+1 buffers after a latency. The compiler is further configured to remove a buffer of the n+1 buffers from the pipeline based on a comparison of the latencies of the computational nodes. A corresponding method is also disclosed herein.

Type: Application

Filed: May 19, 2023

Publication date: November 30, 2023

Applicant: SambaNova Systems, Inc.

Inventors: Yun DU, Jianding LUO
