Patents by Inventor Yuanwei Fang

Yuanwei Fang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11954093
    Abstract: Embodiments of the disclosure provide devices and methods for performing a top-k function. The device can include: a memory comprising a plurality of register files for storing the data elements, the plurality of register files comprising a parent register file and a first child register file associated with the parent register file, wherein the parent register file is associated with: first interface circuitry configured for reading a first parent data element from the parent register file and receiving a first child data element and a second child data element from the first child register file; and first comparison circuitry configured for updating the parent register file and the first child register file based on the first parent data element, the first child data element, and the second child data element according to a given principle.
    Type: Grant
    Filed: June 4, 2020
    Date of Patent: April 9, 2024
    Assignee: Alibaba Group Holding Limited
    Inventors: Fei Sun, Shuangchen Li, Dimin Niu, Fei Xue, Yuanwei Fang
  • Publication number: 20230394300
    Abstract: This application describes methods, systems, and apparatus for neural network-based program sampling (NPS). An example device may obtain an assembly code of a program and an execution trace of the program, and divide the assembly code into a plurality of execution intervals. The device may construct a plurality of code graphs respectively corresponding to the plurality of execution intervals, and for each of the plurality of code graphs: generate a plurality of graph snapshots based on the code graph and the execution trace of the program; embed, by using a Graph Neural Network, the plurality of graph snapshots into a plurality of vectors; and aggregate the plurality of vectors into an execution embedding. The device may cluster the plurality of execution embeddings into a plurality of clusters and select representative execution intervals of the program based on the plurality of clusters for execution.
    Type: Application
    Filed: October 28, 2022
    Publication date: December 7, 2023
    Inventors: Yuanwei FANG, Jian CHEN, Yen-Kuang CHEN, Yuan XIE
  • Patent number: 11704271
    Abstract: A system-in-package architecture in accordance with aspects includes a logic die and one or more memory dice coupled together in a three-dimensional stack. The logic die can include one or more global building blocks and a plurality of local building blocks. The number of local building blocks can be scalable. The local building blocks can include a plurality of engines and memory controllers. The memory controllers can be configured to directly couple one or more of the engines to the one or more memory dice. The number and type of local building blocks, and the number and types of engines and memory controllers can be scalable.
    Type: Grant
    Filed: August 20, 2020
    Date of Patent: July 18, 2023
    Assignee: Alibaba Group Holding Limited
    Inventors: Lide Duan, Wei Han, Yuhao Wang, Fei Xue, Yuanwei Fang, Hongzhong Zheng
  • Patent number: 11500811
    Abstract: The present disclosure relates to a method and an apparatus for map reduce. In some embodiments, an exemplary processing unit includes: a 2-dimensional (2D) processing element (PE) array comprising a plurality of PEs, each PE comprising a first input and a second input, the first inputs of the PEs in a linear array in a first dimension of the PE array being connected in series and the second inputs of the PEs in a linear array in a second dimension of the PE array being connected in parallel, each PE being configured to perform an operation on data from the first input or second input; and a plurality of reduce tree units, each reduce tree unit being coupled with the PEs in a linear array in the first dimension or the second dimension of the PE array and configured to perform a first reduction operation.
    Type: Grant
    Filed: June 12, 2020
    Date of Patent: November 15, 2022
    Assignee: Alibaba Group Holding Limited
    Inventors: Yuanwei Fang, Tae Meon Bae, Sicheng Li, Minghai Qin, Guanlin Wu, Yen-kuang Chen
  • Patent number: 11445200
    Abstract: Embodiments of the disclosure provide systems and methods for processing video content. The method can include: receiving raw video data of a video; determining a texture complexity for the video based on the raw video data; determining an encoding mode for the raw video data based on the texture complexity; and encoding the raw video data using the determined encoding mode.
    Type: Grant
    Filed: May 12, 2020
    Date of Patent: September 13, 2022
    Assignee: Alibaba Group Holding Limited
    Inventors: Minghai Qin, Guanlin Wu, Tae Meon Bae, Sicheng Li, Yuanwei Fang, Yen-kuang Chen
  • Patent number: 11442729
    Abstract: A method and system for processing a bit-packed array using one or more processors, including determining a data element size of the bit-packed array, determining a lane configuration of a single-instruction multiple-data (SIMD) unit for processing the bit-packed array based at least in part on the determined data element size, the lane configuration being determined from among a plurality of candidate lane configurations, each candidate lane configuration having a different number of vector register lanes and a corresponding bit capacity per vector register lane, configuring the SIMD unit according to the determined lane configuration, and loading one or more data elements into each vector register lane of the SIMD unit. SIMD instructions may be executed on the loaded one or more data elements of each vector register lane in parallel, and a result of the SIMD instruction may be stored in memory.
    Type: Grant
    Filed: October 26, 2020
    Date of Patent: September 13, 2022
    Assignee: Google LLC
    Inventors: Junwhan Ahn, Jichuan Chang, Andrew McCormick, Yuanwei Fang, Yixin Luo
  • Patent number: 11403090
    Abstract: This application describes methods, systems, and apparatus, including computer programs encoded on computer storage media, of an AI-assisted compiler. An example method includes obtaining intermediate code and executable code generated by compiling a computer program with a compiler; determining a reward based on one or more traces obtained by executing the executable code in a runtime system; generating an embedding vector based on the intermediate code and the one or more traces to represent code execution states; determining, using a reinforcement learning agent, one or more optimization actions based on the embedding vector and the reward; and updating the compiler by applying the one or more optimization actions.
    Type: Grant
    Filed: December 8, 2020
    Date of Patent: August 2, 2022
    Assignee: Alibaba Group Holding Limited
    Inventors: Yuanwei Fang, Yen-kuang Chen
  • Publication number: 20220215241
    Abstract: This application describes methods, systems, and apparatus, including computer programs encoded on computer storage media, for microarchitecture-aware program sampling. An exemplary method includes receiving one or more traces collected from one or more microarchitectures executing a computer program for evaluating hardware configurations; training a machine learning (ML) model with multi-task learning based on the one or more traces as one or more training tasks; generating a plurality of embedded vectors representing the computer program; and updating, based on the trained ML model, the plurality of embedded vectors.
    Type: Application
    Filed: January 5, 2021
    Publication date: July 7, 2022
    Inventors: Yuanwei FANG, Minghai QIN, Yen-kuang CHEN
  • Publication number: 20220179635
    Abstract: This application describes methods, systems, and apparatus, including computer programs encoded on computer storage media, of an AI-assisted compiler. An example method includes obtaining intermediate code and executable code generated by compiling a computer program with a compiler; determining a reward based on one or more traces obtained by executing the executable code in a runtime system; generating an embedding vector based on the intermediate code and the one or more traces to represent code execution states; determining, using a reinforcement learning agent, one or more optimization actions based on the embedding vector and the reward; and updating the compiler by applying the one or more optimization actions.
    Type: Application
    Filed: December 8, 2020
    Publication date: June 9, 2022
    Inventors: Yuanwei FANG, Yen-kuang CHEN
  • Publication number: 20220129269
    Abstract: A method and system for processing a bit-packed array using one or more processors, including determining a data element size of the bit-packed array, determining a lane configuration of a single-instruction multiple-data (SIMD) unit for processing the bit-packed array based at least in part on the determined data element size, the lane configuration being determined from among a plurality of candidate lane configurations, each candidate lane configuration having a different number of vector register lanes and a corresponding bit capacity per vector register lane, configuring the SIMD unit according to the determined lane configuration, and loading one or more data elements into each vector register lane of the SIMD unit. SIMD instructions may be executed on the loaded one or more data elements of each vector register lane in parallel, and a result of the SIMD instruction may be stored in memory.
    Type: Application
    Filed: October 26, 2020
    Publication date: April 28, 2022
    Applicant: Google LLC
    Inventors: Junwhan Ahn, Jichuan Chang, Andrew McCormick, Yuanwei Fang, Yixin Luo
  • Publication number: 20220103831
    Abstract: The present disclosure relates to a method for scheduling computation resources for generating feature maps for video. The method comprises determining runtime for generating feature maps of a reference picture and a predicted picture, determining available computation resources for generating the feature maps, and allocating, based on the runtime, one or more computation resources among the available computation resources for generating the feature maps such that the feature maps are generated at regular time intervals.
    Type: Application
    Filed: September 30, 2020
    Publication date: March 31, 2022
    Inventors: Sicheng Li, Yuanwei Fang, Minghai Qin, Yen-kuang Chen
  • Patent number: 11277626
    Abstract: Video coding techniques including differential bit rate or quality coding of one or more regions of interest and one or more non-regions of interest based on information including one or more of coordinates of the one or more regions of interest, a target complexity, residual encoder bit data, a requested quality, a difference between the current video data frame and a reconstructed video data frame, a target quality, a requested bit rate, frame target bit allocation and an as-encoded bit rate.
    Type: Grant
    Filed: February 21, 2020
    Date of Patent: March 15, 2022
    Assignee: Alibaba Group Holding Limited
    Inventors: Guanlin Wu, Minghai Qin, Tae Meon Bae, Sicheng Li, Yuanwei Fang, Yen-Kuang Chen
  • Publication number: 20220058024
    Abstract: A method of performing out-of-order execution in a processing system comprising a processing unit and one or more accelerators comprises dispatching a plurality of coarse-grained instructions, each instruction extended to comprise one or more tags, wherein each tag comprises dependency information for the respective instruction expressed at a coarse-grained level. The method also comprises translating the plurality of coarse-grained instructions into a plurality of fine-grained instructions, wherein the dependency information is translated into dependencies expressed at a fine-grained level. Further, the method comprises resolving the dependencies at the fine-grained level and scheduling the plurality of fine-grained instructions for execution across the one or more accelerators in the processing system.
    Type: Application
    Filed: August 18, 2020
    Publication date: February 24, 2022
    Inventors: Yuanwei FANG, Fei SUN, Fei XUE, Yuejian XIE, Yuhao WANG, Yen-Kuang CHEN
  • Publication number: 20220058150
    Abstract: A system-in-package architecture in accordance with aspects includes a logic die and one or more memory dice coupled together in a three-dimensional stack. The logic die can include one or more global building blocks and a plurality of local building blocks. The number of local building blocks can be scalable. The local building blocks can include a plurality of engines and memory controllers. The memory controllers can be configured to directly couple one or more of the engines to the one or more memory dice. The number and type of local building blocks, and the number and types of engines and memory controllers can be scalable.
    Type: Application
    Filed: August 20, 2020
    Publication date: February 24, 2022
    Inventors: Lide DUAN, Wei HAN, Yuhao WANG, Fei XUE, Yuanwei FANG, Hongzhong ZHENG
  • Publication number: 20220021888
    Abstract: Video coding techniques including variable bitrate encoding based on regions-of-interest (ROIs) and the type of the video content, the type of sets of frames of the video content, the type of scenes of the video content, or the like.
    Type: Application
    Filed: July 16, 2020
    Publication date: January 20, 2022
    Inventors: Minghai QIN, Yen-kuang CHEN, Tae Meon BAE, Guanlin WU, Yuanwei FANG, Sicheng LI
  • Publication number: 20210390076
    Abstract: The present disclosure relates to a method and an apparatus for map reduce. In some embodiments, an exemplary processing unit includes: a 2-dimensional (2D) processing element (PE) array comprising a plurality of PEs, each PE comprising a first input and a second input, the first inputs of the PEs in a linear array in a first dimension of the PE array being connected in series and the second inputs of the PEs in a linear array in a second dimension of the PE array being connected in parallel, each PE being configured to perform an operation on data from the first input or second input; and a plurality of reduce tree units, each reduce tree unit being coupled with the PEs in a linear array in the first dimension or the second dimension of the PE array and configured to perform a first reduction operation.
    Type: Application
    Filed: June 12, 2020
    Publication date: December 16, 2021
    Inventors: Yuanwei FANG, Tae Meon BAE, Sicheng LI, Minghai QIN, Guanlin WU, Yen-kuang CHEN
  • Publication number: 20210382871
    Abstract: Embodiments of the disclosure provide devices and methods for performing a top-k function. The device can include: a memory comprising a plurality of register files for storing the data elements, the plurality of register files comprising a parent register file and a first child register file associated with the parent register file, wherein the parent register file is associated with: first interface circuitry configured for reading a first parent data element from the parent register file and receiving a first child data element and a second child data element from the first child register file; and first comparison circuitry configured for updating the parent register file and the first child register file based on the first parent data element, the first child data element, and the second child data element according to a given principle.
    Type: Application
    Filed: June 4, 2020
    Publication date: December 9, 2021
    Inventors: Fei SUN, Shuangchen LI, Dimin NIU, Fei XUE, Yuanwei FANG
  • Publication number: 20210360258
    Abstract: Embodiments of the disclosure provide systems and methods for processing video content. The method can include: receiving raw video data of a video; determining a texture complexity for the video based on the raw video data; determining an encoding mode for the raw video data based on the texture complexity; and encoding the raw video data using the determined encoding mode.
    Type: Application
    Filed: May 12, 2020
    Publication date: November 18, 2021
    Inventors: Minghai QIN, Guanlin WU, Tae Meon BAE, Sicheng LI, Yuanwei FANG, Yen-kuang CHEN
  • Publication number: 20210266570
    Abstract: Video coding techniques including differential bit rate or quality coding of one or more regions of interest and one or more non-regions of interest based on information including one or more of coordinates of the one or more regions of interest, a target complexity, residual encoder bit data, a requested quality, a difference between the current video data frame and a reconstructed video data frame, a target quality, a requested bit rate, frame target bit allocation and an as-encoded bit rate.
    Type: Application
    Filed: February 21, 2020
    Publication date: August 26, 2021
    Inventors: Guanlin WU, Minghai QIN, Tae Meon BAE, Sicheng LI, Yuanwei FANG, Yen-Kuang CHEN
  • Publication number: 20180124018
    Abstract: Aspects may relate to a server comprising: an interface to receive a service request; and a processor coupled to the interface to receive the service request, the processor configured to: implement a firewall appliance for the service request; operate a first micro-security application to generate an anomaly alert for the service request; and operate a second micro-security application to receive the anomaly alert from the first micro-security application or from another server's micro-security application and to determine whether the service request corresponds to a non-benign behavior.
    Type: Application
    Filed: December 22, 2016
    Publication date: May 3, 2018
    Inventors: Gheorghe Cascaval, Hui Chao, Mihai Christodorescu, Drew Dean, Dinakar Khurjati, Shuhua Ge, Hilmi Gunes Kayacik, Arun Raman, Ahmet Salih Buyukkayhan, Yuanwei Fang
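The top-k entries (patent 11954093 and publication 20210382871) describe parent and child register files that are updated by comparison circuitry "according to a given principle." The abstract does not say what that principle is or how the tree is shaped. One plausible reading is a binary min-heap: each parent keeps the smallest of itself and its two children, so the root always holds the current k-th largest value. The Python sketch below models that reading in software only. The names `compare_and_update` and `top_k`, the min-heap principle, and the array-backed layout are all assumptions, not the patented circuit.

```python
from typing import Iterable, List


def compare_and_update(heap: List[float], parent: int) -> int:
    """Model one parent/child compare step (hypothetical, min-heap principle).

    Reads the parent element and up to two child elements. If a child is
    smaller than the parent, the parent is swapped with the smallest child.
    Returns the index the parent value moved to, or -1 if no swap occurred.
    """
    left, right = 2 * parent + 1, 2 * parent + 2
    smallest = parent
    if left < len(heap) and heap[left] < heap[smallest]:
        smallest = left
    if right < len(heap) and heap[right] < heap[smallest]:
        smallest = right
    if smallest == parent:
        return -1
    heap[parent], heap[smallest] = heap[smallest], heap[parent]
    return smallest


def top_k(stream: Iterable[float], k: int) -> List[float]:
    """Return the k largest values of `stream`, largest first."""
    if k <= 0:
        return []
    heap: List[float] = []
    for x in stream:
        if len(heap) < k:
            # Fill phase: append, then sift the new value up toward the root.
            heap.append(x)
            i = len(heap) - 1
            while i > 0 and heap[i] < heap[(i - 1) // 2]:
                p = (i - 1) // 2
                heap[i], heap[p] = heap[p], heap[i]
                i = p
        elif x > heap[0]:
            # Replace the current k-th largest value at the root, then push it
            # down one parent/child compare step at a time.
            heap[0] = x
            i = 0
            while i != -1:
                i = compare_and_update(heap, i)
    return sorted(heap, reverse=True)
```

For example, `top_k([5, 1, 9, 3, 7, 2, 8], 3)` returns `[9, 8, 7]`. In this model each compare step touches only one parent and its two children. That locality is what would make a register-file tree attractive in hardware.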
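The bit-packed array entries (patent 11442729 and publication 20220129269) choose a SIMD lane configuration from several candidates according to the data element size. The Python sketch below is minimal and rests on these assumptions:

- a 128-bit vector register,
- four candidate lane widths (8, 16, 32 and 64 bits),
- little-endian packing,
- one element per lane.

The names `LaneConfig`, `choose_lane_config`, `unpack` and `load_lanes` are hypothetical. The selection rule (the narrowest lane that fits the element, which maximizes the lane count) is one reasonable policy, not necessarily the one claimed.

```python
from dataclasses import dataclass
from typing import List, Tuple

REGISTER_BITS = 128  # assumed vector register width


@dataclass(frozen=True)
class LaneConfig:
    lanes: int          # number of vector register lanes
    bits_per_lane: int  # bit capacity of each lane


# Candidate configurations, ordered from narrowest lanes (most lanes) to widest.
CANDIDATES = [LaneConfig(REGISTER_BITS // w, w) for w in (8, 16, 32, 64)]


def choose_lane_config(element_bits: int) -> LaneConfig:
    """Pick the narrowest lane that can hold one element (maximizes parallelism)."""
    for cfg in CANDIDATES:
        if cfg.bits_per_lane >= element_bits:
            return cfg
    raise ValueError(f"no lane configuration fits {element_bits}-bit elements")


def unpack(packed: bytes, element_bits: int, count: int) -> List[int]:
    """Extract `count` fixed-width elements from a little-endian bit-packed buffer."""
    value = int.from_bytes(packed, "little")
    mask = (1 << element_bits) - 1
    return [(value >> (i * element_bits)) & mask for i in range(count)]


def load_lanes(
    packed: bytes, element_bits: int, count: int
) -> Tuple[LaneConfig, List[List[int]]]:
    """Configure lanes for the element size, then split elements into register-sized groups."""
    cfg = choose_lane_config(element_bits)
    elems = unpack(packed, element_bits, count)
    vectors = [elems[i:i + cfg.lanes] for i in range(0, len(elems), cfg.lanes)]
    return cfg, vectors
```

With 3-bit elements this sketch picks 16 lanes of 8 bits. A lane-wise operation on each group, such as `[x + 1 for x in vector]`, would then stand in for one SIMD instruction executed across all lanes in parallel.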
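The map-reduce entries (patent 11500811 and publication 20210390076) pair a 2D processing element (PE) array with reduce tree units attached to its rows or columns. The Python sketch below illustrates the dataflow by software analogy only. It assumes that each PE applies the same map function to its element and that each reduce tree combines one row pairwise in about log2(n) levels. The names `tree_reduce` and `map_reduce_rows` are hypothetical, and the sketch does not model the series and parallel input wiring of the PEs.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def tree_reduce(values: Sequence[U], op: Callable[[U, U], U]) -> U:
    """Pairwise (binary-tree) reduction; `op` should be associative.

    Each loop iteration corresponds to one level of the reduce tree:
    neighbours are combined in pairs, and an odd element passes through.
    """
    if not values:
        raise ValueError("cannot reduce an empty sequence")
    level = list(values)
    while len(level) > 1:
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]


def map_reduce_rows(
    grid: Sequence[Sequence[T]],
    map_fn: Callable[[T], U],
    reduce_op: Callable[[U, U], U],
) -> List[U]:
    """Apply `map_fn` at every 'PE', then reduce each row with its own tree."""
    mapped = [[map_fn(x) for x in row] for row in grid]
    return [tree_reduce(row, reduce_op) for row in mapped]
```

For example, squaring each element of `[[1, 2, 3], [4, 5, 6]]` and summing per row gives `[14, 77]`. A second `tree_reduce` over those row results gives `91`. This mirrors a further reduction across the outputs of the per-row trees.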