Patents by Inventor Xiaoxin Fan

Xiaoxin Fan has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11586601
    Abstract: The present disclosure relates to a method and an apparatus for representation of a sparse matrix in a neural network. In some embodiments, an exemplary operation unit includes a buffer for storing a representation of a sparse matrix in a neural network, a sparse engine communicatively coupled with the buffer, and a processing array communicatively coupled with the sparse engine. The sparse engine includes circuitry to: read the representation of the sparse matrix from the buffer, the representation comprising a first level bitmap, a second level bitmap, and an element array; decompress the first level bitmap to determine whether a block of the sparse matrix comprises a non-zero element; and in response to the block comprising a non-zero element, decompress the second level bitmap using the element array to obtain the block of the sparse matrix. The processing array includes circuitry to execute the neural network with the sparse matrix.
    Type: Grant
    Filed: February 5, 2020
    Date of Patent: February 21, 2023
    Assignee: Alibaba Group Holding Limited
    Inventors: Zhibin Xiao, Xiaoxin Fan, Minghai Qin
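
The two-level bitmap representation in the abstract above can be illustrated with a minimal software sketch. It assumes the matrix is flattened row by row and cut into fixed-size blocks; the function names, the `block_size` parameter, and the list-based bitmaps are hypothetical and are not taken from the patent, which describes hardware circuitry.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def compress(matrix: Matrix, block_size: int) -> Tuple[List[int], List[int], List[int]]:
    """Build (first_level, second_level, elements) for a row-major matrix.

    Assumption: a "block" is block_size consecutive elements of the
    flattened matrix. The patent may define blocks differently.
    """
    flat = [value for row in matrix for value in row]
    first_level: List[int] = []   # one bit per block: does it hold a non-zero?
    second_level: List[int] = []  # one bit per element, only for non-zero blocks
    elements: List[int] = []      # the non-zero values, in order
    for start in range(0, len(flat), block_size):
        block = flat[start:start + block_size]
        has_nonzero = any(value != 0 for value in block)
        first_level.append(1 if has_nonzero else 0)
        if has_nonzero:
            second_level.extend(1 if value != 0 else 0 for value in block)
            elements.extend(value for value in block if value != 0)
    return first_level, second_level, elements


def decompress(first_level: List[int], second_level: List[int], elements: List[int],
               rows: int, cols: int, block_size: int) -> Matrix:
    """Rebuild the dense matrix from the two bitmaps and the element array."""
    total = rows * cols
    flat: List[int] = []
    bit_pos = 0   # read position in second_level
    elem_pos = 0  # read position in elements
    for block_index, flag in enumerate(first_level):
        block_len = min(block_size, total - block_index * block_size)
        if not flag:
            # All-zero block: nothing was stored at the second level.
            flat.extend([0] * block_len)
            continue
        for _ in range(block_len):
            if second_level[bit_pos]:
                flat.append(elements[elem_pos])
                elem_pos += 1
            else:
                flat.append(0)
            bit_pos += 1
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]
```

In this sketch an all-zero block costs a single first-level bit, so the second-level bitmap and the element array grow only with the number of blocks that actually hold data.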
  • Patent number: 11500680
    Abstract: The present disclosure relates to an accelerator for systolic array-friendly data placement. The accelerator may include: a systolic array comprising a plurality of operation units, wherein the systolic array is configured to receive staged input data and perform operations using the staged input to generate staged output data, the staged output data comprising a number of segments; a controller configured to execute one or more instructions to generate a pattern generation signal; a data mask generator; and a memory configured to store the staged output data using the generated masks. The data mask generator may include circuitry configured to: receive the pattern generation signal from the controller, and, based on the received signal, generate a mask corresponding to each segment of the staged output data.
    Type: Grant
    Filed: April 24, 2020
    Date of Patent: November 15, 2022
    Assignee: Alibaba Group Holding Limited
    Inventors: Yuhao Wang, Xiaoxin Fan, Dimin Niu, Chunsheng Liu, Wei Han
  • Patent number: 11355163
    Abstract: The systems and methods are configured to efficiently and effectively include processing capabilities in memory. In one embodiment, a processing in memory (PIM) chip includes a memory array, logic components, and an interconnection network. The memory array is configured to store information. In one exemplary implementation, the memory array includes storage cells and array periphery components. The logic components can be configured to process information stored in the memory array. The interconnection network is configured to communicatively couple the logic components. The interconnection network can include interconnect wires, and a portion of the interconnect wires are located in a metal layer area that is located above the memory array.
    Type: Grant
    Filed: September 29, 2020
    Date of Patent: June 7, 2022
    Assignee: Alibaba Group Holding Limited
    Inventors: Wei Han, Shuangchen Li, Lide Duan, Hongzhong Zheng, Dimin Niu, Yuhao Wang, Xiaoxin Fan
  • Publication number: 20220101887
    Abstract: The systems and methods are configured to efficiently and effectively include processing capabilities in memory. In one embodiment, a processing in memory (PIM) chip includes a memory array, logic components, and an interconnection network. The memory array is configured to store information. In one exemplary implementation, the memory array includes storage cells and array periphery components. The logic components can be configured to process information stored in the memory array. The interconnection network is configured to communicatively couple the logic components. The interconnection network can include interconnect wires, and a portion of the interconnect wires are located in a metal layer area that is located above the memory array.
    Type: Application
    Filed: September 29, 2020
    Publication date: March 31, 2022
    Inventors: Wei HAN, Shuangchen LI, Lide DUAN, Hongzhong ZHENG, Dimin NIU, Yuhao WANG, Xiaoxin FAN
  • Publication number: 20210334142
    Abstract: The present disclosure relates to an accelerator for systolic array-friendly data placement. The accelerator may include: a systolic array comprising a plurality of operation units, wherein the systolic array is configured to receive staged input data and perform operations using the staged input to generate staged output data, the staged output data comprising a number of segments; a controller configured to execute one or more instructions to generate a pattern generation signal; a data mask generator; and a memory configured to store the staged output data using the generated masks. The data mask generator may include circuitry configured to: receive the pattern generation signal from the controller, and, based on the received signal, generate a mask corresponding to each segment of the staged output data.
    Type: Application
    Filed: April 24, 2020
    Publication date: October 28, 2021
    Inventors: Yuhao Wang, Xiaoxin Fan, Dimin Niu, Chunsheng Liu, Wei Han
  • Publication number: 20210319289
    Abstract: The present disclosure relates to systems and methods concerning a system including a host device and a convolutional neural network hardware accelerator. The hardware accelerator can be configured, at least in part by the host device, to generate activation data from spatial-domain input data and spatial-domain weight data using frequency-domain operations. The hardware accelerator can include one or more discrete Fourier transform units configured to generate a frequency-domain representation of the input data. The hardware accelerator can include a multiplication unit configured to generate a frequency-domain representation of the activation data by element-wise complex multiplication of the frequency-domain representation of the input data and a frequency-domain representation of the weight data.
    Type: Application
    Filed: April 13, 2020
    Publication date: October 14, 2021
    Inventors: Wei HAN, Xiaoxin FAN, Yuhao WANG
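
The frequency-domain approach in the abstract above rests on the convolution theorem: transforming the input and the weights, multiplying element-wise, and transforming back yields a convolution. The sketch below shows this for a one-dimensional circular convolution using a naive discrete Fourier transform from the standard library. The one-dimensional setting, the zero-padding of the weights, and all function names are assumptions made for illustration; they do not describe the accelerator's actual data path.

```python
import cmath
from typing import List, Sequence


def dft(signal: Sequence[complex], inverse: bool = False) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform (or its inverse)."""
    n = len(signal)
    sign = 1 if inverse else -1
    result = []
    for k in range(n):
        acc = 0j
        for t, x in enumerate(signal):
            acc += x * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        result.append(acc / n if inverse else acc)
    return result


def circular_convolve_freq(inputs: Sequence[float], weights: Sequence[float]) -> List[float]:
    """Circular convolution via element-wise complex multiplication."""
    n = len(inputs)
    padded = list(weights) + [0.0] * (n - len(weights))  # pad weights to input length
    freq_inputs = dft(inputs)
    freq_weights = dft(padded)
    freq_output = [a * b for a, b in zip(freq_inputs, freq_weights)]
    return [value.real for value in dft(freq_output, inverse=True)]


def circular_convolve_direct(inputs: Sequence[float], weights: Sequence[float]) -> List[float]:
    """Reference spatial-domain circular convolution for comparison."""
    n = len(inputs)
    return [
        sum(weights[j] * inputs[(i - j) % n] for j in range(len(weights)))
        for i in range(n)
    ]
```

Both functions should agree up to floating-point rounding, which makes the direct version a convenient check on the frequency-domain one.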
  • Patent number: 11113063
    Abstract: According to one general aspect, an apparatus may include a main-branch target buffer (BTB). The apparatus may include a micro-BTB separate from and smaller than the main-BTB, and configured to produce prediction information associated with a branching instruction. The apparatus may include a micro-BTB confidence counter configured to measure a correctness of the prediction information produced by the micro-BTB. The apparatus may further include a micro-BTB misprediction rate counter configured to measure a rate of mispredictions produced by the micro-BTB. The apparatus may also include a micro-BTB enablement circuit configured to enable a usage of the micro-BTB's prediction information, based, at least in part, upon the values of the micro-BTB confidence counter and the micro-BTB misprediction rate counter.
    Type: Grant
    Filed: September 9, 2019
    Date of Patent: September 7, 2021
    Inventors: James David Dundas, Xiaoxin Fan, Shashank Nemawarkar, Madhu Saravana Sibi Govindan
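
As a rough behavioral model of the enablement logic in the abstract above, the sketch below keeps a saturating confidence counter and a windowed misprediction count, and allows use of the micro-BTB's prediction only when both look healthy. The class name, counter widths, thresholds, window length, and update rules are all hypothetical; the abstract states only that enablement depends, at least in part, on the two counter values.

```python
class MicroBTBGate:
    """Toy model of a micro-BTB enablement decision (all parameters assumed)."""

    def __init__(self, conf_max: int = 7, conf_threshold: int = 4,
                 window: int = 16, max_mispredicts: int = 4) -> None:
        self.conf_max = conf_max
        self.conf_threshold = conf_threshold
        self.window = window
        self.max_mispredicts = max_mispredicts
        self.confidence = 0            # saturating confidence counter
        self.branches_in_window = 0    # predictions seen in the current window
        self.mispredicts_in_window = 0 # mispredictions seen in the current window
        self.rate_ok = True            # verdict of the last completed window

    def record(self, correct: bool) -> None:
        """Update both counters with the outcome of one micro-BTB prediction."""
        if correct:
            self.confidence = min(self.conf_max, self.confidence + 1)
        else:
            self.confidence = max(0, self.confidence - 1)
            self.mispredicts_in_window += 1
        self.branches_in_window += 1
        if self.branches_in_window == self.window:
            # Close the window and judge the misprediction rate.
            self.rate_ok = self.mispredicts_in_window <= self.max_mispredicts
            self.branches_in_window = 0
            self.mispredicts_in_window = 0

    @property
    def enabled(self) -> bool:
        """Use the micro-BTB prediction only if both counters agree."""
        return self.confidence >= self.conf_threshold and self.rate_ok
```

A caller would invoke `record` once per resolved branch and consult `enabled` before trusting the next micro-BTB prediction.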
  • Publication number: 20210240684
    Abstract: The present disclosure relates to a method and an apparatus for representation of a sparse matrix in a neural network. In some embodiments, an exemplary operation unit includes a buffer for storing a representation of a sparse matrix in a neural network, a sparse engine communicatively coupled with the buffer, and a processing array communicatively coupled with the sparse engine. The sparse engine includes circuitry to: read the representation of the sparse matrix from the buffer, the representation comprising a first level bitmap, a second level bitmap, and an element array; decompress the first level bitmap to determine whether a block of the sparse matrix comprises a non-zero element; and in response to the block comprising a non-zero element, decompress the second level bitmap using the element array to obtain the block of the sparse matrix. The processing array includes circuitry to execute the neural network with the sparse matrix.
    Type: Application
    Filed: February 5, 2020
    Publication date: August 5, 2021
    Inventors: Zhibin XIAO, Xiaoxin FAN, Minghai QIN
  • Patent number: 11068200
    Abstract: Methods and systems are provided for improving memory control. A memory architecture includes a plurality of memory units and an interface. A respective memory unit of the plurality of memory units is configured with a Processing-In-Memory (PIM) architecture. The interface includes a plurality of lines. The interface is coupled between the plurality of memory units and a host. The interface is configured to receive one or more signals from a host via the plurality of lines. The respective memory unit of the plurality of memory units is coupled with a respective line of the plurality of lines, and the respective memory unit is further configured to receive a respective signal of the one or more signals via the interface so as to be individually selected by the host.
    Type: Grant
    Filed: November 27, 2019
    Date of Patent: July 20, 2021
    Assignee: Alibaba Group Holding Limited
    Inventors: Dimin Niu, Lide Duan, Yuhao Wang, Xiaoxin Fan, Zhibin Xiao
  • Publication number: 20210157647
    Abstract: Remote access latency in a non-uniform memory access (NUMA) system is substantially reduced by monitoring which NUMA nodes are accessing which local memories, and migrating memory pages from the local memory in a first NUMA node to the local memory in a hot NUMA node when the hot NUMA node is frequently accessing the local memory in the first NUMA node.
    Type: Application
    Filed: April 30, 2020
    Publication date: May 27, 2021
    Inventors: Shasha WEN, Pengcheng LI, Xiaoxin FAN, Li ZHAO
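
The monitoring-and-migration idea in the abstract above can be sketched as a per-page counter of remote accesses. In this simplified model a page moves to a remote node once that node has accessed it a fixed number of times; the class name, the `hot_threshold` parameter, and the count-based definition of "frequently accessing" are assumptions, since the abstract does not say how hotness is measured.

```python
from collections import defaultdict
from typing import Dict, Hashable


class NumaMonitor:
    """Toy model: track remote accesses per page and migrate hot pages."""

    def __init__(self, hot_threshold: int) -> None:
        self.hot_threshold = hot_threshold
        self.page_home: Dict[Hashable, int] = {}  # page -> node that holds it
        # page -> accessing node -> number of remote accesses observed
        self.remote_counts = defaultdict(lambda: defaultdict(int))

    def place(self, page: Hashable, node: int) -> None:
        """Record the initial home node of a page."""
        self.page_home[page] = node

    def access(self, page: Hashable, node: int) -> int:
        """Register an access and return the page's home node afterwards."""
        home = self.page_home[page]
        if node == home:
            return home  # local access, nothing to monitor
        self.remote_counts[page][node] += 1
        if self.remote_counts[page][node] >= self.hot_threshold:
            # The remote node is "hot" for this page: migrate and reset counts.
            self.page_home[page] = node
            del self.remote_counts[page]
        return self.page_home[page]
```

After migration, later accesses from the formerly remote node are local, which is where the latency reduction in this model comes from.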
  • Publication number: 20210157516
    Abstract: Methods and systems are provided for improving memory control. A memory architecture includes a plurality of memory units and an interface. A respective memory unit of the plurality of memory units is configured with a Processing-In-Memory (PIM) architecture. The interface includes a plurality of lines. The interface is coupled between the plurality of memory units and a host. The interface is configured to receive one or more signals from a host via the plurality of lines. The respective memory unit of the plurality of memory units is coupled with a respective line of the plurality of lines, and the respective memory unit is further configured to receive a respective signal of the one or more signals via the interface so as to be individually selected by the host.
    Type: Application
    Filed: November 27, 2019
    Publication date: May 27, 2021
    Inventors: Dimin Niu, Lide Duan, Yuhao Wang, Xiaoxin Fan, Zhibin Xiao
  • Publication number: 20210142210
    Abstract: Methods and systems are provided for implementing training of learning models, including obtaining a pre-trained weight set for a learning model on a sample dataset and on a first loss function; selecting at least two tasks having heterogeneous features to be computed by a reference model; obtaining a reference dataset for the at least two tasks; designating a second loss function for feature embedding between the heterogeneous features of the at least two tasks; training the learning model on the first loss function and training the reference model on the second loss function, in turn; and updating the weight set based on a feature embedding learned by the learning model and a feature embedding learned by the reference model, in turn. Methods and systems of the present disclosure may alleviate computational overhead incurred by executing the learning model and loading different weight sets at a central network of the model.
    Type: Application
    Filed: November 11, 2019
    Publication date: May 13, 2021
    Inventors: Chao Cheng, Xiaoxin Fan, Minghai Qin, Yuan Xie
  • Publication number: 20200401409
    Abstract: According to one general aspect, an apparatus may include a main-branch target buffer (BTB). The apparatus may include a micro-BTB separate from and smaller than the main-BTB, and configured to produce prediction information associated with a branching instruction. The apparatus may include a micro-BTB confidence counter configured to measure a correctness of the prediction information produced by the micro-BTB. The apparatus may further include a micro-BTB misprediction rate counter configured to measure a rate of mispredictions produced by the micro-BTB. The apparatus may also include a micro-BTB enablement circuit configured to enable a usage of the micro-BTB's prediction information, based, at least in part, upon the values of the micro-BTB confidence counter and the micro-BTB misprediction rate counter.
    Type: Application
    Filed: September 9, 2019
    Publication date: December 24, 2020
    Inventors: James David DUNDAS, Xiaoxin FAN, Shashank NEMAWARKAR, Madhu Saravana Sibi GOVINDAN
  • Patent number: 9857421
    Abstract: Aspects of the invention relate to techniques for fault diagnosis based on dynamic circuit design partitioning. According to various implementations of the invention, a sub-circuit is extracted from a circuit design based on failure information of one or more integrated circuit devices. The extraction process may comprise combining fan-in cones of failing observation points included in the failure information. The extraction process may further comprise adding fan-in cones of one or more passing observation points to the combined fan-in cones of the failing observation points. Clock information of test patterns and/or layout information of the circuit design may be extracted and used in the sub-circuit extraction process. The extracted sub-circuit may then be used for diagnosing the one or more integrated circuit devices.
    Type: Grant
    Filed: May 4, 2016
    Date of Patent: January 2, 2018
    Assignee: Mentor Graphics Corporation
    Inventors: Huaxing Tang, Yu Huang, Wu-Tung Cheng, Robert B. Benware, Xiaoxin Fan
  • Publication number: 20160245866
    Abstract: Aspects of the invention relate to techniques for fault diagnosis based on dynamic circuit design partitioning. According to various implementations of the invention, a sub-circuit is extracted from a circuit design based on failure information of one or more integrated circuit devices. The extraction process may comprise combining fan-in cones of failing observation points included in the failure information. The extraction process may further comprise adding fan-in cones of one or more passing observation points to the combined fan-in cones of the failing observation points. Clock information of test patterns and/or layout information of the circuit design may be extracted and used in the sub-circuit extraction process. The extracted sub-circuit may then be used for diagnosing the one or more integrated circuit devices.
    Type: Application
    Filed: May 4, 2016
    Publication date: August 25, 2016
    Applicant: Mentor Graphics Corporation
    Inventors: Huaxing Tang, Yu Huang, Wu-Tung Cheng, Robert B. Benware, Xiaoxin Fan
  • Patent number: 9336107
    Abstract: Aspects of the invention relate to techniques for fault diagnosis based on dynamic circuit design partitioning. According to various implementations of the invention, a sub-circuit is extracted from a circuit design based on failure information of one or more integrated circuit devices. The extraction process may comprise combining fan-in cones of failing observation points included in the failure information. The extraction process may further comprise adding fan-in cones of one or more passing observation points to the combined fan-in cones of the failing observation points. Clock information of test patterns and/or layout information of the circuit design may be extracted and used in the sub-circuit extraction process. The extracted sub-circuit may then be used for diagnosing the one or more integrated circuit devices.
    Type: Grant
    Filed: November 19, 2012
    Date of Patent: May 10, 2016
    Assignee: Mentor Graphics Corporation
    Inventors: Huaxing Tang, Yu Huang, Wu-Tung Cheng, Robert Brady Benware, Xiaoxin Fan
  • Patent number: 9244125
    Abstract: Aspects of the invention relate to techniques for chain fault diagnosis based on dynamic circuit design partitioning. Fan-out cones for scan cells of one or more faulty scan chains of a circuit design are determined and combined to derive a forward-tracing cone. Fan-in cones for scan cells of the one or more faulty scan chains and for failing observation points of the circuit design are determined and combined to derive a backward-tracing cone. By determining intersection of the forward-tracing cone and the backward-tracing cone, a chain diagnosis sub-circuit for the test failure file is generated. Using the process, a plurality of chain diagnosis sub-circuits may be generated for a plurality of test failure files. Scan chain fault diagnosis may then be performed on the plurality of chain diagnosis sub-circuits with a plurality of computers.
    Type: Grant
    Filed: October 25, 2013
    Date of Patent: January 26, 2016
    Assignee: Mentor Graphics Corporation
    Inventors: Yu Huang, Huaxing Tang, Wu-Tung Cheng, Robert Brady Benware, Manish Sharma, Xiaoxin Fan
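
The cone intersection described in the abstract above can be sketched with plain graph traversal. The netlist is modeled here as a dictionary that maps each node to the nodes driving it; that encoding, the function names, and the treatment of scan cells as ordinary graph nodes are simplifying assumptions for illustration only.

```python
from typing import Dict, Iterable, List, Set

Netlist = Dict[str, List[str]]  # node -> nodes that drive it (its fan-in)


def reachable(graph: Netlist, starts: Iterable[str]) -> Set[str]:
    """Return every node reachable from starts, including the starts."""
    seen = set(starts)
    stack = list(seen)
    while stack:
        node = stack.pop()
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen


def invert(fanin: Netlist) -> Netlist:
    """Turn a fan-in map into a fan-out map (node -> nodes it drives)."""
    fanout: Netlist = {}
    for node, drivers in fanin.items():
        for driver in drivers:
            fanout.setdefault(driver, []).append(node)
    return fanout


def chain_diagnosis_subcircuit(fanin: Netlist,
                               faulty_chain_cells: Iterable[str],
                               failing_points: Iterable[str]) -> Set[str]:
    """Intersect the forward-tracing and backward-tracing cones."""
    cells = list(faulty_chain_cells)
    # Forward-tracing cone: combined fan-out cones of the faulty chain's cells.
    forward = reachable(invert(fanin), cells)
    # Backward-tracing cone: combined fan-in cones of those cells and of the
    # failing observation points.
    backward = reachable(fanin, cells + list(failing_points))
    return forward & backward
```

Because each test failure file yields its own sub-circuit in this scheme, the resulting sub-circuits are independent and can be diagnosed on separate machines, as the abstract notes.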
  • Publication number: 20140164859
    Abstract: Aspects of the invention relate to techniques for chain fault diagnosis based on dynamic circuit design partitioning. Fan-out cones for scan cells of one or more faulty scan chains of a circuit design are determined and combined to derive a forward-tracing cone. Fan-in cones for scan cells of the one or more faulty scan chains and for failing observation points of the circuit design are determined and combined to derive a backward-tracing cone. By determining intersection of the forward-tracing cone and the backward-tracing cone, a chain diagnosis sub-circuit for the test failure file is generated. Using the process, a plurality of chain diagnosis sub-circuits may be generated for a plurality of test failure files. Scan chain fault diagnosis may then be performed on the plurality of chain diagnosis sub-circuits with a plurality of computers.
    Type: Application
    Filed: October 25, 2013
    Publication date: June 12, 2014
    Applicant: Mentor Graphics Corporation
    Inventors: Yu Huang, Huaxing Tang, Wu-Tung Cheng, Robert Brady Benware, Manish Sharma, Xiaoxin Fan
  • Patent number: 8707232
    Abstract: Aspects of the invention relate to techniques for fault diagnosis based on circuit design partitioning. According to various implementations of the invention, a circuit design of a failing die is first partitioned into a plurality of sub-circuits. The sub-circuits may be formed based on fan-in cones of observation points. Shared gate ratios may be used as a metric for adding fan-in cones of observation points into a sub-circuit. Based on test patterns and the sub-circuits, sub-circuit test patterns are determined. Fault diagnosis is then performed on the sub-circuits. The sub-circuit fault diagnosis comprises extracting sub-circuit failure information from the failure information for the failing die. The sub-circuit fault diagnosis may employ fault-free values for boundary gates in the sub-circuits.
    Type: Grant
    Filed: June 8, 2012
    Date of Patent: April 22, 2014
    Assignee: Mentor Graphics Corporation
    Inventors: Huaxing Tang, Wu-Tung J. Cheng, Robert Brady Benware, Xiaoxin Fan
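
One way to picture the shared-gate-ratio metric from the abstract above is a greedy grouping of fan-in cones. The sketch assumes the ratio is the fraction of a cone's gates already present in a sub-circuit and that a cone joins the best-matching sub-circuit when the ratio reaches a threshold. Both the formula and the greedy strategy are hypothetical; the abstract says only that shared gate ratios may be used as a metric.

```python
from typing import Dict, List, Set


def shared_gate_ratio(cone: Set[str], subcircuit: Set[str]) -> float:
    """Fraction of the cone's gates already in the sub-circuit (assumed metric)."""
    if not cone:
        return 0.0
    return len(cone & subcircuit) / len(cone)


def partition_cones(cones: Dict[str, Set[str]], min_ratio: float) -> List[dict]:
    """Greedily group observation points whose fan-in cones overlap enough.

    cones maps an observation point to the set of gates in its fan-in cone.
    Returns a list of {"gates": set, "points": list} sub-circuits.
    """
    subcircuits: List[dict] = []
    for point, cone in cones.items():
        best = None
        best_ratio = min_ratio
        for candidate in subcircuits:
            ratio = shared_gate_ratio(cone, candidate["gates"])
            if ratio >= best_ratio:
                best, best_ratio = candidate, ratio
        if best is None:
            # No sub-circuit shares enough gates: start a new one.
            subcircuits.append({"gates": set(cone), "points": [point]})
        else:
            best["gates"] |= cone
            best["points"].append(point)
    return subcircuits
```

A higher `min_ratio` in this sketch produces more, smaller sub-circuits, while a lower value merges more cones together.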
  • Publication number: 20130024830
    Abstract: Aspects of the invention relate to techniques for fault diagnosis based on circuit design partitioning. According to various implementations of the invention, a circuit design of a failing die is first partitioned into a plurality of sub-circuits. The sub-circuits may be formed based on fan-in cones of observation points. Shared gate ratios may be used as a metric for adding fan-in cones of observation points into a sub-circuit. Based on test patterns and the sub-circuits, sub-circuit test patterns are determined. Fault diagnosis is then performed on the sub-circuits. The sub-circuit fault diagnosis comprises extracting sub-circuit failure information from the failure information for the failing die. The sub-circuit fault diagnosis may employ fault-free values for boundary gates in the sub-circuits.
    Type: Application
    Filed: June 8, 2012
    Publication date: January 24, 2013
    Inventors: Huaxing Tang, Wu-Tung J. Cheng, Robert Brady Benware, Xiaoxin Fan