Patents by Inventor Xiaoxin Fan
Xiaoxin Fan has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11586601
Abstract: The present disclosure relates to a method and an apparatus for representation of a sparse matrix in a neural network. In some embodiments, an exemplary operation unit includes a buffer for storing a representation of a sparse matrix in a neural network, a sparse engine communicatively coupled with the buffer, and a processing array communicatively coupled with the sparse engine. The sparse engine includes circuitry to: read the representation of the sparse matrix from the buffer, the representation comprising a first level bitmap, a second level bitmap, and an element array; decompress the first level bitmap to determine whether a block of the sparse matrix comprises a non-zero element; and in response to the block comprising a non-zero element, decompress the second level bitmap using the element array to obtain the block of the sparse matrix. The processing array includes circuitry to execute the neural network with the sparse matrix.
Type: Grant
Filed: February 5, 2020
Date of Patent: February 21, 2023
Assignee: Alibaba Group Holding Limited
Inventors: Zhibin Xiao, Xiaoxin Fan, Minghai Qin
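As a rough illustration of the two-level bitmap idea in the abstract above, the following Python sketch encodes a flat list of values. It is not the patented circuit: the flat 1-D layout, the `block_size` parameter, and the function names `compress` and `decompress` are assumptions made for this example only.

```python
def compress(values, block_size):
    """Encode a flat list as (first_level, second_level, elements).

    Hypothetical layout, assumed for illustration:
      first_level  - one bit per block, 1 if the block holds any non-zero
      second_level - one bit per position, stored only for non-empty blocks
      elements     - the non-zero values in scan order
    """
    first_level, second_level, elements = [], [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        has_nonzero = any(v != 0 for v in block)
        first_level.append(1 if has_nonzero else 0)
        if has_nonzero:
            for v in block:
                second_level.append(1 if v != 0 else 0)
                if v != 0:
                    elements.append(v)
    return first_level, second_level, elements


def decompress(first_level, second_level, elements, block_size, length):
    """Rebuild the flat list from the three arrays produced by compress()."""
    out = []
    bit_pos = 0   # next unread bit in second_level
    elem_pos = 0  # next unread value in elements
    for block_index, flag in enumerate(first_level):
        size = min(block_size, length - block_index * block_size)
        if not flag:
            # First-level bit is 0: the whole block is zero, nothing to read.
            out.extend([0] * size)
            continue
        for _ in range(size):
            if second_level[bit_pos]:
                out.append(elements[elem_pos])
                elem_pos += 1
            else:
                out.append(0)
            bit_pos += 1
    return out


values = [0, 0, 0, 0, 5, 0, 0, 7, 0, 0, 0, 0, 0, 3, 0, 0]
first, second, elems = compress(values, block_size=4)
restored = decompress(first, second, elems, block_size=4, length=len(values))
```

In this sketch all-zero blocks cost a single first-level bit, which is where the saving over a single flat bitmap would come from when zeros cluster.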
-
Patent number: 11500680
Abstract: The present disclosure relates to an accelerator for systolic array-friendly data placement. The accelerator may include: a systolic array comprising a plurality of operation units, wherein the systolic array is configured to receive staged input data and perform operations using the staged input to generate staged output data, the staged output data comprising a number of segments; a controller configured to execute one or more instructions to generate a pattern generation signal; a data mask generator; and a memory configured to store the staged output data using the generated masks. The data mask generator may include circuitry configured to: receive the pattern generation signal from the controller, and, based on the received signal, generate a mask corresponding to each segment of the staged output data.
Type: Grant
Filed: April 24, 2020
Date of Patent: November 15, 2022
Assignee: Alibaba Group Holding Limited
Inventors: Yuhao Wang, Xiaoxin Fan, Dimin Niu, Chunsheng Liu, Wei Han
-
Patent number: 11355163
Abstract: The systems and methods are configured to efficiently and effectively include processing capabilities in memory. In one embodiment, a processing in memory (PIM) chip includes a memory array, logic components, and an interconnection network. The memory array is configured to store information. In one exemplary implementation, the memory array includes storage cells and array periphery components. The logic components can be configured to process information stored in the memory array. The interconnection network is configured to communicatively couple the logic components. The interconnection network can include interconnect wires, and a portion of the interconnect wires are located in a metal layer area that is located above the memory array.
Type: Grant
Filed: September 29, 2020
Date of Patent: June 7, 2022
Assignee: Alibaba Group Holding Limited
Inventors: Wei Han, Shuangchen Li, Lide Duan, Hongzhong Zheng, Dimin Niu, Yuhao Wang, Xiaoxin Fan
-
Publication number: 20220101887
Abstract: The systems and methods are configured to efficiently and effectively include processing capabilities in memory. In one embodiment, a processing in memory (PIM) chip includes a memory array, logic components, and an interconnection network. The memory array is configured to store information. In one exemplary implementation, the memory array includes storage cells and array periphery components. The logic components can be configured to process information stored in the memory array. The interconnection network is configured to communicatively couple the logic components. The interconnection network can include interconnect wires, and a portion of the interconnect wires are located in a metal layer area that is located above the memory array.
Type: Application
Filed: September 29, 2020
Publication date: March 31, 2022
Inventors: Wei HAN, Shuangchen LI, Lide DUAN, Hongzhong ZHENG, Dimin NIU, Yuhao WANG, Xiaoxin FAN
-
Publication number: 20210334142
Abstract: The present disclosure relates to an accelerator for systolic array-friendly data placement. The accelerator may include: a systolic array comprising a plurality of operation units, wherein the systolic array is configured to receive staged input data and perform operations using the staged input to generate staged output data, the staged output data comprising a number of segments; a controller configured to execute one or more instructions to generate a pattern generation signal; a data mask generator; and a memory configured to store the staged output data using the generated masks. The data mask generator may include circuitry configured to: receive the pattern generation signal from the controller, and, based on the received signal, generate a mask corresponding to each segment of the staged output data.
Type: Application
Filed: April 24, 2020
Publication date: October 28, 2021
Inventors: Yuhao Wang, Xiaoxin Fan, Dimin Niu, Chunsheng Liu, Wei Han
-
Publication number: 20210319289
Abstract: The present disclosure relates to systems and methods concerning a system including a host device and a convolutional neural network hardware accelerator. The hardware accelerator can be configured, at least in part by the host device, to generate activation data from spatial-domain input data and spatial-domain weight data using frequency-domain operations. The hardware accelerator can include one or more discrete Fourier transform units configured to generate a frequency-domain representation of the input data. The hardware accelerator can include a multiplication unit configured to generate a frequency-domain representation of the activation data by element-wise complex multiplication of the frequency-domain representation of the input data and a frequency-domain representation of the weight data.
Type: Application
Filed: April 13, 2020
Publication date: October 14, 2021
Inventors: Wei HAN, Xiaoxin FAN, Yuhao WANG
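The frequency-domain step described in the abstract above rests on the convolution theorem: element-wise multiplication of two discrete Fourier transforms corresponds to circular convolution of the original sequences. The Python sketch below checks that identity on a small 1-D example. It is a software illustration only; the naive O(n^2) transform, the 1-D shape, the zero-padded weights, and all function names are assumptions, not details taken from the publication.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform of a sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * freq * t / n) for t in range(n))
        for freq in range(n)
    ]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        sum(spectrum[freq] * cmath.exp(2j * cmath.pi * freq * t / n)
            for freq in range(n)) / n
        for t in range(n)
    ]


def circular_convolve_direct(x, w):
    """Spatial-domain reference: y[i] = sum_k x[k] * w[(i - k) mod n]."""
    n = len(x)
    return [sum(x[k] * w[(i - k) % n] for k in range(n)) for i in range(n)]


def circular_convolve_frequency(x, w):
    """Same result via element-wise complex multiplication of the two DFTs."""
    product = [a * b for a, b in zip(dft(x), dft(w))]
    return [value.real for value in idft(product)]


inputs = [1.0, 2.0, 3.0, 4.0]
weights = [1.0, 0.0, -1.0, 0.0]  # weights zero-padded to the input length
direct = circular_convolve_direct(inputs, weights)
via_frequency = circular_convolve_frequency(inputs, weights)
```

The two results agree up to floating-point rounding. A practical design would use a fast transform and handle padding so that the circular result matches the layer's intended output, neither of which is modeled here.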
-
Patent number: 11113063
Abstract: According to one general aspect, an apparatus may include a main-branch target buffer (BTB). The apparatus may include a micro-BTB separate from and smaller than the main-BTB, and configured to produce prediction information associated with a branching instruction. The apparatus may include a micro-BTB confidence counter configured to measure a correctness of the prediction information produced by the micro-BTB. The apparatus may further include a micro-BTB misprediction rate counter configured to measure a rate of mispredictions produced by the micro-BTB. The apparatus may also include a micro-BTB enablement circuit configured to enable a usage of the micro-BTB's prediction information, based, at least in part, upon the values of the micro-BTB confidence counter and the micro-BTB misprediction rate counter.
Type: Grant
Filed: September 9, 2019
Date of Patent: September 7, 2021
Inventors: James David Dundas, Xiaoxin Fan, Shashank Nemawarkar, Madhu Saravana Sibi Govindan
-
Publication number: 20210240684
Abstract: The present disclosure relates to a method and an apparatus for representation of a sparse matrix in a neural network. In some embodiments, an exemplary operation unit includes a buffer for storing a representation of a sparse matrix in a neural network, a sparse engine communicatively coupled with the buffer, and a processing array communicatively coupled with the sparse engine. The sparse engine includes circuitry to: read the representation of the sparse matrix from the buffer, the representation comprising a first level bitmap, a second level bitmap, and an element array; decompress the first level bitmap to determine whether a block of the sparse matrix comprises a non-zero element; and in response to the block comprising a non-zero element, decompress the second level bitmap using the element array to obtain the block of the sparse matrix. The processing array includes circuitry to execute the neural network with the sparse matrix.
Type: Application
Filed: February 5, 2020
Publication date: August 5, 2021
Inventors: Zhibin XIAO, Xiaoxin FAN, Minghai QIN
-
Patent number: 11068200
Abstract: Methods and systems are provided for improving memory control. A memory architecture includes a plurality of memory units and an interface. A respective memory unit of the plurality of memory units is configured with a Processing-In-Memory (PIM) architecture. The interface includes a plurality of lines. The interface is coupled between the plurality of memory units and a host. The interface is configured to receive one or more signals from a host via the plurality of lines. The respective memory unit of the plurality of memory units is coupled with a respective line of the plurality of lines, and the respective memory unit is further configured to receive a respective signal of the one or more signals via the interface so as to be individually selected by the host.
Type: Grant
Filed: November 27, 2019
Date of Patent: July 20, 2021
Assignee: Alibaba Group Holding Limited
Inventors: Dimin Niu, Lide Duan, Yuhao Wang, Xiaoxin Fan, Zhibin Xiao
-
Publication number: 20210157647
Abstract: Remote access latency in a non-uniform memory access (NUMA) system is substantially reduced by monitoring which NUMA nodes are accessing which local memories, and migrating memory pages from the local memory in a first NUMA node to the local memory in a hot NUMA node when the hot NUMA node is frequently accessing the local memory in the first NUMA node.
Type: Application
Filed: April 30, 2020
Publication date: May 27, 2021
Inventors: Shasha WEN, Pengcheng LI, Xiaoxin FAN, Li ZHAO
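The monitor-then-migrate policy in the abstract above can be sketched in a few lines of Python. This is a toy model under stated assumptions: accesses are tracked per page, "frequently" is reduced to a fixed `threshold`, and the `NumaMonitor` class with its methods is hypothetical. The publication does not specify any of these details.

```python
from collections import Counter, defaultdict


class NumaMonitor:
    """Toy model: count remote accesses per page and migrate hot pages."""

    def __init__(self, page_home, threshold):
        self.page_home = dict(page_home)  # page -> node whose local memory holds it
        self.threshold = threshold        # assumed cutoff for "frequently accessing"
        self.remote_counts = defaultdict(Counter)  # page -> {node: remote accesses}

    def record_access(self, node, page):
        # Only remote accesses matter; local accesses already have low latency.
        if self.page_home[page] != node:
            self.remote_counts[page][node] += 1

    def migrate_hot_pages(self):
        """Move each page to the remote node that accessed it most, if hot enough."""
        moved = []
        for page, counts in self.remote_counts.items():
            hot_node, hits = counts.most_common(1)[0]
            if hits >= self.threshold:
                moved.append((page, self.page_home[page], hot_node))
                self.page_home[page] = hot_node
        self.remote_counts.clear()  # start a fresh monitoring window
        return moved


monitor = NumaMonitor({"p0": 0, "p1": 0}, threshold=3)
for _ in range(3):
    monitor.record_access(node=1, page="p0")  # node 1 keeps reaching into node 0
monitor.record_access(node=1, page="p1")      # a single remote access, below threshold
monitor.record_access(node=0, page="p0")      # local access, not counted
migrations = monitor.migrate_hot_pages()
```

Here only `p0` crosses the threshold, so it is the only page reassigned to node 1.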
-
Publication number: 20210157516
Abstract: Methods and systems are provided for improving memory control. A memory architecture includes a plurality of memory units and an interface. A respective memory unit of the plurality of memory units is configured with a Processing-In-Memory (PIM) architecture. The interface includes a plurality of lines. The interface is coupled between the plurality of memory units and a host. The interface is configured to receive one or more signals from a host via the plurality of lines. The respective memory unit of the plurality of memory units is coupled with a respective line of the plurality of lines, and the respective memory unit is further configured to receive a respective signal of the one or more signals via the interface so as to be individually selected by the host.
Type: Application
Filed: November 27, 2019
Publication date: May 27, 2021
Inventors: Dimin Niu, Lide Duan, Yuhao Wang, Xiaoxin Fan, Zhibin Xiao
-
Publication number: 20210142210
Abstract: Methods and systems are provided for implementing training of learning models, including obtaining a pre-trained weight set for a learning model on a sample dataset and on a first loss function; selecting at least two tasks having heterogeneous features to be computed by a reference model; obtaining a reference dataset for the at least two tasks; designating a second loss function for feature embedding between the heterogeneous features of the at least two tasks; training the learning model on the first loss function and training the reference model on the second loss function, in turn; and updating the weight set based on a feature embedding learned by the learning model and a feature embedding learned by the reference model, in turn. Methods and systems of the present disclosure may alleviate computational overhead incurred by executing the learning model and loading different weight sets at a central network of the model.
Type: Application
Filed: November 11, 2019
Publication date: May 13, 2021
Inventors: Chao Cheng, Xiaoxin Fan, Minghai Qin, Yuan Xie
-
Publication number: 20200401409
Abstract: According to one general aspect, an apparatus may include a main-branch target buffer (BTB). The apparatus may include a micro-BTB separate from and smaller than the main-BTB, and configured to produce prediction information associated with a branching instruction. The apparatus may include a micro-BTB confidence counter configured to measure a correctness of the prediction information produced by the micro-BTB. The apparatus may further include a micro-BTB misprediction rate counter configured to measure a rate of mispredictions produced by the micro-BTB. The apparatus may also include a micro-BTB enablement circuit configured to enable a usage of the micro-BTB's prediction information, based, at least in part, upon the values of the micro-BTB confidence counter and the micro-BTB misprediction rate counter.
Type: Application
Filed: September 9, 2019
Publication date: December 24, 2020
Inventors: James David DUNDAS, Xiaoxin FAN, Shashank NEMAWARKAR, Madhu Saravana Sibi GOVINDAN
-
Patent number: 9857421
Abstract: Aspects of the invention relate to techniques for fault diagnosis based on dynamic circuit design partitioning. According to various implementations of the invention, a sub-circuit is extracted from a circuit design based on failure information of one or more integrated circuit devices. The extraction process may comprise combining fan-in cones of failing observation points included in the failure information. The extraction process may further comprise adding fan-in cones of one or more passing observation points to the combined fan-in cones of the failing observation points. Clock information of test patterns and/or layout information of the circuit design may be extracted and used in the sub-circuit extraction process. The extracted sub-circuit may then be used for diagnosing the one or more integrated circuit devices.
Type: Grant
Filed: May 4, 2016
Date of Patent: January 2, 2018
Assignee: Mentor Graphics Corporation
Inventors: Huaxing Tang, Yu Huang, Wu-Tung Cheng, Robert B. Benware, Xiaoxin Fan
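The sub-circuit extraction described in the abstract above amounts to a union of backward graph traversals. The Python sketch below shows that idea on a toy netlist. The dictionary-based netlist format, the gate names, and the functions `fan_in_cone` and `extract_subcircuit` are assumptions for illustration; clock and layout information, which the abstract also mentions, are not modeled, and the choice of which passing observation points to add is left to the caller.

```python
def fan_in_cone(fanin, point):
    """Return every node that can reach `point`, including `point` itself.

    `fanin` maps a node to the list of nodes driving it (hypothetical format).
    """
    cone, stack = set(), [point]
    while stack:
        node = stack.pop()
        if node in cone:
            continue
        cone.add(node)
        stack.extend(fanin.get(node, ()))
    return cone


def extract_subcircuit(fanin, failing_points, passing_points=()):
    """Combine fan-in cones of failing points, then add selected passing ones."""
    subcircuit = set()
    for point in failing_points:
        subcircuit |= fan_in_cone(fanin, point)
    for point in passing_points:
        subcircuit |= fan_in_cone(fanin, point)
    return subcircuit


# Toy netlist: o1..o3 are observation points, g1..g4 gates, a..e primary inputs.
netlist = {
    "o1": ["g1", "g2"],
    "o2": ["g2", "g3"],
    "o3": ["g4"],
    "g1": ["a", "b"],
    "g2": ["b", "c"],
    "g3": ["c", "d"],
    "g4": ["e"],
}
failing_only = extract_subcircuit(netlist, ["o1"])
with_passing = extract_subcircuit(netlist, ["o1"], passing_points=["o2"])
```

In this example the unrelated logic feeding `o3` never enters the sub-circuit, which is what keeps the diagnosis problem smaller than the full design.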
-
Publication number: 20160245866
Abstract: Aspects of the invention relate to techniques for fault diagnosis based on dynamic circuit design partitioning. According to various implementations of the invention, a sub-circuit is extracted from a circuit design based on failure information of one or more integrated circuit devices. The extraction process may comprise combining fan-in cones of failing observation points included in the failure information. The extraction process may further comprise adding fan-in cones of one or more passing observation points to the combined fan-in cones of the failing observation points. Clock information of test patterns and/or layout information of the circuit design may be extracted and used in the sub-circuit extraction process. The extracted sub-circuit may then be used for diagnosing the one or more integrated circuit devices.
Type: Application
Filed: May 4, 2016
Publication date: August 25, 2016
Applicant: Mentor Graphics Corporation
Inventors: Huaxing Tang, Yu Huang, Wu-Tung Cheng, Robert B. Benware, Xiaoxin Fan
-
Patent number: 9336107
Abstract: Aspects of the invention relate to techniques for fault diagnosis based on dynamic circuit design partitioning. According to various implementations of the invention, a sub-circuit is extracted from a circuit design based on failure information of one or more integrated circuit devices. The extraction process may comprise combining fan-in cones of failing observation points included in the failure information. The extraction process may further comprise adding fan-in cones of one or more passing observation points to the combined fan-in cones of the failing observation points. Clock information of test patterns and/or layout information of the circuit design may be extracted and used in the sub-circuit extraction process. The extracted sub-circuit may then be used for diagnosing the one or more integrated circuit devices.
Type: Grant
Filed: November 19, 2012
Date of Patent: May 10, 2016
Assignee: Mentor Graphics Corporation
Inventors: Huaxing Tang, Yu Huang, Wu-Tung Cheng, Robert Brady Benware, Xiaoxin Fan
-
Patent number: 9244125
Abstract: Aspects of the invention relate to techniques for chain fault diagnosis based on dynamic circuit design partitioning. Fan-out cones for scan cells of one or more faulty scan chains of a circuit design are determined and combined to derive a forward-tracing cone. Fan-in cones for scan cells of the one or more faulty scan chains and for failing observation points of the circuit design are determined and combined to derive a backward-tracing cone. By determining intersection of the forward-tracing cone and the backward-tracing cone, a chain diagnosis sub-circuit for the test failure file is generated. Using the process, a plurality of chain diagnosis sub-circuits may be generated for a plurality of test failure files. Scan chain fault diagnosis may then be performed on the plurality of chain diagnosis sub-circuits with a plurality of computers.
Type: Grant
Filed: October 25, 2013
Date of Patent: January 26, 2016
Assignee: Mentor Graphics Corporation
Inventors: Yu Huang, Huaxing Tang, Wu-Tung Cheng, Robert Brady Benware, Manish Sharma, Xiaoxin Fan
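The forward-tracing and backward-tracing cones in the abstract above can be illustrated as two reachability searches whose results are intersected. The Python sketch below assumes a toy netlist given as a fan-in dictionary; the helper names `invert`, `reach`, and `chain_diagnosis_subcircuit`, and the node names, are hypothetical and not taken from the patent.

```python
def invert(fanin):
    """Build a fan-out map (driver -> driven nodes) from a fan-in map."""
    fanout = {}
    for node, drivers in fanin.items():
        for driver in drivers:
            fanout.setdefault(driver, []).append(node)
    return fanout


def reach(edges, starts):
    """All nodes reachable from `starts` by following `edges`, starts included."""
    seen, stack = set(), list(starts)
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(edges.get(node, ()))
    return seen


def chain_diagnosis_subcircuit(fanin, faulty_scan_cells, failing_points):
    # Forward-tracing cone: combined fan-out cones of the faulty chains' scan cells.
    forward = reach(invert(fanin), faulty_scan_cells)
    # Backward-tracing cone: combined fan-in cones of those scan cells
    # and of the failing observation points.
    backward = reach(fanin, list(faulty_scan_cells) + list(failing_points))
    return forward & backward


# Toy design: s1 and s2 sit on a faulty chain, s3 on a good chain.
netlist = {
    "g1": ["s1", "s3"],
    "g2": ["s2"],
    "g3": ["s3"],
    "o1": ["g1"],
    "o2": ["g2", "g3"],
    "o3": ["g3"],
}
subcircuit = chain_diagnosis_subcircuit(netlist, ["s1", "s2"], ["o1"])
```

Because each test failure file yields its own sub-circuit in this scheme, the per-file calls are independent and could be distributed across machines, as the abstract suggests.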
-
Publication number: 20140164859
Abstract: Aspects of the invention relate to techniques for chain fault diagnosis based on dynamic circuit design partitioning. Fan-out cones for scan cells of one or more faulty scan chains of a circuit design are determined and combined to derive a forward-tracing cone. Fan-in cones for scan cells of the one or more faulty scan chains and for failing observation points of the circuit design are determined and combined to derive a backward-tracing cone. By determining intersection of the forward-tracing cone and the backward-tracing cone, a chain diagnosis sub-circuit for the test failure file is generated. Using the process, a plurality of chain diagnosis sub-circuits may be generated for a plurality of test failure files. Scan chain fault diagnosis may then be performed on the plurality of chain diagnosis sub-circuits with a plurality of computers.
Type: Application
Filed: October 25, 2013
Publication date: June 12, 2014
Applicant: Mentor Graphics Corporation
Inventors: Yu Huang, Huaxing Tang, Wu-Tung Cheng, Robert Brady Benware, Manish Sharma, Xiaoxin Fan
-
Patent number: 8707232
Abstract: Aspects of the invention relate to techniques for fault diagnosis based on circuit design partitioning. According to various implementations of the invention, a circuit design of a failing die is first partitioned into a plurality of sub-circuits. The sub-circuits may be formed based on fan-in cones of observation points. Shared gate ratios may be used as a metric for adding fan-in cones of observation points into a sub-circuit. Based on test patterns and the sub-circuits, sub-circuit test patterns are determined. Fault diagnosis is then performed on the sub-circuits. The sub-circuit fault diagnosis comprises extracting sub-circuit failure information from the failure information for the failing die. The sub-circuit fault diagnosis may employ fault-free values for boundary gates in the sub-circuits.
Type: Grant
Filed: June 8, 2012
Date of Patent: April 22, 2014
Assignee: Mentor Graphics Corporation
Inventors: Huaxing Tang, Wu-Tung J. Cheng, Robert Brady Benware, Xiaoxin Fan
-
Publication number: 20130024830
Abstract: Aspects of the invention relate to techniques for fault diagnosis based on circuit design partitioning. According to various implementations of the invention, a circuit design of a failing die is first partitioned into a plurality of sub-circuits. The sub-circuits may be formed based on fan-in cones of observation points. Shared gate ratios may be used as a metric for adding fan-in cones of observation points into a sub-circuit. Based on test patterns and the sub-circuits, sub-circuit test patterns are determined. Fault diagnosis is then performed on the sub-circuits. The sub-circuit fault diagnosis comprises extracting sub-circuit failure information from the failure information for the failing die. The sub-circuit fault diagnosis may employ fault-free values for boundary gates in the sub-circuits.
Type: Application
Filed: June 8, 2012
Publication date: January 24, 2013
Inventors: Huaxing Tang, Wu-Tung J Cheng, Robert Brady Benware, Xiaoxin Fan