Patents by Inventor Yifan XIONG

Yifan XIONG has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
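
Brief, non-authoritative code sketches illustrating the concepts described in each abstract appear after the listing.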

  • Publication number: 20240169463
    Abstract: A computing system including a plurality of processing devices configured to execute a Mixture-of-Experts (MoE) layer included in an MoE model. The processing devices are configured to, in each of a plurality of iterations, at each of the processing devices, receive a respective plurality of input tokens. Executing the MoE layer further includes, at each of the processing devices, selecting one or more destination expert sub-models associated with the input tokens. Respective numbers k of expert sub-models selected differ across the iterations. At each of the processing devices, executing the MoE layer further includes conveying the input tokens to the one or more destination expert sub-models. Executing the MoE layer further includes generating one or more respective expert sub-model outputs at the one or more destination expert sub-models. Executing the MoE layer further includes generating and outputting an MoE layer output based on the one or more expert sub-model outputs.
    Type: Application
    Filed: November 10, 2022
    Publication date: May 23, 2024
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Yifan XIONG, Changho HWANG, Wei CUI, Ziyue YANG, Ze LIU, Han HU, Zilong WANG, Rafael Omar SALAS, Jithin JOSE, Prabhat RAM, Ho-Yuen CHAU, Peng CHENG, Fan YANG, Mao YANG, Yongqiang XIONG
  • Publication number: 20240160894
    Abstract: A computing system is provided, including a plurality of processing devices configured to execute a Mixture-of-Experts (MoE) layer included in an MoE model. The MoE layer includes a plurality of expert sub-models that each have a respective plurality of parameter values. The MoE layer is configured to be switchable between a data parallel mode and an expert-data-model parallel mode without conveying the respective parameter values of the expert sub-models among the plurality of processing devices.
    Type: Application
    Filed: November 10, 2022
    Publication date: May 16, 2024
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Yifan XIONG, Changho HWANG, Wei CUI, Ziyue YANG, Ze LIU, Han HU, Zilong WANG, Rafael Omar SALAS, Jithin JOSE, Prabhat RAM, Ho-Yuen CHAU, Peng CHENG, Fan YANG, Mao YANG, Yongqiang XIONG
  • Publication number: 20240160906
    Abstract: A computing system including a plurality of processing devices configured to execute a Mixture-of-Experts (MoE) layer included in an MoE model. The processing devices are configured to execute the MoE layer at least in part by, during a first collective communication phase between the processing devices, splitting each of a plurality of first input tensors along a first dimension to obtain first output tensors. Executing the MoE layer further includes processing the first output tensors at a respective plurality of expert sub-models to obtain a plurality of second input tensors. Executing the MoE layer further includes, during a second collective communication phase between the processing devices, receiving the second input tensors from the expert sub-models and concatenating the second input tensors along the first dimension to obtain second output tensors. Executing the MoE layer further includes outputting the second output tensors as output of the MoE layer.
    Type: Application
    Filed: November 10, 2022
    Publication date: May 16, 2024
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Yifan XIONG, Changho HWANG, Wei CUI, Ziyue YANG, Ze LIU, Han HU, Zilong WANG, Rafael Omar SALAS, Jithin JOSE, Prabhat RAM, Ho-Yuen CHAU, Peng CHENG, Fan YANG, Mao YANG, Yongqiang XIONG
  • Publication number: 20240086719
    Abstract: A computing system including a plurality of processing devices configured to execute a Mixture-of-Experts (MoE) layer. The processing devices are configured to execute the MoE layer at least in part by receiving an input tensor including input tokens. Executing the MoE layer further includes computing a gating function output vector based on the input tensor and computing a sparse encoding of the input tensor and the gating function output vector. The sparse encoding indicates one or more destination expert sub-models. Executing the MoE layer further includes dispatching the input tensor for processing at the one or more destination expert sub-models, and further includes computing an expert output tensor. Executing the MoE layer further includes computing an MoE layer output at least in part by computing a sparse decoding of the expert output tensor. Executing the MoE layer further includes conveying the MoE layer output to an additional computing process.
    Type: Application
    Filed: May 16, 2023
    Publication date: March 14, 2024
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Yifan XIONG, Changho HWANG, Wei CUI, Ziyue YANG, Ze LIU, Han HU, Zilong WANG, Rafael Omar SALAS, Jithin JOSE, Prabhat RAM, Ho-Yuen CHAU, Peng CHENG, Fan YANG, Mao YANG, Yongqiang XIONG
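
Illustrative sketch for publication 20240169463. The abstract describes an MoE layer in which the number k of selected destination expert sub-models differs across iterations. The sketch below is a minimal, single-process approximation of that idea; the names (gate_weights, dynamic_top_k) and the top-k routing details are assumptions for illustration, not the patented implementation.

```python
# Minimal sketch: a gating step where the number of selected experts, k,
# varies across iterations. All names and shapes here are illustrative.
import numpy as np

rng = np.random.default_rng(0)
num_experts = 8
d_model = 16
gate_weights = rng.standard_normal((d_model, num_experts))  # learned in practice

def dynamic_top_k(tokens: np.ndarray, k: int):
    """Route each token to its top-k experts for this iteration.

    tokens: (num_tokens, d_model)
    returns: expert indices (num_tokens, k) and normalized routing weights.
    """
    logits = tokens @ gate_weights                    # (num_tokens, num_experts)
    top_idx = np.argsort(-logits, axis=1)[:, :k]      # top-k expert ids per token
    top_logits = np.take_along_axis(logits, top_idx, axis=1)
    weights = np.exp(top_logits)
    weights /= weights.sum(axis=1, keepdims=True)     # softmax over the k picks
    return top_idx, weights

# The value of k differs across iterations, as the abstract describes.
for step, k in enumerate([1, 2, 4]):
    tokens = rng.standard_normal((5, d_model))        # tokens arriving this iteration
    idx, w = dynamic_top_k(tokens, k)
    print(f"iteration {step}: k={k}, routed shapes {idx.shape}, {w.shape}")
```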
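
Illustrative sketch for publication 20240160894. The abstract describes switching an MoE layer between a data parallel mode and an expert-data-model parallel mode without conveying expert parameter values among the processing devices. The toy classes below (ToyDevice, ToyMoELayer, set_mode) are assumed names used only to show a mode switch that changes routing while parameters stay resident on their devices; they do not reflect the patented mechanism.

```python
# Minimal, single-process sketch: each "device" keeps its expert parameters
# resident, and switching the layer's parallel mode changes only how tokens
# are routed, never where the parameters live.
import numpy as np

rng = np.random.default_rng(1)
d_model, num_devices = 8, 4

class ToyDevice:
    def __init__(self, device_id: int):
        self.device_id = device_id
        # Expert parameters stay on this device for the lifetime of the model.
        self.expert_weight = rng.standard_normal((d_model, d_model))

    def run_expert(self, tokens: np.ndarray) -> np.ndarray:
        return tokens @ self.expert_weight

class ToyMoELayer:
    def __init__(self, devices):
        self.devices = devices
        self.mode = "data_parallel"

    def set_mode(self, mode: str):
        # No expert parameters are copied or conveyed here; only the
        # routing behaviour in forward() changes.
        assert mode in ("data_parallel", "expert_data_model_parallel")
        self.mode = mode

    def forward(self, tokens_per_device):
        outputs = []
        for dev, tokens in zip(self.devices, tokens_per_device):
            if self.mode == "data_parallel":
                # Each device processes its own tokens with its local expert.
                outputs.append(dev.run_expert(tokens))
            else:
                # Toy stand-in for expert-data-model parallelism: tokens are
                # exchanged (an all-to-all in a real system) while the
                # expert weights remain where they are.
                target = self.devices[(dev.device_id + 1) % len(self.devices)]
                outputs.append(target.run_expert(tokens))
        return outputs

layer = ToyMoELayer([ToyDevice(i) for i in range(num_devices)])
batches = [rng.standard_normal((3, d_model)) for _ in range(num_devices)]
layer.forward(batches)
layer.set_mode("expert_data_model_parallel")  # switch without parameter transfer
layer.forward(batches)
```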
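
Illustrative sketch for publication 20240160906. The abstract describes two collective communication phases: each first input tensor is split along a first dimension before expert processing, and the second input tensors are concatenated along the same dimension afterwards. The all_to_all helper below is a single-process stand-in for a real collective exchange; all names and shapes are assumptions.

```python
# Minimal sketch of the two collective phases: split along a dimension,
# exchange, run expert sub-models, then concatenate along that dimension.
import numpy as np

rng = np.random.default_rng(2)
num_devices, tokens_per_device, d_model = 4, 8, 16

def all_to_all(chunks_per_device):
    """chunks_per_device[i][j] is the chunk device i sends to device j;
    the result transposes that layout, as an all-to-all exchange would."""
    n = len(chunks_per_device)
    return [[chunks_per_device[i][j] for i in range(n)] for j in range(n)]

# First collective phase: split each first input tensor along dimension 0.
first_inputs = [rng.standard_normal((tokens_per_device, d_model))
                for _ in range(num_devices)]
split_chunks = [np.split(t, num_devices, axis=0) for t in first_inputs]
first_outputs = all_to_all(split_chunks)            # pieces now grouped by expert

# Expert processing: each device's expert sub-model transforms what it received.
expert_weights = [rng.standard_normal((d_model, d_model)) for _ in range(num_devices)]
second_inputs = [[chunk @ expert_weights[j] for chunk in first_outputs[j]]
                 for j in range(num_devices)]

# Second collective phase: send results back and concatenate along dimension 0.
returned = all_to_all(second_inputs)
second_outputs = [np.concatenate(chunks, axis=0) for chunks in returned]
assert second_outputs[0].shape == (tokens_per_device, d_model)
```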
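
Illustrative sketch for publication 20240086719. The abstract describes computing a gating function output for an input tensor, forming a sparse encoding that indicates each token's destination expert sub-model, dispatching the tokens, and then sparsely decoding the expert outputs into the MoE layer output. The sketch below uses per-token indices in place of dense one-hot dispatch tensors; the variable names and the top-1 routing are assumptions, not the patented encoding.

```python
# Minimal sketch of the dispatch / combine path: gating output, sparse
# encoding (per-token expert index and gate value), expert computation,
# and sparse decoding back into the MoE layer output.
import numpy as np

rng = np.random.default_rng(3)
num_tokens, d_model, num_experts = 10, 16, 4

input_tensor = rng.standard_normal((num_tokens, d_model))
gate_weights = rng.standard_normal((d_model, num_experts))      # learned in practice
expert_weights = rng.standard_normal((num_experts, d_model, d_model))

# Gating function output vector per token.
logits = input_tensor @ gate_weights
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Sparse encoding: keep only each token's destination expert and gate value,
# rather than a dense (num_tokens, num_experts) one-hot tensor.
dest_expert = probs.argmax(axis=1)                  # (num_tokens,)
gate_value = probs[np.arange(num_tokens), dest_expert]

# Dispatch: group tokens by destination expert and run the expert sub-models.
expert_output = np.empty_like(input_tensor)
for e in range(num_experts):
    mask = dest_expert == e
    if mask.any():
        expert_output[mask] = input_tensor[mask] @ expert_weights[e]

# Sparse decoding: scale each expert output by its gate value to form the
# MoE layer output, which would then be conveyed to the next computing process.
moe_layer_output = expert_output * gate_value[:, None]
print(moe_layer_output.shape)   # (10, 16)
```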