Patents by Inventor Shuai Mu

Shuai Mu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12164884
    Abstract: Examples described herein relate to instructions that request performance of tanh and sigmoid operations. For example, a compiler can generate native tanh instructions to perform tanh. In some examples, a tanh function can be compiled into instructions that include an instruction to perform either tanh(input) or tanh(input)/input, depending on the value of the input, to generate an intermediate output; an instruction to cause generation of a scale factor based on the input; and an instruction to cause performance of a multiplication of the intermediate output by the scale factor. For example, a sigmoid function can be compiled to cause a math pipeline to perform a range check and perform operations based on the range.
    Type: Grant
    Filed: August 26, 2020
    Date of Patent: December 10, 2024
    Assignee: Intel Corporation
    Inventors: Shuai Mu, Cristina S. Anderson, Subramaniam Maiyuran
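The three-instruction decomposition the abstract describes can be sketched in Python. The range threshold and the use of `math.tanh` as the underlying primitive are illustrative assumptions, not the patented implementation:

```python
import math

SMALL = 2.0 ** -10  # hypothetical range threshold; the real cutoff is an implementation detail

def tanh_intermediate(x: float) -> float:
    """First instruction: tanh(x), or tanh(x)/x for inputs in the small range."""
    if abs(x) < SMALL:
        # tanh(x)/x stays near 1.0 as x -> 0, so the intermediate remains well-scaled
        return 1.0 if x == 0.0 else math.tanh(x) / x
    return math.tanh(x)

def tanh_scale(x: float) -> float:
    """Second instruction: scale factor derived from the same range check."""
    return x if abs(x) < SMALL else 1.0

def tanh_emulated(x: float) -> float:
    """Third instruction: multiply the intermediate output by the scale factor."""
    return tanh_intermediate(x) * tanh_scale(x)
```

For small inputs the intermediate is tanh(x)/x and the scale factor is x, so the product recovers tanh(x) without the intermediate ever underflowing toward zero.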
  • Publication number: 20240403044
    Abstract: Embodiments are directed to systems and methods for reuse of FMA execution unit hardware logic to provide native support for execution of get exponent, get mantissa, and/or scale instructions within a GPU. These new instructions may be used to implement branch-free emulation algorithms for mathematical functions and analytic functions (e.g., transcendental functions) by detecting and handling various special case inputs within a pre-processing stage of the FMA execution unit, which allows the main dataflow of the FMA execution unit to be bypassed for such special cases. Since special cases are handled by the FMA execution unit, library functions emulating various functions, including, but not limited to logarithm, exponential, and division operations may be implemented with significantly fewer lines of machine-level code, thereby providing improved performance for HPC applications.
    Type: Application
    Filed: May 29, 2024
    Publication date: December 5, 2024
    Applicant: Intel Corporation
    Inventors: Shuai Mu, Cristina S. Anderson, Subramaniam Maiyuran
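The get exponent, get mantissa, and scale primitives can be approximated with standard-library operations to show how they enable a branch-reduced logarithm emulation. The function names and the [1, 2) mantissa normalization here are assumptions for illustration, not the claimed instruction semantics:

```python
import math

def get_exponent(x: float) -> float:
    """GETEXP-style primitive: unbiased exponent of x, i.e. floor(log2(|x|))."""
    m, e = math.frexp(abs(x))   # x = m * 2**e with m in [0.5, 1)
    return float(e - 1)

def get_mantissa(x: float) -> float:
    """GETMANT-style primitive: significand of x normalized into [1, 2)."""
    m, _ = math.frexp(abs(x))
    return m * 2.0

def scale(x: float, n: float) -> float:
    """SCALE-style primitive: x * 2**floor(n)."""
    return math.ldexp(x, int(math.floor(n)))

def log2_emulated(x: float) -> float:
    """Branch-reduced emulation: log2(x) = getexp(x) + log2(getmant(x)),
    where the mantissa term needs only a polynomial over [1, 2)."""
    return get_exponent(x) + math.log2(get_mantissa(x))
```

Because the primitives themselves absorb the exponent extraction, the emulation body has no special-case branches of its own, which is the effect the abstract attributes to handling special cases inside the FMA pre-processing stage.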
  • Patent number: 12067394
    Abstract: Embodiments are directed to systems and methods for reuse of FMA execution unit hardware logic to provide native support for execution of get exponent, get mantissa, and/or scale instructions within a GPU. These new instructions may be used to implement branch-free emulation algorithms for mathematical functions and analytic functions (e.g., transcendental functions) by detecting and handling various special case inputs within a pre-processing stage of the FMA execution unit, which allows the main dataflow of the FMA execution unit to be bypassed for such special cases. Since special cases are handled by the FMA execution unit, library functions emulating various functions, including, but not limited to logarithm, exponential, and division operations may be implemented with significantly fewer lines of machine-level code, thereby providing improved performance for HPC applications.
    Type: Grant
    Filed: February 17, 2023
    Date of Patent: August 20, 2024
    Assignee: Intel Corporation
    Inventors: Shuai Mu, Cristina S. Anderson, Subramaniam Maiyuran
  • Publication number: 20240256274
    Abstract: An apparatus to facilitate supporting 8-bit floating point format operands in a computing architecture is disclosed. The apparatus includes a processor comprising: a decoder to decode an instruction fetched for execution into a decoded instruction, wherein the decoded instruction is a matrix instruction that operates on 8-bit floating point operands to cause the processor to perform a parallel dot product operation; a controller to schedule the decoded instruction and provide input data for the 8-bit floating point operands in accordance with an 8-bit floating point data format indicated by the decoded instruction; and systolic dot product circuitry to execute the decoded instruction using systolic layers, each systolic layer comprising one or more sets of interconnected multipliers, shifters, and adders, each set to generate a dot product of the 8-bit floating point operands.
    Type: Application
    Filed: March 27, 2024
    Publication date: August 1, 2024
    Applicant: Intel Corporation
    Inventors: Naveen Mellempudi, Subramaniam Maiyuran, Varghese George, Fangwen Fu, Shuai Mu, Supratim Pal, Wei Xiong
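As a rough illustration of what one set of multipliers and adders computes, here is a software decode of a hypothetical E4M3-style 8-bit float (1 sign, 4 exponent, 3 mantissa bits) followed by a dot product accumulated in a wider format. The encoding details are assumptions for illustration, not the claimed format:

```python
def decode_e4m3(byte: int) -> float:
    """Decode a hypothetical E4M3 8-bit float with exponent bias 7.
    Simplified: no NaN handling; zero-exponent values decode as subnormals."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0xF
    mant = byte & 0x7
    if exp == 0:                                  # subnormal range
        return sign * (mant / 8.0) * 2.0 ** -6
    return sign * (1.0 + mant / 8.0) * 2.0 ** (exp - 7)

def fp8_dot(a: list[int], b: list[int]) -> float:
    """Dot product of two vectors of encoded 8-bit operands, accumulated in a
    wider format, as one systolic set of multipliers and adders would produce."""
    return sum(decode_e4m3(x) * decode_e4m3(y) for x, y in zip(a, b))
```

In hardware the decode, multiply, and accumulate happen in parallel across the systolic layers; the software loop above only shows the arithmetic result each set converges to.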
  • Patent number: 12039766
    Abstract: The present disclosure provides an image processing method, apparatus, device, and computer-readable storage medium. The method includes: obtaining an image dataset, the image dataset including an image and an accompanying text related to an unseen class in the image; and generating a probability and/or distribution of the unseen class using an unseen class obtaining model, the probability and/or distribution of the unseen class including a probability that each pixel in the image is from the unseen class, a probability that the unseen class is present in the image, and a regional probability after the image is subdivided into a plurality of regions.
    Type: Grant
    Filed: April 15, 2021
    Date of Patent: July 16, 2024
    Assignees: BOE TECHNOLOGY GROUP CO., LTD., PEKING UNIVERSITY
    Inventors: Jie Feng, Yadong Mu, Shuai Wang, Guiyu Tian, Yiming Bai, Xiangye Wei, Ge Ou, Qiong Wu
  • Publication number: 20240103810
    Abstract: An apparatus to facilitate supporting vector multiply add with double accumulator access in a graphics environment is disclosed. The apparatus includes a processor comprising processing resources, the processing resources comprising multiplier circuitry to: receive operands for a matrix multiplication operation, wherein the operands comprise two source matrices to be multiplied as part of the matrix multiplication operation; and issue a multiply and add vector (MADV) instruction for the multiplication operation utilizing a double accumulator access output, wherein the MADV instruction multiplies two vectors of the two source matrices in a single floating point (FP) pipeline of the processor.
    Type: Application
    Filed: September 27, 2022
    Publication date: March 28, 2024
    Applicant: Intel Corporation
    Inventors: Jiasheng Chen, Supratim Pal, Changwon Rhee, Hong Jiang, Kevin Hurd, Shuai Mu
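A software sketch of the idea: each MADV-style step multiplies two vectors element-wise and accumulates into an output row that it both reads and writes in the same pass, and a matrix multiply becomes a sequence of such steps. The names and decomposition are illustrative assumptions, not the claimed instruction semantics:

```python
def madv(acc: list[float], a: list[float], b: list[float]) -> list[float]:
    """One multiply-and-add-vector step: acc[i] + a[i] * b[i], reading and
    writing the accumulator in a single pass (the 'double accumulator access')."""
    return [c + x * y for c, x, y in zip(acc, a, b)]

def matmul(A: list[list[float]], B: list[list[float]]) -> list[list[float]]:
    """Matrix multiply expressed as repeated MADV steps: for output row i,
    accumulate A[i][k] broadcast against row k of B, across all k."""
    n, m = len(A), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        acc = C[i]
        for k, aik in enumerate(A[i]):
            acc = madv(acc, [aik] * m, B[k])
        C[i] = acc
    return C
```

Keeping the accumulator resident across the inner loop is what lets the hardware avoid a separate read-modify-write round trip per product.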
  • Publication number: 20230315447
    Abstract: Embodiments are directed to systems and methods for reuse of FMA execution unit hardware logic to provide native support for execution of get exponent, get mantissa, and/or scale instructions within a GPU. These new instructions may be used to implement branch-free emulation algorithms for mathematical functions and analytic functions (e.g., transcendental functions) by detecting and handling various special case inputs within a pre-processing stage of the FMA execution unit, which allows the main dataflow of the FMA execution unit to be bypassed for such special cases. Since special cases are handled by the FMA execution unit, library functions emulating various functions, including, but not limited to logarithm, exponential, and division operations may be implemented with significantly fewer lines of machine-level code, thereby providing improved performance for HPC applications.
    Type: Application
    Filed: February 17, 2023
    Publication date: October 5, 2023
    Applicant: Intel Corporation
    Inventors: Shuai Mu, Cristina S. Anderson, Subramaniam Maiyuran
  • Patent number: 11625244
    Abstract: Embodiments are directed to systems and methods for reuse of FMA execution unit hardware logic to provide native support for execution of get exponent, get mantissa, and/or scale instructions within a GPU. These new instructions may be used to implement branch-free emulation algorithms for mathematical functions and analytic functions (e.g., transcendental functions) by detecting and handling various special case inputs within a pre-processing stage of the FMA execution unit, which allows the main dataflow of the FMA execution unit to be bypassed for such special cases. Since special cases are handled by the FMA execution unit, library functions emulating various functions, including, but not limited to logarithm, exponential, and division operations may be implemented with significantly fewer lines of machine-level code, thereby providing improved performance for HPC applications.
    Type: Grant
    Filed: June 22, 2021
    Date of Patent: April 11, 2023
    Assignee: Intel Corporation
    Inventors: Shuai Mu, Cristina S. Anderson, Subramaniam Maiyuran
  • Publication number: 20220413916
    Abstract: Provision of multiple register allocation sizes for threads is described. An example of a system includes one or more processors including a graphics processor, the graphics processor including at least a first local thread dispatcher (TDL) and multiple processing resources, each processing resource including a plurality of registers; and memory for storage of data for processing, wherein the one or more processors are to determine a register size for a first thread; identify one or more processing resources having sufficient register space for the first thread; select a processing resource of the one or more processing resources having sufficient register space to assign the first thread; select an available thread slot of the selected processing resource for the first thread; and allocate registers of the selected processing resource for the first thread.
    Type: Application
    Filed: June 25, 2021
    Publication date: December 29, 2022
    Applicant: Intel Corporation
    Inventors: Chandra Gurram, Wei-Yu Chen, Vikranth Vemulapalli, Subramaniam Maiyuran, Jorge Eduardo Parra Osorio, Shuai Mu, Guei-Yuan Lueh, Supratim Pal
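The selection steps in the abstract (determine a register size, find resources with sufficient register space, pick one, claim a thread slot, allocate registers) can be sketched as a toy dispatcher. The resource model below is an assumption for illustration only:

```python
from dataclasses import dataclass, field

@dataclass
class ProcessingResource:
    """One processing resource with a fixed register file and thread slots."""
    total_regs: int
    slots: int
    free_regs: int = field(init=False)
    free_slots: int = field(init=False)

    def __post_init__(self):
        self.free_regs = self.total_regs
        self.free_slots = self.slots

def dispatch(resources: list, reg_size: int):
    """Assign a thread needing reg_size registers: scan for a resource with a
    free thread slot and enough free registers, claim both, and return the
    resource index, or None when no resource can fit the thread."""
    for i, r in enumerate(resources):
        if r.free_slots > 0 and r.free_regs >= reg_size:
            r.free_regs -= reg_size
            r.free_slots -= 1
            return i
    return None
```

Because register size varies per thread, a large-register thread can land on one resource while small-register threads fill the remaining slots elsewhere, which is the packing flexibility the abstract is after.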
  • Publication number: 20220405096
    Abstract: Embodiments are directed to systems and methods for reuse of FMA execution unit hardware logic to provide native support for execution of get exponent, get mantissa, and/or scale instructions within a GPU. These new instructions may be used to implement branch-free emulation algorithms for mathematical functions and analytic functions (e.g., transcendental functions) by detecting and handling various special case inputs within a pre-processing stage of the FMA execution unit, which allows the main dataflow of the FMA execution unit to be bypassed for such special cases. Since special cases are handled by the FMA execution unit, library functions emulating various functions, including, but not limited to logarithm, exponential, and division operations may be implemented with significantly fewer lines of machine-level code, thereby providing improved performance for HPC applications.
    Type: Application
    Filed: June 22, 2021
    Publication date: December 22, 2022
    Applicant: Intel Corporation
    Inventors: Shuai Mu, Cristina S. Anderson, Subramaniam Maiyuran
  • Publication number: 20220318013
    Abstract: An apparatus to facilitate supporting 8-bit floating point format operands in a computing architecture is disclosed. The apparatus includes a processor comprising: a decoder to decode an instruction fetched for execution into a decoded instruction, wherein the decoded instruction is a matrix instruction that operates on 8-bit floating point operands to cause the processor to perform a parallel dot product operation; a controller to schedule the decoded instruction and provide input data for the 8-bit floating point operands in accordance with an 8-bit floating point data format indicated by the decoded instruction; and systolic dot product circuitry to execute the decoded instruction using systolic layers, each systolic layer comprising one or more sets of interconnected multipliers, shifters, and adders, each set to generate a dot product of the 8-bit floating point operands.
    Type: Application
    Filed: March 25, 2021
    Publication date: October 6, 2022
    Applicant: Intel Corporation
    Inventors: Naveen Mellempudi, Subramaniam Maiyuran, Varghese George, Fangwen Fu, Shuai Mu, Supratim Pal, Wei Xiong
  • Publication number: 20220308877
    Abstract: A graphics processing apparatus includes a graphics processor and a constant cache. The graphics processor has a number of execution instances that will generate requests for constant data from the constant cache. The constant cache stores constants of multiple constant types. The constant cache has a single level of hierarchy to store the constant data. The constant cache has a banking structure based on the number of execution instances, where the execution instances generate requests for the constant data with unified messaging that is the same for the different types of constant data.
    Type: Application
    Filed: March 26, 2021
    Publication date: September 29, 2022
    Inventors: Subramaniam Maiyuran, Sudarshanram Shetty, Travis Schluessler, Guei-Yuan Lueh, PingHang Cheung, Srividya Karumuri, Chandra S. Gurram, Shuai Mu, Vikranth Vemulapalli
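A toy model of a single-level cache banked by execution instance, with one unified lookup path for all constant types, can make the structure concrete. The bank mapping and fill policy here are assumptions for illustration:

```python
class ConstantCache:
    """Single-level banked constant cache (sketch): addresses map to banks,
    and every constant type goes through the same load path, standing in
    for the abstract's unified messaging."""

    def __init__(self, num_banks: int, backing: dict):
        self.banks = [dict() for _ in range(num_banks)]
        self.backing = backing          # stand-in for memory holding constants

    def load(self, addr: int) -> float:
        bank = self.banks[addr % len(self.banks)]
        if addr not in bank:            # miss: fill the bank from backing store
            bank[addr] = self.backing[addr]
        return bank[addr]
```

Sizing the number of banks to the number of execution instances lets concurrent requests to distinct banks proceed without conflicting on a shared port.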
  • Patent number: 11281534
    Abstract: In various embodiments, methods and systems for implementing distributed data object management are provided. The distributed data object management system includes a distributed storage system having a local metadata-consensus information store and one or more remote metadata-consensus information stores. A metadata-consensus information store is configured to store metadata-consensus information. The metadata-consensus information corresponds to erasure coded fragments of a data object and instructs how to manage the erasure coded fragments. The distributed storage system further includes a local data store and one or more remote data stores for the erasure coded fragments. The distributed data object management system includes a distributed data object manager for operations including interface operations, configuration operations, write operations, read operations, delete operations, garbage collection operations, and failure recovery operations.
    Type: Grant
    Filed: April 23, 2019
    Date of Patent: March 22, 2022
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Cheng Huang, Jin Li, Aaron William Ogus, Douglas W. Phillips, Yu Lin Chen, Shuai Mu, Jinyang Li
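The erasure-coded fragment layout can be illustrated with the simplest possible code: split a data object into k fragments plus one XOR parity fragment, so any single lost fragment is recoverable. A production system would use a stronger code across more stores, so this is a stand-in only:

```python
def erasure_encode(data: bytes, k: int) -> list:
    """Split data into k equal-size fragments plus one XOR parity fragment,
    so any single missing fragment can be rebuilt from the rest."""
    assert len(data) % k == 0, "toy encoder: data length must divide evenly"
    size = len(data) // k
    frags = [data[i * size:(i + 1) * size] for i in range(k)]
    parity = frags[0]
    for f in frags[1:]:
        parity = bytes(a ^ b for a, b in zip(parity, f))
    return frags + [parity]

def erasure_decode(frags: list, k: int) -> bytes:
    """Rebuild the object from the k+1 fragments, at most one of which may
    be missing (None): XOR of all surviving fragments recovers the lost one."""
    missing = [i for i, f in enumerate(frags) if f is None]
    if missing:
        present = [f for f in frags if f is not None]
        rebuilt = present[0]
        for f in present[1:]:
            rebuilt = bytes(a ^ b for a, b in zip(rebuilt, f))
        frags = list(frags)
        frags[missing[0]] = rebuilt
    return b"".join(frags[:k])
```

In the patented system each fragment would live in a different local or remote data store, with the metadata-consensus stores recording where the fragments are and how to manage them.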
  • Publication number: 20220066737
    Abstract: Examples described herein relate to instructions that request performance of tanh and sigmoid operations. For example, a compiler can generate native tanh instructions to perform tanh. In some examples, a tanh function can be compiled into instructions that include an instruction to perform either tanh(input) or tanh(input)/input, depending on the value of the input, to generate an intermediate output; an instruction to cause generation of a scale factor based on the input; and an instruction to cause performance of a multiplication of the intermediate output by the scale factor. For example, a sigmoid function can be compiled to cause a math pipeline to perform a range check and perform operations based on the range.
    Type: Application
    Filed: August 26, 2020
    Publication date: March 3, 2022
    Inventors: Shuai Mu, Cristina S. Anderson, Subramaniam Maiyuran
  • Patent number: 11003532
    Abstract: In various embodiments, methods and systems for implementing distributed data object management are provided. The distributed data object management system includes a local metadata-consensus information store and one or more remote metadata-consensus information stores for metadata-consensus information and a local data store and one or more remote data stores for erasure coded fragments. For a write operation, corresponding metadata writes and data writes are performed in parallel using a metadata write path and a data write path, respectively, when writing to the local metadata-consensus information store and the one or more remote metadata-consensus information stores and the local data store and the one or more remote data stores. And, for a read operation, corresponding metadata reads and data reads are performed in parallel using a metadata read path and a data read path, respectively, when reading from the metadata-consensus information stores and the data stores.
    Type: Grant
    Filed: June 16, 2017
    Date of Patent: May 11, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Cheng Huang, Jin Li, Aaron William Ogus, Douglas W. Phillips, Yu Lin Chen, Shuai Mu, Jinyang Li
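The parallel metadata and data write paths can be sketched with a thread pool, using plain dicts to stand in for the local and remote stores. The function shape and store model are assumptions for illustration:

```python
from concurrent.futures import ThreadPoolExecutor

def write_object(key, fragments, metadata, data_stores, meta_stores):
    """Issue the metadata writes and the fragment writes concurrently, one
    task per store, and wait for every write on both paths to finish."""
    def put_meta(store):
        store[key] = metadata

    def put_frag(store, frag):
        store[key] = frag

    with ThreadPoolExecutor() as ex:
        futures = [ex.submit(put_meta, s) for s in meta_stores]
        futures += [ex.submit(put_frag, s, f)
                    for s, f in zip(data_stores, fragments)]
        for fut in futures:
            fut.result()    # re-raises if any store write failed
```

Overlapping the two paths hides the latency of the slower one, which is the point of keeping the metadata write path and data write path independent.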
  • Patent number: 10993526
    Abstract: A support surface assembly includes a housing including a chamber and an opening. The support surface assembly also includes a support surface retractably coupled to the housing between a deployed position and a stored position. The support surface includes a plurality of slats pivotally coupled together, wherein each slat comprises a first end opening and an opposing second end opening. The support surface assembly also includes a plurality of rods associated with a corresponding slat of the plurality of slats. Each rod is positioned entirely within the first end opening of a first slat in the stored position and each rod is positioned partially within the second end opening of an adjacent second slat in the deployed position.
    Type: Grant
    Filed: January 8, 2019
    Date of Patent: May 4, 2021
    Assignee: THE BOEING COMPANY
    Inventors: Cynthia A. Vandewall, Blake Lane, Craig G. Vogel, Elizabeth O'Hearn, Brian Keller, Shuai Mu, Chuyu Ruan, Swati Chopra
  • Publication number: 20200342993
    Abstract: A health monitoring system and method configured to monitor health of one or more individuals within an internal cabin of a vehicle include one or more group health monitoring devices associated with one or more attendants. The group health monitoring device(s) receive health signals including health data of the individual(s) within the internal cabin from one or more personal health assessment devices associated with the individual(s).
    Type: Application
    Filed: April 24, 2019
    Publication date: October 29, 2020
    Applicant: THE BOEING COMPANY
    Inventors: Cynthia A. Vandewall, Elizabeth A. O'Hearn, Brian Keller, Blake Lane, Craig M. Vogel, Shuai Mu, Chuyu Ruan, Swati D. Chopra
  • Publication number: 20200342992
    Abstract: A health monitoring system and method configured to monitor health of one or more individuals within an internal cabin of a vehicle include one or more personal health assessment devices associated with the one or more individuals. The personal health assessment device(s) obtain health data from the individual(s) and output health signals including the health data to one or more group health monitoring devices associated with one or more attendants who are responsible for taking care of the individual(s).
    Type: Application
    Filed: April 24, 2019
    Publication date: October 29, 2020
    Applicant: THE BOEING COMPANY
    Inventors: Cynthia A. Vandewall, Elizabeth A. O'Hearn, Brian Keller, Blake Lane, Craig M. Vogel, Shuai Mu, Chuyu Ruan, Swati D. Chopra
  • Publication number: 20200342994
    Abstract: A health monitoring system and method are configured to monitor health of individuals within an internal cabin of a vehicle, and include a health statistics database that stores group health data, and an inventory prediction control unit in communication with the health statistics database. The inventory prediction control unit analyzes the group health data stored in the health statistics database to predict future inventory for the vehicle.
    Type: Application
    Filed: April 24, 2019
    Publication date: October 29, 2020
    Applicant: THE BOEING COMPANY
    Inventors: Cynthia A. Vandewall, Elizabeth A. O'Hearn, Brian Keller, Blake Lane, Craig M. Vogel, Shuai Mu, Chuyu Ruan, Swati D. Chopra
  • Publication number: 20200214438
    Abstract: A support surface assembly includes a housing including a chamber and an opening. The support surface assembly also includes a support surface retractably coupled to the housing between a deployed position and a stored position. The support surface includes a plurality of slats pivotally coupled together, wherein each slat comprises a first end opening and an opposing second end opening. The support surface assembly also includes a plurality of rods associated with a corresponding slat of the plurality of slats. Each rod is positioned entirely within the first end opening of a first slat in the stored position and each rod is positioned partially within the second end opening of an adjacent second slat in the deployed position.
    Type: Application
    Filed: January 8, 2019
    Publication date: July 9, 2020
    Inventors: Cynthia A. Vandewall, Blake Lane, Craig G. Vogel, Elizabeth O'Hearn, Brian Keller, Shuai Mu, Chuyu Ruan, Swati Chopra