Patents by Inventor Shuai Mu
Shuai Mu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12164884
Abstract: Examples described herein relate to instructions to request performance of tanh and sigmoid operations. For example, a compiler can generate native tanh instructions to perform tanh. In some examples, a tanh function can be compiled into instructions that include an instruction to perform either tanh(input) or tanh(input)/input, depending on the value of the input, to generate an intermediate result; an instruction to generate a scale factor based on the input; and an instruction to multiply the intermediate result by the scale factor. A sigmoid function can be compiled to cause a math pipeline to perform a range check and perform operations based on the range.
Type: Grant
Filed: August 26, 2020
Date of Patent: December 10, 2024
Assignee: Intel Corporation
Inventors: Shuai Mu, Cristina S. Anderson, Subramaniam Maiyuran
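The three-instruction sequence this abstract describes can be modeled in software. Below is a minimal Python sketch, not the patented hardware implementation: the small-input threshold is hypothetical (the abstract does not give one), and a truncated Taylor series stands in for the hardware's tanh(x)/x path, which is better conditioned near zero.

```python
import math

SMALL_RANGE = 0.25  # hypothetical range threshold; the patent does not publish a value

def tanh_core(x):
    """Instruction 1: tanh(x), or tanh(x)/x for small inputs
    (the divided form avoids 0/0 and loss of precision near zero)."""
    if abs(x) < SMALL_RANGE:
        # Truncated series for tanh(x)/x = 1 - x^2/3 + 2x^4/15 - ...
        return 1.0 - x * x / 3.0 + 2.0 * x ** 4 / 15.0
    return math.tanh(x)

def scale_factor(x):
    """Instruction 2: scale factor that undoes the /x in the small range."""
    return x if abs(x) < SMALL_RANGE else 1.0

def tanh_emulated(x):
    """Instruction 3: multiply the intermediate result by the scale factor."""
    return tanh_core(x) * scale_factor(x)
```

For inputs in the large range the result is exactly `math.tanh(x)`; in the small range the series error is dominated by the dropped x^6 term.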
-
Publication number: 20240403044
Abstract: Embodiments are directed to systems and methods for reuse of FMA execution unit hardware logic to provide native support for execution of get exponent, get mantissa, and/or scale instructions within a GPU. These new instructions may be used to implement branch-free emulation algorithms for mathematical functions and analytic functions (e.g., transcendental functions) by detecting and handling various special-case inputs within a pre-processing stage of the FMA execution unit, which allows the main dataflow of the FMA execution unit to be bypassed for such special cases. Since special cases are handled by the FMA execution unit, library functions emulating various functions, including, but not limited to, logarithm, exponential, and division operations may be implemented with significantly fewer lines of machine-level code, thereby providing improved performance for HPC applications.
Type: Application
Filed: May 29, 2024
Publication date: December 5, 2024
Applicant: Intel Corporation
Inventors: Shuai Mu, Cristina S. Anderson, Subramaniam Maiyuran
-
Patent number: 12067394
Abstract: Embodiments are directed to systems and methods for reuse of FMA execution unit hardware logic to provide native support for execution of get exponent, get mantissa, and/or scale instructions within a GPU. These new instructions may be used to implement branch-free emulation algorithms for mathematical functions and analytic functions (e.g., transcendental functions) by detecting and handling various special-case inputs within a pre-processing stage of the FMA execution unit, which allows the main dataflow of the FMA execution unit to be bypassed for such special cases. Since special cases are handled by the FMA execution unit, library functions emulating various functions, including, but not limited to, logarithm, exponential, and division operations may be implemented with significantly fewer lines of machine-level code, thereby providing improved performance for HPC applications.
Type: Grant
Filed: February 17, 2023
Date of Patent: August 20, 2024
Assignee: Intel Corporation
Inventors: Shuai Mu, Cristina S. Anderson, Subramaniam Maiyuran
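The three primitives this family of filings names — get exponent, get mantissa, and scale — have standard floating-point semantics that can be sketched in software. The Python model below assumes the mantissa is normalized into [1, 2) (a common convention, not stated in the abstract) and omits the hardware's special-case pre-processing for NaN, infinity, and zero.

```python
import math

def getexp(x):
    """Unbiased exponent e such that x = m * 2**e with m in [1, 2)."""
    m, e = math.frexp(x)      # frexp normalizes m into [0.5, 1)
    return e - 1

def getmant(x):
    """Mantissa of x, normalized into [1, 2)."""
    m, _ = math.frexp(x)
    return m * 2.0

def scale(x, n):
    """The scale instruction: x * 2**n, computed exactly."""
    return math.ldexp(x, n)

def log2_emulated(x):
    """Branch-free-style log2 built from the primitives:
    log2(x) = getexp(x) + log2(getmant(x)), with the second
    term confined to the well-behaved interval [1, 2)."""
    return getexp(x) + math.log2(getmant(x))
```

The `log2_emulated` helper illustrates the abstract's point: once range reduction is done by `getexp`/`getmant`, the remaining polynomial (here delegated to `math.log2`) needs no branches.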
-
Publication number: 20240256274
Abstract: An apparatus to facilitate supporting 8-bit floating point format operands in a computing architecture is disclosed. The apparatus includes a processor comprising: a decoder to decode an instruction fetched for execution into a decoded instruction, wherein the decoded instruction is a matrix instruction that operates on 8-bit floating point operands to cause the processor to perform a parallel dot product operation; a controller to schedule the decoded instruction and provide input data for the 8-bit floating point operands in accordance with an 8-bit floating point data format indicated by the decoded instruction; and systolic dot product circuitry to execute the decoded instruction using systolic layers, each systolic layer comprising one or more sets of interconnected multipliers, shifters, and adders, each set to generate a dot product of the 8-bit floating point operands.
Type: Application
Filed: March 27, 2024
Publication date: August 1, 2024
Applicant: Intel Corporation
Inventors: Naveen Mellempudi, Subramaniam Maiyuran, Varghese George, Fangwen Fu, Shuai Mu, Supratim Pal, Wei Xiong
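The abstract says the 8-bit format is selected by the decoded instruction but does not fix one; the sketch below assumes an E4M3-style layout (1 sign, 4 exponent, 3 mantissa bits, bias 7) purely for illustration, and accumulates the dot product in full precision, as a set of multipliers and adders in a systolic layer would.

```python
def decode_fp8_e4m3(byte):
    """Decode an 8-bit float assuming an E4M3-style layout:
    bit 7 = sign, bits 6-3 = exponent (bias 7), bits 2-0 = mantissa.
    The format choice is an assumption; NaN encodings are ignored."""
    sign = -1.0 if (byte >> 7) & 1 else 1.0
    exp = (byte >> 3) & 0xF
    mant = byte & 0x7
    if exp == 0:  # subnormal: no implicit leading 1
        return sign * (mant / 8.0) * 2.0 ** (1 - 7)
    return sign * (1.0 + mant / 8.0) * 2.0 ** (exp - 7)

def fp8_dot(a_bytes, b_bytes):
    """Dot product of two vectors of fp8 operands, accumulated
    in full precision."""
    return sum(decode_fp8_e4m3(a) * decode_fp8_e4m3(b)
               for a, b in zip(a_bytes, b_bytes))
```

With bias 7, the byte 0x38 (exponent field 7, mantissa 0) decodes to 1.0, and 0x40 decodes to 2.0.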
-
Patent number: 12039766
Abstract: The present disclosure provides an image processing method, apparatus, device, and computer-readable storage medium. The method includes: obtaining an image dataset, the image dataset including an image and an accompanying text related to an unseen class in the image; and generating a probability and/or distribution of the unseen class using an unseen class obtaining model, the probability and/or distribution of the unseen class including a probability that each pixel in the image is from the unseen class, a probability that the unseen class is present in the image, and a regional probability after the image is subdivided into a plurality of regions.
Type: Grant
Filed: April 15, 2021
Date of Patent: July 16, 2024
Assignees: BOE TECHNOLOGY GROUP CO., LTD., PEKING UNIVERSITY
Inventors: Jie Feng, Yadong Mu, Shuai Wang, Guiyu Tian, Yiming Bai, Xiangye Wei, Ge Ou, Qiong Wu
-
Publication number: 20240103810
Abstract: An apparatus to facilitate supporting vector multiply add with double accumulator access in a graphics environment is disclosed. The apparatus includes a processor comprising processing resources, the processing resources comprising multiplier circuitry to: receive operands for a matrix multiplication operation, wherein the operands comprise two source matrices to be multiplied as part of the matrix multiplication operation; and issue a multiply and add vector (MADV) instruction for the multiplication operation utilizing a double accumulator access output, wherein the MADV instruction multiplies two vectors of the two source matrices in a single floating point (FP) pipeline of the processor.
Type: Application
Filed: September 27, 2022
Publication date: March 28, 2024
Applicant: Intel Corporation
Inventors: Jiasheng Chen, Supratim Pal, Changwon Rhee, Hong Jiang, Kevin Hurd, Shuai Mu
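The MADV operation can be modeled as an elementwise multiply of two vectors whose products are added into an accumulator that is both read and written — the "double accumulator access" of the abstract. The Python sketch below is a behavioral model only; the broadcast-row decomposition of matrix multiply into MADV steps is one plausible use, not the patented pipeline.

```python
def madv(src1, src2, acc):
    """Multiply-and-add-vector: elementwise product of the two source
    vectors added into the accumulator (read-modify-write)."""
    assert len(src1) == len(src2) == len(acc)
    return [a + x * y for x, y, a in zip(src1, src2, acc)]

def matmul_via_madv(A, B):
    """Matrix multiply expressed as repeated MADV over broadcast
    scalars and rows -- a simplified software model."""
    n, k, m = len(A), len(A[0]), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for p in range(k):
            # Broadcast A[i][p] against row p of B; accumulate into row i of C.
            C[i] = madv([A[i][p]] * m, B[p], C[i])
    return C
```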
-
Publication number: 20230315447
Abstract: Embodiments are directed to systems and methods for reuse of FMA execution unit hardware logic to provide native support for execution of get exponent, get mantissa, and/or scale instructions within a GPU. These new instructions may be used to implement branch-free emulation algorithms for mathematical functions and analytic functions (e.g., transcendental functions) by detecting and handling various special-case inputs within a pre-processing stage of the FMA execution unit, which allows the main dataflow of the FMA execution unit to be bypassed for such special cases. Since special cases are handled by the FMA execution unit, library functions emulating various functions, including, but not limited to, logarithm, exponential, and division operations may be implemented with significantly fewer lines of machine-level code, thereby providing improved performance for HPC applications.
Type: Application
Filed: February 17, 2023
Publication date: October 5, 2023
Applicant: Intel Corporation
Inventors: Shuai Mu, Cristina S. Anderson, Subramaniam Maiyuran
-
Patent number: 11625244
Abstract: Embodiments are directed to systems and methods for reuse of FMA execution unit hardware logic to provide native support for execution of get exponent, get mantissa, and/or scale instructions within a GPU. These new instructions may be used to implement branch-free emulation algorithms for mathematical functions and analytic functions (e.g., transcendental functions) by detecting and handling various special-case inputs within a pre-processing stage of the FMA execution unit, which allows the main dataflow of the FMA execution unit to be bypassed for such special cases. Since special cases are handled by the FMA execution unit, library functions emulating various functions, including, but not limited to, logarithm, exponential, and division operations may be implemented with significantly fewer lines of machine-level code, thereby providing improved performance for HPC applications.
Type: Grant
Filed: June 22, 2021
Date of Patent: April 11, 2023
Assignee: Intel Corporation
Inventors: Shuai Mu, Cristina S. Anderson, Subramaniam Maiyuran
-
Publication number: 20220413916
Abstract: Provision of multiple register allocation sizes for threads is described. An example of a system includes one or more processors including a graphics processor, the graphics processor including at least a first local thread dispatcher (TDL) and multiple processing resources, each processing resource including a plurality of registers; and memory for storage of data for processing, wherein the one or more processors are to: determine a register size for a first thread; identify one or more processing resources having sufficient register space for the first thread; select a processing resource of the one or more processing resources having sufficient register space to assign the first thread; select an available thread slot of the selected processing resource for the first thread; and allocate registers of the selected processing resource for the first thread.
Type: Application
Filed: June 25, 2021
Publication date: December 29, 2022
Applicant: Intel Corporation
Inventors: Chandra Gurram, Wei-Yu Chen, Vikranth Vemulapalli, Subramaniam Maiyuran, Jorge Eduardo Parra Osorio, Shuai Mu, Guei-Yuan Lueh, Supratim Pal
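The dispatch steps the abstract enumerates — determine the thread's register size, find resources with enough free register space, pick one, take a thread slot, allocate — can be sketched as a small allocator. The selection policy below (prefer the resource with the most free registers) is a hypothetical choice for illustration; the abstract does not specify one.

```python
class ProcessingResource:
    """Minimal model of a processing resource: a register file
    and a fixed number of thread slots."""
    def __init__(self, total_regs, thread_slots):
        self.free_regs = total_regs
        self.free_slots = thread_slots
        self.threads = {}  # thread_id -> registers allocated

    def can_host(self, reg_size):
        return self.free_slots > 0 and self.free_regs >= reg_size

    def allocate(self, thread_id, reg_size):
        self.free_regs -= reg_size
        self.free_slots -= 1
        self.threads[thread_id] = reg_size

def dispatch(resources, thread_id, reg_size):
    """Identify resources with sufficient register space, select one
    (hypothetical policy: most free registers), take a slot, and
    allocate the thread's registers. Returns the chosen resource,
    or None if no resource can host the thread."""
    candidates = [r for r in resources if r.can_host(reg_size)]
    if not candidates:
        return None
    chosen = max(candidates, key=lambda r: r.free_regs)
    chosen.allocate(thread_id, reg_size)
    return chosen
```

Because threads declare their own register size, a large-footprint thread can be refused by every resource while smaller threads still dispatch.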
-
Publication number: 20220405096
Abstract: Embodiments are directed to systems and methods for reuse of FMA execution unit hardware logic to provide native support for execution of get exponent, get mantissa, and/or scale instructions within a GPU. These new instructions may be used to implement branch-free emulation algorithms for mathematical functions and analytic functions (e.g., transcendental functions) by detecting and handling various special-case inputs within a pre-processing stage of the FMA execution unit, which allows the main dataflow of the FMA execution unit to be bypassed for such special cases. Since special cases are handled by the FMA execution unit, library functions emulating various functions, including, but not limited to, logarithm, exponential, and division operations may be implemented with significantly fewer lines of machine-level code, thereby providing improved performance for HPC applications.
Type: Application
Filed: June 22, 2021
Publication date: December 22, 2022
Applicant: Intel Corporation
Inventors: Shuai Mu, Cristina S. Anderson, Subramaniam Maiyuran
-
Publication number: 20220318013
Abstract: An apparatus to facilitate supporting 8-bit floating point format operands in a computing architecture is disclosed. The apparatus includes a processor comprising: a decoder to decode an instruction fetched for execution into a decoded instruction, wherein the decoded instruction is a matrix instruction that operates on 8-bit floating point operands to cause the processor to perform a parallel dot product operation; a controller to schedule the decoded instruction and provide input data for the 8-bit floating point operands in accordance with an 8-bit floating point data format indicated by the decoded instruction; and systolic dot product circuitry to execute the decoded instruction using systolic layers, each systolic layer comprising one or more sets of interconnected multipliers, shifters, and adders, each set to generate a dot product of the 8-bit floating point operands.
Type: Application
Filed: March 25, 2021
Publication date: October 6, 2022
Applicant: Intel Corporation
Inventors: Naveen Mellempudi, Subramaniam Maiyuran, Varghese George, Fangwen Fu, Shuai Mu, Supratim Pal, Wei Xiong
-
Publication number: 20220308877
Abstract: A graphics processing apparatus includes a graphics processor and a constant cache. The graphics processor has a number of execution instances that will generate requests for constant data from the constant cache. The constant cache stores constants of multiple constant types. The constant cache has a single level of hierarchy to store the constant data. The constant cache has a banking structure based on the number of execution instances, where the execution instances generate requests for the constant data with unified messaging that is the same for the different types of constant data.
Type: Application
Filed: March 26, 2021
Publication date: September 29, 2022
Inventors: Subramaniam Maiyuran, Sudarshanram Shetty, Travis Schluessler, Guei-Yuan Lueh, PingHang Cheung, Srividya Karumuri, Chandra S. Gurram, Shuai Mu, Vikranth Vemulapalli
-
Patent number: 11281534
Abstract: In various embodiments, methods and systems for implementing distributed data object management are provided. The distributed data object management system includes a distributed storage system having a local metadata-consensus information store and one or more remote metadata-consensus information stores. A metadata-consensus information store is configured to store metadata-consensus information. The metadata-consensus information corresponds to erasure coded fragments of a data object and instructs how to manage the erasure coded fragments. The distributed storage system further includes a local data store and one or more remote data stores for the erasure coded fragments. The distributed data object management system includes a distributed data object manager for operations including interface operations, configuration operations, write operations, read operations, delete operations, garbage collection operations, and failure recovery operations.
Type: Grant
Filed: April 23, 2019
Date of Patent: March 22, 2022
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Cheng Huang, Jin Li, Aaron William Ogus, Douglas W. Phillips, Yu Lin Chen, Shuai Mu, Jinyang Li
-
Publication number: 20220066737
Abstract: Examples described herein relate to instructions to request performance of tanh and sigmoid operations. For example, a compiler can generate native tanh instructions to perform tanh. In some examples, a tanh function can be compiled into instructions that include an instruction to perform either tanh(input) or tanh(input)/input, depending on the value of the input, to generate an intermediate result; an instruction to generate a scale factor based on the input; and an instruction to multiply the intermediate result by the scale factor. A sigmoid function can be compiled to cause a math pipeline to perform a range check and perform operations based on the range.
Type: Application
Filed: August 26, 2020
Publication date: March 3, 2022
Inventors: Shuai Mu, Cristina S. Anderson, Subramaniam Maiyuran
-
Patent number: 11003532
Abstract: In various embodiments, methods and systems for implementing distributed data object management are provided. The distributed data object management system includes a local metadata-consensus information store and one or more remote metadata-consensus information stores for metadata-consensus information, and a local data store and one or more remote data stores for erasure coded fragments. For a write operation, corresponding metadata writes and data writes are performed in parallel using a metadata write path and a data write path, respectively, when writing to the local metadata-consensus information store and the one or more remote metadata-consensus information stores and the local data store and the one or more remote data stores. For a read operation, corresponding metadata reads and data reads are performed in parallel using a metadata read path and a data read path, respectively, when reading from the metadata-consensus information stores and the data stores.
Type: Grant
Filed: June 16, 2017
Date of Patent: May 11, 2021
Assignee: Microsoft Technology Licensing, LLC
Inventors: Cheng Huang, Jin Li, Aaron William Ogus, Douglas W. Phillips, Yu Lin Chen, Shuai Mu, Jinyang Li
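The parallel metadata/data write paths described in this abstract can be sketched with a thread pool issuing one task per store. This is a toy model, not the patented system: plain dicts stand in for the local and remote stores, and no consensus protocol or erasure coding is implemented.

```python
from concurrent.futures import ThreadPoolExecutor

def write_object(key, fragments, metadata, metadata_stores, data_stores):
    """Issue the metadata writes and the erasure-coded fragment writes
    in parallel, one task per store, mirroring the separate metadata
    and data write paths. Stores are dicts (a stand-in assumption);
    fragment i goes to data store i."""
    def put_meta(store):
        store[key] = dict(metadata)

    def put_fragment(store, frag):
        store[key] = frag

    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(put_meta, s) for s in metadata_stores]
        futures += [pool.submit(put_fragment, s, f)
                    for s, f in zip(data_stores, fragments)]
        for f in futures:
            f.result()  # re-raise any store failure on the caller
```

Issuing both paths concurrently, rather than metadata-then-data, is what lets a write complete in roughly one round trip to the slowest store.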
-
Patent number: 10993526
Abstract: A support surface assembly includes a housing including a chamber and an opening. The support surface assembly also includes a support surface retractably coupled to the housing between a deployed position and a stored position. The support surface includes a plurality of slats pivotally coupled together, wherein each slat comprises a first end opening and an opposing second end opening. The support surface assembly also includes a plurality of rods, each associated with a corresponding slat of the plurality of slats. Each rod is positioned entirely within the first end opening of a first slat in the stored position, and each rod is positioned partially within the second end opening of an adjacent second slat in the deployed position.
Type: Grant
Filed: January 8, 2019
Date of Patent: May 4, 2021
Assignee: THE BOEING COMPANY
Inventors: Cynthia A. Vandewall, Blake Lane, Craig G. Vogel, Elizabeth O'Hearn, Brian Keller, Shuai Mu, Chuyu Ruan, Swati Chopra
-
Publication number: 20200342993
Abstract: A health monitoring system and method configured to monitor health of one or more individuals within an internal cabin of a vehicle include one or more group health monitoring devices associated with one or more attendants. The group health monitoring device(s) receive health signals including health data of the individual(s) within the internal cabin from one or more personal health assessment devices associated with the individual(s).
Type: Application
Filed: April 24, 2019
Publication date: October 29, 2020
Applicant: THE BOEING COMPANY
Inventors: Cynthia A. Vandewall, Elizabeth A. O'Hearn, Brian Keller, Blake Lane, Craig M. Vogel, Shuai Mu, Chuyu Ruan, Swati D. Chopra
-
Publication number: 20200342992
Abstract: A health monitoring system and method configured to monitor health of one or more individuals within an internal cabin of a vehicle include one or more personal health assessment devices associated with the one or more individuals. The personal health assessment device(s) obtain health data from the individual(s) and output health signals including the health data to one or more group health monitoring devices associated with one or more attendants who are responsible for taking care of the individual(s).
Type: Application
Filed: April 24, 2019
Publication date: October 29, 2020
Applicant: THE BOEING COMPANY
Inventors: Cynthia A. Vandewall, Elizabeth A. O'Hearn, Brian Keller, Blake Lane, Craig M. Vogel, Shuai Mu, Chuyu Ruan, Swati D. Chopra
-
Publication number: 20200342994
Abstract: A health monitoring system and method are configured to monitor health of individuals within an internal cabin of a vehicle, and include a health statistics database that stores group health data, and an inventory prediction control unit in communication with the health statistics database. The inventory prediction control unit analyzes the group health data stored in the health statistics database to predict future inventory for the vehicle.
Type: Application
Filed: April 24, 2019
Publication date: October 29, 2020
Applicant: THE BOEING COMPANY
Inventors: Cynthia A. Vandewall, Elizabeth A. O'Hearn, Brian Keller, Blake Lane, Craig M. Vogel, Shuai Mu, Chuyu Ruan, Swati D. Chopra
-
Publication number: 20200214438
Abstract: A support surface assembly includes a housing including a chamber and an opening. The support surface assembly also includes a support surface retractably coupled to the housing between a deployed position and a stored position. The support surface includes a plurality of slats pivotally coupled together, wherein each slat comprises a first end opening and an opposing second end opening. The support surface assembly also includes a plurality of rods, each associated with a corresponding slat of the plurality of slats. Each rod is positioned entirely within the first end opening of a first slat in the stored position, and each rod is positioned partially within the second end opening of an adjacent second slat in the deployed position.
Type: Application
Filed: January 8, 2019
Publication date: July 9, 2020
Inventors: Cynthia A. Vandewall, Blake Lane, Craig G. Vogel, Elizabeth O'Hearn, Brian Keller, Shuai Mu, Chuyu Ruan, Swati Chopra