Patents by Inventor Guokai Ma

Guokai Ma has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240037378
    Abstract: Systems, apparatuses and methods may provide for technology that identifies an embedding table associated with a neural network. The neural network is associated with a plurality of compute nodes. The technology further identifies a number of entries of the embedding table, and determines whether to process gradients associated with the embedding table as dense gradients or sparse gradients based on the number of entries.
    Type: Application
    Filed: December 24, 2020
    Publication date: February 1, 2024
    Applicant: Intel Corporation
    Inventors: Guokai Ma, Jiong Gong, Dhiraj Kalamkar, Rachitha Prem Seelin, Hongzhen Liu, Akshay Jain, Liangang Zhang
  • Publication number: 20230315654
    Abstract: A method of performing ring allreduce operations is disclosed. The method includes sending a chunk of a message in a receive buffer at a current index of a send buffer to a next node in a virtual ring of nodes, receiving a chunk of the message from a previous node in the virtual ring of nodes and store the chunk at the current index of the receive buffer, and reducing a chunk in a send buffer at a previous index of the receive buffer and a chunk in the receive buffer at a previous index of the receive buffer and storing a result at the previous index of the receive buffer. The method includes repeating the sending, receiving and storing, and reducing and storing steps until all chunks of the message are reduced, and sending reduced chunks to the next node and receive reduced chunks from the previous node.
    Type: Application
    Filed: November 30, 2020
    Publication date: October 5, 2023
    Applicant: Intel Corporation
    Inventors: Guokai Ma, Zhouhai Ye, Feng Zou, Xiaojie Deng
  • Patent number: 9141362
    Abstract: A method and system to support scheduling of memory store instructions across atomic regions in binary translation in a processing unit or processor. In one embodiment of the invention, the processing unit has a store buffer that allows store instructions to be issued in different order than the source binary program order but still retire in source binary program order. This facilitates a small atomic region that maps to each iteration of a source binary code and these atomic regions are joined together into a pipelined region. In one embodiment of the invention, the processing unit executes commit instruction(s) once every loop iteration instead of executing the commit instruction(s) once after the loop exit.
    Type: Grant
    Filed: September 27, 2012
    Date of Patent: September 22, 2015
    Assignee: Intel Corporation
    Inventors: Guokai Ma, Yihua Jin, Daniel M. Lavery, Jianhui Li
  • Publication number: 20140282437
    Abstract: A method and system to support scheduling of memory store instructions across atomic regions in binary translation in a processing unit or processor. In one embodiment of the invention, the processing unit has a store buffer that allows store instructions to be issued in different order than the source binary program order but still retire in source binary program order. This facilitates a small atomic region that maps to each iteration of a source binary code and these atomic regions are joined together into a pipelined region. In one embodiment of the invention, the processing unit executes commit instruction(s) once every loop iteration instead of executing the commit instruction(s) once after the loop exit.
    Type: Application
    Filed: September 27, 2012
    Publication date: September 18, 2014
    Inventors: Guokai Ma, Yihua Jin, Daniel M. Lavery, Jianhui Li
  • Publication number: 20060288188
    Abstract: A technique includes performing multiple aligned accesses to a memory to retrieve data of a string misaligned with respect to boundaries of the memory by an offset. Based on the offset, a subset of the data is selected, and the subset is stored in a register.
    Type: Application
    Filed: June 17, 2005
    Publication date: December 21, 2006
    Inventors: Guokai Ma, Jianhui Li