Patents by Inventor Junping ZHAO

Junping ZHAO has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20260105005
    Abstract: This specification provides video memory management methods for large language model inference, devices, media, and products, which are applied to a service device deployed with a large language model. The method includes: allocating physical video memory resources on the service device, to separately map the physical video memory resources to a first video memory resource pool in which a cache object is a key-value cache and a second video memory resource pool in which a cache object is an intermediate activation value; and for an inference task submitted to the large language model, upon determining that an idle video memory resource in any video memory resource pool is insufficient to cache a corresponding cache object for the inference task, temporarily transferring at least a portion of idle video memory resources in another video memory resource pool to the any video memory resource pool.
    Type: Application
    Filed: December 10, 2024
    Publication date: April 16, 2026
    Inventors: Rui ZHANG, Junping ZHAO
  • Publication number: 20260064937
    Abstract: This specification provides text generation methods, apparatuses, and storage medium devices. One method includes the following operations. In an iteration of a plurality of iterations under a large language model (LLM): estimating a first text sequence following a current text sequence based on a speculative decoding method, forming a plurality of candidate sequences based on the current text sequence and subsequences of the first text sequence, allocating logical blocks to text units in the plurality of candidate sequences in a key-value cache, to store attention information of the text units, mapping the allocated logical blocks to physical blocks based on a first criterion, and determining, by the LLM, a newly generated text unit in the iteration by using attention information of each candidate sequence in the key-value cache, to form a current text sequence for a next iteration of the plurality of iterations.
    Type: Application
    Filed: November 25, 2024
    Publication date: March 5, 2026
    Applicant: Alipay (Hangzhou) Information Technology Co., Ltd.
    Inventors: Changxu Shao, Yuhong Guo, Junping Zhao
  • Publication number: 20260037317
    Abstract: This disclosure provides GPU computational resource scheduling methods and apparatuses. In an implementation, a method includes: in response to a target computing task created in a computing cluster, determining a task type of the target computing task; if the target computing task is a first-type computing task, scheduling, for running, the target computing task to a first GPU hardware that has remaining computational resources satisfying a computational demand of the target computing task in the computing cluster; and in response to a first indication that is reported by a first computing node integrated with the first GPU hardware and that indicates that the first-type computing task exclusively occupies computational resources of the first GPU hardware, rescheduling, for running, a second-type computing task to a second GPU hardware that has remaining computational resources satisfying a computational demand of the second-type computing task in the computing cluster.
    Type: Application
    Filed: December 11, 2024
    Publication date: February 5, 2026
    Applicant: Alipay (Hangzhou) Information Technology Co., Ltd.
    Inventors: Rui Fang, Mingliang Gong, Ning Wang, Zhonghui Jiang, Junping Zhao, Tongkai Yang, Jiahao Gong, Xiaoyun Mao
  • Publication number: 20260017208
    Abstract: Implementations of this specification provide key-value cache management, model reasoning, and data processing methods and apparatuses for large language models. In an implementation, a method comprises allocating a virtual memory block in a virtual address slot to newly-added token key-value data of a model reasoning request, in response to determining that a scheduling result of the model reasoning request indicates the model reasoning request is scheduled for execution, maintaining a mapping relationship between an occupied virtual address slot and a physical graphics memory block allocated to the model reasoning request, and copying the newly-added token key-value data to the physical graphics memory block.
    Type: Application
    Filed: November 25, 2024
    Publication date: January 15, 2026
    Applicant: Alipay (Hangzhou) Information Technology Co., Ltd.
    Inventors: Rui Zhang, Junping Zhao
  • Publication number: 20260010395
    Abstract: Methods, apparatuses, and systems for data processing based on graphics processing unit (GPU) on-chip memories are described. A data obtaining operation for first data is initiated on a first GPU thread. The first data include writable data needed by a GPU computing task. When the first GPU thread performs the data obtaining operation, a data preloading process of preloading second data from a GPU global memory to the GPU on-chip memory is initiated on a second GPU thread. The second data include read-only data that are needed by the GPU computing task and that are stored in the GPU global memory. The GPU computing task is executed on the second GPU thread based on the first data and the second data in response to that a data obtaining process of the first data and the data preloading process of the second data are completed.
    Type: Application
    Filed: November 5, 2024
    Publication date: January 8, 2026
    Applicant: Alipay (Hangzhou) Information Technology Co., Ltd.
    Inventors: Changxu Shao, Kaihong Zhang, Junping Zhao
  • Publication number: 20250377937
    Abstract: Embodiments of this specification provide graphics memory reuse methods and apparatuses based on GPU multistream concurrency. In an implementation of a default stream reuse mode, a method includes determining, based on (1) a released graphics memory corresponding to a current GPU stream that comprises a GPU instruction to which a graphics memory is to be allocated and (2) whether the current GPU stream is a default stream, whether a candidate reusable graphics memory block exists in a graphics memory pool for storing a released graphics memory block. If the candidate reusable graphics memory block exists, determining, from the candidate reusable graphics memory block, a graphics memory block to be allocated to the GPU instruction.
    Type: Application
    Filed: November 27, 2024
    Publication date: December 11, 2025
    Applicant: Alipay (Hangzhou) Information Technology Co., Ltd.
    Inventors: Rui Zhang, Junping Zhao, Jiale Xu
  • Publication number: 20250348750
    Abstract: Methods, computer-readable media, and apparatuses relating to reinforcement learning model training are described. An example model training system includes at least one training process and at least one inference process. An example method includes: in an inference process, obtaining a latest model weight, updating a weight value of a reinforcement learning model; generating response data based on input data by using an updated reinforcement learning model, forming a training sample based on the input data and the response data, and storing the training sample in a target storage area; and in a training process, obtaining the training sample from the target storage area; updating a weight value of the reinforcement learning model based on the training sample, and sending an updated model weight to the inference process.
    Type: Application
    Filed: November 6, 2024
    Publication date: November 13, 2025
    Applicant: Alipay (Hangzhou) Information Technology Co., Ltd.
    Inventors: Xudong Han, Rui Zhang, Zhen Li, Jian Sha, Junping Zhao
  • Patent number: 12278893
    Abstract: An apparatus in one embodiment comprises a processing platform configured to communicate over a network with a plurality of Internet of Things (IoT) devices. The processing platform receives at least a first intermediate message from a first gateway of the network, receives one or more additional intermediate messages from each of one or more additional gateways of the network, associates the first and additional intermediate messages with one another based at least in part on a common message identifier detected in each such intermediate message, and processes the associated first and additional intermediate messages to recover a device message from a given one of the IoT devices. The first intermediate message is based at least in part on at least one application of a designated cryptographic function to the device message utilizing a corresponding key. At least one of the one or more additional intermediate messages provides at least a portion of the key.
    Type: Grant
    Filed: April 25, 2018
    Date of Patent: April 15, 2025
    Assignee: EMC IP Holding Company LLC
    Inventors: Junping Zhao, Mohamed Sohail
  • Patent number: 12154025
    Abstract: Systems and methods are provided for optimizing GPU memory allocation for high-performance applications such as deep learning (DL) computing. For example, a DL task is executed using GPU resources (GPU device and GPU memory) to process a DL model having functional layers that are processed in a predefined sequence. A current functional layer of the DL model is invoked and processed using the GPU device. In response to the invoking, a data compression operation is performed to compress data of a previous functional layer of the DL model, and store the compressed data in the GPU memory. Responsive to the invoking, compressed data of a next functional layer of the DL model is accessed from the GPU memory and a data decompression operation is performed to decompress the compressed data for subsequent processing of the next functional layer of the DL model by the GPU device.
    Type: Grant
    Filed: February 13, 2018
    Date of Patent: November 26, 2024
    Assignee: EMC IP Holding Company LLC
    Inventors: Dragan Savic, Junping Zhao
  • Publication number: 20240005446
    Abstract: In response to a graphics memory allocation request generated during the running of a target task and for graphics memory needed during running of the target task, target data generated during running of each sub-task of multiple sub-tasks is classified, where a type of the target data comprises at least first data, and where the first data is not used by a subsequent sub-task. Multiple target graphics memory pools are allocated to the multiple sub-tasks. Each target graphics memory pool of the multiple target graphics memory pools is divided into at least one graphics memory block based on a type of the target data, where the at least one graphics memory block includes at least a first graphics memory block corresponding to the first data, and where multiple first graphics memory blocks corresponding to the multiple sub-tasks are mapped to a same target physical memory address.
    Type: Application
    Filed: June 29, 2023
    Publication date: January 4, 2024
    Applicant: Alipay (Hangzhou) Information Technology Co., Ltd.
    Inventors: Xiaofeng Mei, Yao Zhang, Junping Zhao
  • Patent number: 11663050
    Abstract: A resource management method comprises: in response to receiving, from an application operating on a client, a resource allocation request indicating an amount of dedicated processing resources required by the application, acquiring a mapping between a group of physical dedicated processing resources provided by a group of servers and a group of logical dedicated processing resources, the group of physical dedicated processing resources being divided into the group of logical dedicated processing resources; determining allocation statuses of the group of logical dedicated processing resources; determining, based at least on the mapping and the allocation statuses, a first amount of logical dedicated processing resources to be allocated to the application from the group of logical dedicated processing resources; and indicating the first amount of logical dedicated processing resources to the application, to allow the application to utilize physical dedicated processing resources provided by at least one of the
    Type: Grant
    Filed: April 15, 2019
    Date of Patent: May 30, 2023
    Assignee: EMC IP Holding Company LLC
    Inventors: Layne Lin Peng, Junping Zhao, Wei Cui
  • Publication number: 20220363671
    Abstract: Described herein are glue degrader compounds, their various targets, their preparation, pharmaceutical compositions comprising them, and their use in the treatment or prevention of conditions, diseases, and disorders mediated by various target proteins.
    Type: Application
    Filed: September 16, 2020
    Publication date: November 17, 2022
    Inventors: Jake AXFORD, Rohan Eric John BECKWITH, Simone BONAZZI, Nicole BUSCHMANN, Artiom CERNIJENKO, Janetta DEWHURST, Aleem FAZAL, Matthew James HESSE, Lauren HOLDER, Viktor HORNAK, Hidetomo IMASE, Rama JAIN, Xianming JIN, John Ryan KERRIGAN, Julie LACHAL, Fupeng MA, Hasnain Ahmed MALIK, James R. MANNING, Daniel MCKAY, Robert Joseph MOREAU, Pierre NIMSGERN, Gary O'BRIEN, Anna VULPETTI, Ken YAMADA, Junping ZHAO
  • Patent number: 11442779
    Abstract: Embodiments of the present disclosure relate to a method, device and computer program product for determining a resource amount of dedicated processing resources. The method comprises obtaining a structural representation of a neural network for deep learning processing, the structural representation indicating a layer attribute of the neural network that is associated with the dedicated processing resources; and determining the resource amount of the dedicated processing resources required for the deep learning processing based on the structural representation. In this manner, the resource amount of the dedicated processing resources required by the deep learning processing may be better estimated to improve the performance and resource utilization rate of the dedicated processing resource scheduling.
    Type: Grant
    Filed: January 4, 2019
    Date of Patent: September 13, 2022
    Assignee: Dell Products L.P.
    Inventors: Junping Zhao, Sanping Li
  • Patent number: 11438413
    Abstract: Systems and methods are provided for implementing an intelligent data management system for data storage and data management in a cloud computing environment. For example, a system includes an application server, a distributed data storage system, and an intelligent data management system. The application server is configured to host a data processing application. The distributed data storage system is configured to store data generated by a network of devices associated with the data processing application. The intelligent data management system is configured to manage data storage operations for storing the data generated by the network of devices in the distributed data storage system. For example, the intelligent data management system is configured to determine one or more data types of the data generated by the network of devices and select one of a plurality of repositories within the distributed data storage system to store the data based on the determined data types.
    Type: Grant
    Filed: April 29, 2019
    Date of Patent: September 6, 2022
    Assignee: EMC IP Holding Company LLC
    Inventors: Mohamed Sohail, Junping Zhao
  • Patent number: 11429902
    Abstract: Embodiments of the present disclosure relate to a method, device and computer program product for deploying a machine learning model. The method comprises: receiving an intermediate representation indicating processing of a machine learning model, learning parameters of the machine learning model, and a computing resource requirement for executing the machine learning model, the intermediate representation, the learning parameters, and the computing resource requirement being determined based on an original code of the machine learning model, the intermediate representation being irrelevant to a programming language of the original code; determining, at least based on the computing resource requirement, a computing node and a parameter storage node for executing the machine learning model; storing the learning parameters in the parameter storage node; and sending the intermediate representation to the computing node for executing the machine learning model with the stored learning parameters.
    Type: Grant
    Filed: May 20, 2019
    Date of Patent: August 30, 2022
    Assignee: EMC IP Holding Company LLC
    Inventors: Jinpeng Liu, Pengfei Wu, Junping Zhao, Kun Wang
  • Patent number: 11354159
    Abstract: A method comprises: compiling a code segment with a compiler; and determining, based on an intermediate result of the compiling, a resource associated with a dedicated processing unit and for executing the code segment. As such, the resource required for executing a code segment may be determined quickly without actually executing the code segment and allocating or releasing the resource, which helps subsequent resource allocation and further brings about a better user experience.
    Type: Grant
    Filed: August 14, 2019
    Date of Patent: June 7, 2022
    Assignee: EMC IP Holding Company LLC
    Inventors: Jinpeng Liu, Pengfei Wu, Junping Zhao, Kun Wang
  • Patent number: 11336580
    Abstract: Embodiments of the present disclosure provide methods, apparatuses and computer program products for transmitting data. A method comprises determining, at a source node, a traffic type of a packet to be sent to a destination node, the source node and the destination node having therebetween a plurality of network paths for different traffic types. The method further comprises including a mark indicating the traffic type in the packet. In addition, the method further comprises sending the packet including the mark to the destination node such that the packet is forwarded along one of the plurality of network paths specific to the traffic type. Embodiments of the present disclosure can transmit data using different network paths based on different traffic types of data so as to optimize network performance for different network requirements.
    Type: Grant
    Filed: April 9, 2019
    Date of Patent: May 17, 2022
    Assignee: EMC IP Holding Company LLC
    Inventors: Zhi Ying, Junping Zhao, Kun Wang
  • Patent number: 11315013
    Abstract: Techniques are provided for implementing a parameter server within a networking infrastructure of a computing system to reduce the communication bandwidth and latency for performing communication synchronization operations of the parameter server. For example, a method includes executing a distributed deep learning (DL) model training process to train model parameters of a DL model using a plurality of worker nodes executing on one or more server nodes of a computing system, and executing a parameter server within a networking infrastructure of the computing system to aggregate local model parameters computed by the plurality of worker nodes and to distribute aggregated model parameters to the plurality of worker nodes using the networking infrastructure of the computing system.
    Type: Grant
    Filed: April 23, 2018
    Date of Patent: April 26, 2022
    Assignee: EMC IP Holding Company LLC
    Inventors: Dragan Savic, Junping Zhao
  • Patent number: 11314557
    Abstract: A method for processing a computing task comprises: dividing multiple computing resources into multiple groups on the basis of topology information describing a connection relationship between the multiple computing resources; selecting at least one computing resource from at least one group of the multiple groups; determining processing performance of processing the computing task with the selected at least one computing resource; and allocating the at least one computing resource on the basis of the processing performance to process the computing task. Accordingly, the multiple computing resources can be utilized sufficiently, so that the computing task can be processed with better processing performance.
    Type: Grant
    Filed: April 30, 2019
    Date of Patent: April 26, 2022
    Assignee: EMC IP Holding Company LLC
    Inventors: Junping Zhao, Kun Wang
  • Patent number: 11281384
    Abstract: A method comprises determining, in a process of storing data for a computing task of a first dedicated processing resource of a set of dedicated processing resources to the first dedicated processing resource, a size of an available space of a memory of the first dedicated processing resource; in response to the size of the available space of the memory of the first dedicated processing resource being lower than a predetermined threshold value, determining a second dedicated processing resource of the set of dedicated processing resources, a size of an available space of a memory of the second dedicated processing resource is greater than the predetermined threshold value; and causing at least one portion of the data not stored on the memory of the first dedicated processing resource to be stored on the memory of the second dedicated processing resource.
    Type: Grant
    Filed: April 26, 2019
    Date of Patent: March 22, 2022
    Assignee: EMC IP Holding Company LLC
    Inventors: Junping Zhao, Kun Wang
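
Illustrative sketch (not taken from any patent text): the abstract of patent 11281384 above describes storing task data on a first dedicated processing resource and, once that resource's available memory falls below a threshold, placing the data not yet stored on a second resource whose available memory exceeds the threshold. The Python below is one possible reading of that flow under stated assumptions. The names `Device` and `store_with_spill`, the chunked data model, and the rule of picking the candidate with the most free space are all hypothetical and are not described in the abstract.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Hypothetical stand-in for one dedicated processing resource (e.g. a GPU)."""
    name: str
    capacity: int                                # total memory, arbitrary units
    used: int = 0
    stored: list = field(default_factory=list)   # labels of chunks held here

    @property
    def free(self) -> int:
        return self.capacity - self.used

    def store(self, label: str, size: int) -> None:
        if size > self.free:
            raise MemoryError(f"{self.name} cannot hold {label}")
        self.used += size
        self.stored.append(label)


def store_with_spill(chunks, first, others, threshold):
    """Place chunks on `first` until its free space drops below `threshold`,
    then place the remaining chunks on a second device whose free space
    exceeds `threshold`. Returns a dict of chunk label -> device name."""
    placement = {}
    target = first
    for label, size in chunks:
        if target is first and first.free < threshold:
            candidates = [d for d in others if d.free > threshold]
            if not candidates:
                raise MemoryError("no device has free space above the threshold")
            # Assumption: prefer the candidate with the most free space.
            target = max(candidates, key=lambda d: d.free)
        target.store(label, size)
        placement[label] = target.name
    return placement
```

In this sketch the threshold is checked before each chunk is written, so the first device keeps whatever it stored before crossing the threshold and only the remainder moves to the second device.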