Patents by Inventor Junping ZHAO

Junping ZHAO has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

VIDEO MEMORY MANAGEMENT METHODS FOR LARGE LANGUAGE MODEL INFERENCE, DEVICES, MEDIA, AND PRODUCTS

Publication number: 20260105005

Abstract: This specification provides video memory management methods for large language model inference, devices, media, and products, which are applied to a service device deployed with a large language model. The method includes: allocating physical video memory resources on the service device, to separately map the physical video memory resources to a first video memory resource pool in which a cache object is a key-value cache and a second video memory resource pool in which a cache object is an intermediate activation value; and for an inference task submitted to the large language model, upon determining that an idle video memory resource in any video memory resource pool is insufficient to cache a corresponding cache object for the inference task, temporarily transferring at least a portion of idle video memory resources in another video memory resource pool to the any video memory resource pool.

Type: Application

Filed: December 10, 2024

Publication date: April 16, 2026

Inventors: Rui ZHANG, Junping ZHAO
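The pool-to-pool transfer described in the abstract for publication 20260105005 above can be pictured with a minimal sketch. The names `Pool` and `reserve`, the abstract "block" unit, and the rule of lending exactly the shortfall are illustrative assumptions, not the claimed implementation.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """A video memory pool measured in abstract blocks (hypothetical unit)."""
    name: str
    capacity: int   # blocks currently mapped to this pool
    used: int = 0   # blocks holding cache objects

    @property
    def idle(self) -> int:
        return self.capacity - self.used


def reserve(target: Pool, other: Pool, blocks: int) -> bool:
    """Reserve `blocks` in `target`, borrowing idle blocks from `other` if needed."""
    shortfall = blocks - target.idle
    if shortfall > 0:
        if other.idle < shortfall:
            return False              # neither pool can cover the request
        other.capacity -= shortfall   # transfer idle blocks to the target pool
        target.capacity += shortfall
    target.used += blocks
    return True


kv_pool = Pool("key-value cache", capacity=8, used=7)
act_pool = Pool("intermediate activations", capacity=8, used=2)
ok = reserve(kv_pool, act_pool, blocks=4)
```

The abstract calls the transfer temporary; returning the borrowed blocks once the inference task finishes is omitted from this sketch.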
TEXT GENERATION METHODS AND APPARATUSES, STORAGE MEDIUM DEVICES, AND PROGRAM PRODUCTS

Publication number: 20260064937

Abstract: This specification provides text generation methods, apparatuses, and storage medium devices. One method includes the following operations. In an iteration of a plurality of iterations under a large language model (LLM): estimating a first text sequence following a current text sequence based on a speculative decoding method, forming a plurality of candidate sequences based on the current text sequence and subsequences of the first text sequence, allocating logical blocks to text units in the plurality of candidate sequences in a key-value cache, to store attention information of the text units, mapping the allocated logical blocks to physical blocks based on a first criterion, and determining, by the LLM, a newly generated text unit in the iteration by using attention information of each candidate sequence in the key-value cache, to form a current text sequence for a next iteration of the plurality of iterations.

Type: Application

Filed: November 25, 2024

Publication date: March 5, 2026

Applicant: Alipay (Hangzhou) Information Technology Co., Ltd.

Inventors: Changxu Shao, Yuhong Guo, Junping Zhao

GPU COMPUTATIONAL RESOURCE SCHEDULING METHODS AND APPARATUSES

Publication number: 20260037317

Abstract: This disclosure provides GPU computational resource scheduling methods and apparatuses. In an implementation, a method includes: in response to a target computing task created in a computing cluster, determining a task type of the target computing task. If the target computing task is a first-type computing task, scheduling, for running, the target computing task to a first GPU hardware that has remaining computational resources satisfying a computational demand of the target computing task in the computing cluster. In response to a first indication that is reported by a first computing node integrated with the first GPU hardware and that indicates that the first-type computing task exclusively occupies computational resources of the first GPU hardware, rescheduling a second-type computing task, for running, to a second GPU hardware that has remaining computational resources satisfying a computational demand of the second-type computing task in the computing cluster.

Type: Application

Filed: December 11, 2024

Publication date: February 5, 2026

Applicant: Alipay (Hangzhou) Information Technology Co., Ltd.

Inventors: Rui Fang, Mingliang Gong, Ning Wang, Zhonghui Jiang, Junping Zhao, Tongkai Yang, Jiahao Gong, Xiaoyun Mao

KEY-VALUE CACHE MANAGEMENT, MODEL REASONING, AND DATA PROCESSING METHODS AND APPARATUSES FOR LARGE LANGUAGE MODELS

Publication number: 20260017208

Abstract: Implementations of this specification provide key-value cache management, model reasoning, and data processing methods and apparatuses for large language models. In an implementation, a method comprises allocating a virtual memory block in a virtual address slot to newly-added token key-value data of a model reasoning request, in response to determining that a scheduling result of the model reasoning request indicates the model reasoning request is scheduled for execution, maintaining a mapping relationship between an occupied virtual address slot and a physical graphics memory block allocated to the model reasoning request, and copying the newly-added token key-value data to the physical graphics memory block.

Type: Application

Filed: November 25, 2024

Publication date: January 15, 2026

Applicant: Alipay (Hangzhou) Information Technology Co., Ltd.

Inventors: Rui Zhang, Junping Zhao

DATA PROCESSING METHOD, APPARATUS, AND SYSTEM BASED ON GPU ON-CHIP MEMORY

Publication number: 20260010395

Abstract: Methods, apparatuses, and systems for data processing based on graphics processing unit (GPU) on-chip memories are described. A data obtaining operation for first data is initiated on a first GPU thread. The first data include writable data needed by a GPU computing task. When the first GPU thread performs the data obtaining operation, a data preloading process of preloading second data from a GPU global memory to the GPU on-chip memory is initiated on a second GPU thread. The second data include read-only data that are needed by the GPU computing task and that are stored in the GPU global memory. The GPU computing task is executed on the second GPU thread based on the first data and the second data in response to completion of the data obtaining process for the first data and the data preloading process for the second data.

Type: Application

Filed: November 5, 2024

Publication date: January 8, 2026

Applicant: Alipay (Hangzhou) Information Technology Co., Ltd.

Inventors: Changxu Shao, Kaihong Zhang, Junping Zhao

GRAPHICS MEMORY REUSE METHODS AND APPARATUSES BASED ON GPU MULTISTREAM CONCURRENCY

Publication number: 20250377937

Abstract: Embodiments of this specification provide graphics memory reuse methods and apparatuses based on GPU multistream concurrency. In an implementation of a default stream reuse mode, a method includes determining, based on (1) a released graphics memory corresponding to a current GPU stream that comprises a GPU instruction to which a graphics memory is to be allocated and (2) whether the current GPU stream is a default stream, whether a candidate reusable graphics memory block exists in a graphics memory pool for storing a released graphics memory block. If the candidate reusable graphics memory block exists, determining, from the candidate reusable graphics memory block, a graphics memory block to be allocated to the GPU instruction.

Type: Application

Filed: November 27, 2024

Publication date: December 11, 2025

Applicant: Alipay (Hangzhou) Information Technology Co., Ltd.

Inventors: Rui Zhang, Junping Zhao, Jiale Xu

REINFORCEMENT LEARNING MODEL TRAINING METHODS AND APPARATUSES

Publication number: 20250348750

Abstract: Methods, computer-readable media, and apparatuses relating to reinforcement learning model training are described. An example model training system includes at least one training process and at least one inference process. An example method includes: in an inference process, obtaining a latest model weight, updating a weight value of a reinforcement learning model, generating response data based on input data by using the updated reinforcement learning model, forming a training sample based on the input data and the response data, and storing the training sample in a target storage area; and in a training process, obtaining the training sample from the target storage area, updating a weight value of the reinforcement learning model based on the training sample, and sending an updated model weight to the inference process.

Type: Application

Filed: November 6, 2024

Publication date: November 13, 2025

Applicant: Alipay (Hangzhou) Information Technology Co., Ltd.

Inventors: Xudong Han, Rui Zhang, Zhen Li, Jian Sha, Junping Zhao
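The data flow in the abstract for publication 20250348750 above (inference produces samples, training consumes them and publishes weights back) can be sketched in a single process. Everything concrete here is an assumption made for illustration: the scalar "model", the two `deque` objects standing in for the target storage area and the weight transfer, and the placeholder update rule, which is not a reinforcement learning algorithm.

```python
from collections import deque

storage = deque()               # stands in for the target storage area
weight_channel = deque([0.0])   # stands in for weights sent from training to inference


def inference_step(model, prompt):
    """Load the latest weight, generate a response, store a training sample."""
    if weight_channel:
        model["w"] = weight_channel.pop()   # newest published weight
        weight_channel.clear()
    response = model["w"] * prompt          # toy stand-in for generation
    storage.append((prompt, response))


def training_step(model, target_gain=2.0, lr=0.1):
    """Consume one sample, update the weight, publish it to inference."""
    prompt, response = storage.popleft()
    error = response - target_gain * prompt
    model["w"] -= lr * error * prompt       # placeholder update rule
    weight_channel.append(model["w"])


inference_model = {"w": None}
training_model = {"w": 0.0}
for _ in range(50):
    inference_step(inference_model, prompt=1.0)
    training_step(training_model)
```

In a real system the two steps would run as separate processes; the alternating loop only makes the hand-off order visible.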
Lightweight security for internet of things messaging

Patent number: 12278893

Abstract: An apparatus in one embodiment comprises a processing platform configured to communicate over a network with a plurality of Internet of Things (IoT) devices. The processing platform receives at least a first intermediate message from a first gateway of the network, receives one or more additional intermediate messages from each of one or more additional gateways of the network, associates the first and additional intermediate messages with one another based at least in part on a common message identifier detected in each such intermediate message, and processes the associated first and additional intermediate messages to recover a device message from a given one of the IoT devices. The first intermediate message is based at least in part on at least one application of a designated cryptographic function to the device message utilizing a corresponding key. At least one of the one or more additional intermediate messages provides at least a portion of the key.

Type: Grant

Filed: April 25, 2018

Date of Patent: April 15, 2025

Assignee: EMC IP Holding Company LLC

Inventors: Junping Zhao, Mohamed Sohail

Optimization of graphics processing unit memory for deep learning computing

Patent number: 12154025

Abstract: Systems and methods are provided for optimizing GPU memory allocation for high-performance applications such as deep learning (DL) computing. For example, a DL task is executed using GPU resources (GPU device and GPU memory) to process a DL model having functional layers that are processed in a predefined sequence. A current functional layer of the DL model is invoked and processed using the GPU device. In response to the invoking, a data compression operation is performed to compress data of a previous functional layer of the DL model, and store the compressed data in the GPU memory. Responsive to the invoking, compressed data of a next functional layer of the DL model is accessed from the GPU memory and a data decompression operation is performed to decompress the compressed data for subsequent processing of the next functional layer of the DL model by the GPU device.

Type: Grant

Filed: February 13, 2018

Date of Patent: November 26, 2024

Assignee: EMC IP Holding Company LLC

Inventors: Dragan Savic, Junping Zhao
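The compress-on-invoke idea in the abstract for patent 12154025 above can be illustrated on the CPU. This is a sketch under stated assumptions: `zlib` on byte strings stands in for GPU-side compression, a dictionary stands in for GPU memory, and the names `forward` and `fetch` are hypothetical.

```python
import zlib


def forward(layers, data: bytes):
    """Run layers in order, compressing the previous layer's output on each invoke."""
    compressed = {}   # layer index -> compressed output (stands in for GPU memory)
    previous = None
    for index, layer in enumerate(layers):
        if previous is not None:
            compressed[index - 1] = zlib.compress(previous)
        data = layer(data)
        previous = data
    return data, compressed


def fetch(compressed, index) -> bytes:
    """Decompress one stored layer output when a later step needs it."""
    return zlib.decompress(compressed[index])


layers = [lambda b: b * 4, lambda b: b.upper(), lambda b: b[::-1]]
result, compressed = forward(layers, b"abc")
```

Calling `fetch(compressed, 0)` returns the first layer's output unchanged, which is the property a later processing step would rely on.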
METHODS, SYSTEMS, AND NON-TRANSITORY STORAGE MEDIA FOR GRAPHICS MEMORY ALLOCATION

Publication number: 20240005446

Abstract: In response to a graphics memory allocation request generated during the running of a target task and for graphics memory needed during running of the target task, target data generated during running of each sub-task of multiple sub-tasks is classified, where a type of the target data comprises at least first data, and where the first data is not used by a subsequent sub-task. Multiple target graphics memory pools are allocated to the multiple sub-tasks. Each target graphics memory pool of the multiple target graphics memory pools is divided into at least one graphics memory block based on a type of the target data, where the at least one graphics memory block includes at least a first graphics memory block corresponding to the first data, and where multiple first graphics memory blocks corresponding to the multiple sub-tasks are mapped to a same target physical memory address.

Type: Application

Filed: June 29, 2023

Publication date: January 4, 2024

Applicant: Alipay (Hangzhou) Information Technology Co., Ltd.

Inventors: Xiaofeng Mei, Yao Zhang, Junping Zhao

Method, device and computer program product for managing dedicated processing resources

Patent number: 11663050

Abstract: A resource management method comprises: in response to receiving, from an application operating on a client, a resource allocation request indicating an amount of dedicated processing resources required by the application, acquiring a mapping between a group of physical dedicated processing resources provided by a group of servers and a group of logical dedicated processing resources, the group of physical dedicated processing resources being divided into the group of logical dedicated processing resources; determining allocation statuses of the group of logical dedicated processing resources; determining, based at least on the mapping and the allocation statuses, a first amount of logical dedicated processing resources to be allocated to the application from the group of logical dedicated processing resources; and indicating the first amount of logical dedicated processing resources to the application, to allow the application to utilize physical dedicated processing resources provided by at least one of the

Type: Grant

Filed: April 15, 2019

Date of Patent: May 30, 2023

Assignee: EMC IP Holding Company LLC

Inventors: Layne Lin Peng, Junping Zhao, Wei Cui

GLUE DEGRADERS AND METHODS OF USE THEREOF

Publication number: 20220363671

Abstract: Described herein are glue degrader compounds, their various targets, their preparation, pharmaceutical compositions comprising them, and their use in the treatment or prevention of conditions, diseases, and disorders mediated by various target proteins.

Type: Application

Filed: September 16, 2020

Publication date: November 17, 2022

Inventors: Jake AXFORD, Rohan Eric John BECKWITH, Simone BONAZZI, Nicole BUSCHMANN, Artiom CERNIJENKO, Janetta DEWHURST, Aleem FAZAL, Matthew James HESSE, Lauren HOLDER, Viktor HORNAK, Hidetomo IMASE, Rama JAIN, Xianming JIN, John Ryan KERRIGAN, Julie LACHAL, Fupeng MA, Hasnain Ahmed MALIK, James R. MANNING, Daniel MCKAY, Robert Joseph MOREAU, Pierre NIMSGERN, Gary O'BRIEN, Anna VULPETTI, Ken YAMADA, Junping ZHAO

Method, device and computer program product for determining resource amount for dedicated processing resources

Patent number: 11442779

Abstract: Embodiments of the present disclosure relate to a method, device and computer program product for determining a resource amount of dedicated processing resources. The method comprises obtaining a structural representation of a neural network for deep learning processing, the structural representation indicating a layer attribute of the neural network that is associated with the dedicated processing resources; and determining the resource amount of the dedicated processing resources required for the deep learning processing based on the structural representation. In this manner, the resource amount of the dedicated processing resources required by the deep learning processing may be better estimated to improve the performance and resource utilization rate of the dedicated processing resource scheduling.

Type: Grant

Filed: January 4, 2019

Date of Patent: September 13, 2022

Assignee: Dell Products L.P.

Inventors: Junping Zhao, Sanping Li
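To make the abstract for patent 11442779 above concrete, the sketch below estimates a memory amount from a structural description of a network. The dictionary layout, the `estimate_bytes` name, and the formula (parameters plus one batch of activations, four bytes per value) are assumptions chosen for illustration; the patent does not specify them here.

```python
from math import prod


def estimate_bytes(layers, batch_size, bytes_per_value=4):
    """Sum parameter and activation storage over all layers."""
    total = 0
    for layer in layers:
        params = prod(layer["weight_shape"])
        activations = batch_size * prod(layer["output_shape"])
        total += (params + activations) * bytes_per_value
    return total


# Hypothetical structural representation of a two-layer dense network.
network = [
    {"type": "dense", "weight_shape": (784, 128), "output_shape": (128,)},
    {"type": "dense", "weight_shape": (128, 10), "output_shape": (10,)},
]
needed = estimate_bytes(network, batch_size=32)
```

The point of the approach is that the estimate comes from the layer attributes alone, without running the network.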
Intelligent data storage and management for cloud computing

Patent number: 11438413

Abstract: Systems and methods are provided for implementing an intelligent data management system for data storage and data management in a cloud computing environment. For example, a system includes an application server, a distributed data storage system, and an intelligent data management system. The application server is configured to host a data processing application. The distributed data storage system is configured to store data generated by a network of devices associated with the data processing application. The intelligent data management system is configured to manage data storage operations for storing the data generated by the network of devices in the distributed data storage system. For example, the intelligent data management system is configured to determine one or more data types of the data generated by the network of devices and select one of a plurality of repositories within the distributed data storage system to store the data based on the determined data types.

Type: Grant

Filed: April 29, 2019

Date of Patent: September 6, 2022

Assignee: EMC IP Holding Company LLC

Inventors: Mohamed Sohail, Junping Zhao

Method, device and computer program product for deploying a machine learning model

Patent number: 11429902

Abstract: Embodiments of the present disclosure relate to a method, device and computer program product for deploying a machine learning model. The method comprises: receiving an intermediate representation indicating processing of a machine learning model, learning parameters of the machine learning model, and a computing resource requirement for executing the machine learning model, the intermediate representation, the learning parameters, and the computing resource requirement being determined based on an original code of the machine learning model, the intermediate representation being irrelevant to a programming language of the original code; determining, at least based on the computing resource requirement, a computing node and a parameter storage node for executing the machine learning model; storing the learning parameters in the parameter storage node; and sending the intermediate representation to the computing node for executing the machine learning model with the stored learning parameters.

Type: Grant

Filed: May 20, 2019

Date of Patent: August 30, 2022

Assignee: EMC IP Holding Company LLC

Inventors: Jinpeng Liu, Pengfei Wu, Junping Zhao, Kun Wang

Method, a device, and a computer program product for determining a resource required for executing a code segment

Patent number: 11354159

Abstract: A method comprises: compiling the code segment with a compiler; and determining, based on an intermediate result of the compiling, a resource associated with a dedicated processing unit and for executing the code segment. As such, the resource required for executing a code segment may be determined quickly without actually executing the code segment and allocating or releasing the resource, which helps subsequent resource allocation and further brings about a better user experience.

Type: Grant

Filed: August 14, 2019

Date of Patent: June 7, 2022

Assignee: EMC IP Holding Company LLC

Inventors: Jinpeng Liu, Pengfei Wu, Junping Zhao, Kun Wang

Methods, apparatuses and computer program products for transmitting data

Patent number: 11336580

Abstract: Embodiments of the present disclosure provide methods, apparatuses and computer program products for transmitting data. A method comprises determining, at a source node, a traffic type of a packet to be sent to a destination node, the source node and the destination node having therebetween a plurality of network paths for different traffic types. The method further comprises including a mark indicating the traffic type in the packet. In addition, the method further comprises sending the packet including the mark to the destination node such that the packet is forwarded along one of the plurality of network paths specific to the traffic type. Embodiments of the present disclosure can transmit data using different network paths based on different traffic types of data so as to optimize network performance for different network requirements.

Type: Grant

Filed: April 9, 2019

Date of Patent: May 17, 2022

Assignee: EMC IP Holding Company LLC

Inventors: Zhi Ying, Junping Zhao, Kun Wang
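The mark-then-forward flow in the abstract for patent 11336580 above can be sketched as follows. The traffic type names, the size-based classifier, and the node names in `PATHS` are all hypothetical; the abstract states only that a mark indicating the traffic type selects one of several network paths.

```python
# Hypothetical per-type paths between a source and a destination node.
PATHS = {
    "latency_sensitive": ["src", "s1", "dst"],
    "bulk": ["src", "s2", "s3", "dst"],
}


def classify(payload: bytes) -> str:
    """Assumed rule: large payloads are treated as bulk traffic."""
    return "bulk" if len(payload) > 1024 else "latency_sensitive"


def mark(payload: bytes) -> dict:
    """Attach the traffic type mark at the source node."""
    return {"mark": classify(payload), "payload": payload}


def route(packet: dict) -> list:
    """Pick the path that is specific to the packet's traffic type."""
    return PATHS[packet["mark"]]


small = route(mark(b"x" * 10))
large = route(mark(b"x" * 4096))
```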
Implementing parameter server in networking infrastructure for high-performance computing

Patent number: 11315013

Abstract: Techniques are provided for implementing a parameter server within a networking infrastructure of a computing system to reduce the communication bandwidth and latency for performing communication synchronization operations of the parameter server. For example, a method includes executing a distributed deep learning (DL) model training process to train model parameters of a DL model using a plurality of worker nodes executing on one or more server nodes of a computing system, and executing a parameter server within a networking infrastructure of the computing system to aggregate local model parameters computed by the plurality of worker nodes and to distribute aggregated model parameters to the plurality of worker nodes using the networking infrastructure of the computing system.

Type: Grant

Filed: April 23, 2018

Date of Patent: April 26, 2022

Assignee: EMC IP Holding Company LLC

Inventors: Dragan Savic, Junping Zhao
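The aggregate-and-distribute role of the parameter server in the abstract for patent 11315013 above is shown below in its simplest form. Element-wise averaging is an assumption (the abstract says only "aggregate"), and the sketch ignores where the server runs, which is the patent's actual subject.

```python
def aggregate(local_params):
    """Average each parameter position across all workers."""
    count = len(local_params)
    return [sum(values) / count for values in zip(*local_params)]


# Local model parameters computed by three hypothetical worker nodes.
workers = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
global_params = aggregate(workers)

# Distribute the aggregated parameters back to every worker.
workers = [list(global_params) for _ in workers]
```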
Method, apparatus, and computer program product for selecting computing resources for processing computing task based on processing performance

Patent number: 11314557

Abstract: A method for processing a computing task comprises: dividing multiple computing resources into multiple groups on the basis of topology information describing a connection relationship between the multiple computing resources; selecting at least one computing resource from at least one group of the multiple groups; determining processing performance of processing the computing task with the selected at least one computing resource; and allocating the at least one computing resource on the basis of the processing performance to process the computing task. Accordingly, the multiple computing resources can be utilized sufficiently, so that the computing task can be processed with better processing performance.

Type: Grant

Filed: April 30, 2019

Date of Patent: April 26, 2022

Assignee: EMC IP Holding Company LLC

Inventors: Junping Zhao, Kun Wang

Method, device and computer program product for managing memory of dedicated processing resource

Patent number: 11281384

Abstract: A method comprises determining, in a process of storing data for a computing task of a first dedicated processing resource of a set of dedicated processing resources to the first dedicated processing resource, a size of an available space of a memory of the first dedicated processing resource; in response to the size of the available space of the memory of the first dedicated processing resource being lower than a predetermined threshold value, determining a second dedicated processing resource of the set of dedicated processing resources, wherein a size of an available space of a memory of the second dedicated processing resource is greater than the predetermined threshold value; and causing at least one portion of the data not stored on the memory of the first dedicated processing resource to be stored on the memory of the second dedicated processing resource.

Type: Grant

Filed: April 26, 2019

Date of Patent: March 22, 2022

Assignee: EMC IP Holding Company LLC

Inventors: Junping Zhao, Kun Wang
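The spill-over behaviour in the abstract for patent 11281384 above can be sketched with a few lines. The chunked data model, the `store` name, and the choice of the candidate with the most free space are assumptions for illustration; the sketch also does not check that a chunk fits in the remaining space.

```python
def store(chunks, free, first, threshold):
    """Place chunks on `first` until its free space drops below `threshold`, then spill."""
    placement = {}
    for index, size in enumerate(chunks):
        target = first
        if free[first] < threshold:
            # Look for another device whose free space exceeds the threshold.
            candidates = [d for d in free if d != first and free[d] > threshold]
            if not candidates:
                raise MemoryError("no device has free space above the threshold")
            target = max(candidates, key=free.get)
        free[target] -= size
        placement[index] = target
    return placement


free = {"gpu0": 6, "gpu1": 16}   # hypothetical free memory per device
placement = store([3, 3, 3], free, first="gpu0", threshold=2)
```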
