Patents by Inventor Junping ZHAO
Junping ZHAO has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20260105005
Abstract: This specification provides video memory management methods for large language model inference, devices, media, and products, which are applied to a service device deployed with a large language model. The method includes: allocating physical video memory resources on the service device, to separately map the physical video memory resources to a first video memory resource pool in which a cache object is a key-value cache and a second video memory resource pool in which a cache object is an intermediate activation value; and for an inference task submitted to the large language model, upon determining that an idle video memory resource in any video memory resource pool is insufficient to cache a corresponding cache object for the inference task, temporarily transferring at least a portion of idle video memory resources in another video memory resource pool to the any video memory resource pool.
Type: Application
Filed: December 10, 2024
Publication date: April 16, 2026
Inventors: Rui ZHANG, Junping ZHAO
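The pool-borrowing idea in this abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation: the `MemoryPool` class, its block-count bookkeeping, and the pool sizes are assumptions made for illustration, and real video memory mapping is replaced by simple counters.

```python
class MemoryPool:
    """A pool of fixed-size memory blocks, tracked only by count (hypothetical)."""

    def __init__(self, name, total_blocks):
        self.name = name
        self.idle = total_blocks   # blocks currently free in this pool
        self.borrowed = 0          # blocks temporarily taken from the other pool

    def allocate(self, needed, other):
        """Reserve `needed` blocks, borrowing idle blocks from `other` if short."""
        shortfall = needed - self.idle
        if shortfall > 0:
            if other.idle < shortfall:
                return False       # the two pools together cannot cover the request
            other.idle -= shortfall
            self.idle += shortfall
            self.borrowed += shortfall
        self.idle -= needed
        return True

    def release(self, freed, other):
        """Free `freed` blocks, handing borrowed blocks back to `other` first."""
        returned = min(freed, self.borrowed)
        self.borrowed -= returned
        other.idle += returned
        self.idle += freed - returned


# Assumed sizes: 8 blocks for the key-value cache, 4 for activations.
kv_pool = MemoryPool("kv_cache", total_blocks=8)
act_pool = MemoryPool("activations", total_blocks=4)

ok = kv_pool.allocate(10, other=act_pool)   # needs 2 blocks from act_pool
idle_after_borrow = act_pool.idle
kv_pool.release(10, other=act_pool)         # borrowed blocks go back first
```

The transfer is temporary in the sense that `release` returns borrowed blocks before replenishing the pool's own idle count.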
-
Publication number: 20260064937
Abstract: This specification provides text generation methods, apparatuses, and storage medium devices. One method includes the following operations. In an iteration of a plurality of iterations under a large language model (LLM): estimating a first text sequence following a current text sequence based on a speculative decoding method, forming a plurality of candidate sequences based on the current text sequence and subsequences of the first text sequence, allocating logical blocks to text units in the plurality of candidate sequences in a key-value cache, to store attention information of the text units, mapping the allocated logical blocks to physical blocks based on a first criterion, and determining, by the LLM, a newly generated text unit in the iteration by using attention information of each candidate sequence in the key-value cache, to form a current text sequence for a next iteration of the plurality of iterations.
Type: Application
Filed: November 25, 2024
Publication date: March 5, 2026
Applicant: Alipay (Hangzhou) Information Technology Co., Ltd.
Inventors: Changxu Shao, Yuhong Guo, Junping Zhao
-
Publication number: 20260037317
Abstract: This disclosure provides GPU computational resource scheduling methods and apparatuses. In an implementation, a method includes: in response to a target computing task created in a computing cluster, determining a task type of the target computing task. If the target computing task is a first-type computing task, scheduling, for running, the target computing task to a first GPU hardware that has remaining computational resources satisfying a computational demand of the target computing task in the computing cluster. In response to a first indication that is reported by a first computing node integrated with the first GPU hardware and that indicates that the first-type computing task exclusively occupies computational resources of the first GPU hardware, rescheduling a second-type computing task, for running, to a second GPU hardware that has remaining computational resources satisfying a computational demand of the second-type computing task in the computing cluster.
Type: Application
Filed: December 11, 2024
Publication date: February 5, 2026
Applicant: Alipay (Hangzhou) Information Technology Co., Ltd.
Inventors: Rui Fang, Mingliang Gong, Ning Wang, Zhonghui Jiang, Junping Zhao, Tongkai Yang, Jiahao Gong, Xiaoyun Mao
-
Publication number: 20260017208
Abstract: Implementations of this specification provide key-value cache management, model reasoning, and data processing methods and apparatuses for large language models. In an implementation, a method comprises allocating a virtual memory block in a virtual address slot to newly-added token key-value data of a model reasoning request, in response to determining that a scheduling result of the model reasoning request indicates the model reasoning request is scheduled for execution, maintaining a mapping relationship between an occupied virtual address slot and a physical graphics memory block allocated to the model reasoning request, and copying the newly-added token key-value data to the physical graphics memory block.
Type: Application
Filed: November 25, 2024
Publication date: January 15, 2026
Applicant: Alipay (Hangzhou) Information Technology Co., Ltd.
Inventors: Rui Zhang, Junping Zhao
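A minimal Python sketch of the virtual-to-physical mapping described in this abstract follows. The `KVCacheManager` class, its field names, and the one-block-per-token granularity are assumptions for illustration only; the abstract does not specify block sizes or data layouts.

```python
class KVCacheManager:
    """Toy key-value cache manager (hypothetical names and granularity)."""

    def __init__(self, num_physical_blocks):
        self.free_physical = list(range(num_physical_blocks))
        self.virtual_blocks = {}   # request id -> list of reserved virtual blocks
        self.mapping = {}          # (request id, virtual index) -> physical block id
        self.physical_data = {}    # physical block id -> stored key-value data

    def add_token(self, request_id, kv_data, scheduled):
        # A virtual block is always reserved for the new token's key-value data.
        blocks = self.virtual_blocks.setdefault(request_id, [])
        blocks.append(kv_data)
        virtual_index = len(blocks) - 1
        if not scheduled:
            return None            # request not scheduled: no physical memory used
        # Scheduled for execution: map the virtual block and copy the data.
        physical_id = self.free_physical.pop(0)
        self.mapping[(request_id, virtual_index)] = physical_id
        self.physical_data[physical_id] = kv_data
        return physical_id


manager = KVCacheManager(num_physical_blocks=2)
first = manager.add_token("req-1", "kv-token-0", scheduled=True)
second = manager.add_token("req-2", "kv-token-0", scheduled=False)
```

In this sketch only the scheduled request consumes a physical block, while the unscheduled request holds a virtual reservation.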
-
Publication number: 20260010395
Abstract: Methods, apparatuses, and systems for data processing based on graphics processing unit (GPU) on-chip memories are described. A data obtaining operation for first data is initiated on a first GPU thread. The first data include writable data needed by a GPU computing task. When the first GPU thread performs the data obtaining operation, a data preloading process of preloading second data from a GPU global memory to the GPU on-chip memory is initiated on a second GPU thread. The second data include read-only data that are needed by the GPU computing task and that are stored in the GPU global memory. The GPU computing task is executed on the second GPU thread based on the first data and the second data in response to that a data obtaining process of the first data and the data preloading process of the second data are completed.
Type: Application
Filed: November 5, 2024
Publication date: January 8, 2026
Applicant: Alipay (Hangzhou) Information Technology Co., Ltd.
Inventors: Changxu Shao, Kaihong Zhang, Junping Zhao
-
Publication number: 20250377937
Abstract: Embodiments of this specification provide graphics memory reuse methods and apparatuses based on GPU multistream concurrency. In an implementation of a default stream reuse mode, a method includes determining, based on (1) a released graphics memory corresponding to a current GPU stream that comprises a GPU instruction to which a graphics memory is to be allocated and (2) whether the current GPU stream is a default stream, whether a candidate reusable graphics memory block exists in a graphics memory pool for storing a released graphics memory block. If the candidate reusable graphics memory block exists, determining, from the candidate reusable graphics memory block, a graphics memory block to be allocated to the GPU instruction.
Type: Application
Filed: November 27, 2024
Publication date: December 11, 2025
Applicant: Alipay (Hangzhou) Information Technology Co., Ltd.
Inventors: Rui Zhang, Junping Zhao, Jiale Xu
-
Publication number: 20250348750
Abstract: Methods, computer-readable media, and apparatuses relating to reinforcement learning model training are described. An example model training system includes at least one training process and at least one inference process. An example method includes: in an inference process, obtaining a latest model weight, updating a weight value of a reinforcement learning model; generating response data based on input data by using an updated reinforcement learning model, forming a training sample based on the input data and the response data, and storing the training sample in a target storage area; and in a training process, obtaining the training sample from the target storage area; updating a weight value of the reinforcement learning model based on the training sample, and sending an updated model weight to the inference process.
Type: Application
Filed: November 6, 2024
Publication date: November 13, 2025
Applicant: Alipay (Hangzhou) Information Technology Co., Ltd.
Inventors: Xudong Han, Rui Zhang, Zhen Li, Jian Sha, Junping Zhao
-
Patent number: 12278893
Abstract: An apparatus in one embodiment comprises a processing platform configured to communicate over a network with a plurality of Internet of Things (IoT) devices. The processing platform receives at least a first intermediate message from a first gateway of the network, receives one or more additional intermediate messages from each of one or more additional gateways of the network, associates the first and additional intermediate messages with one another based at least in part on a common message identifier detected in each such intermediate message, and processes the associated first and additional intermediate messages to recover a device message from a given one of the IoT devices. The first intermediate message is based at least in part on at least one application of a designated cryptographic function to the device message utilizing a corresponding key. At least one of the one or more additional intermediate messages provides at least a portion of the key.
Type: Grant
Filed: April 25, 2018
Date of Patent: April 15, 2025
Assignee: EMC IP Holding Company LLC
Inventors: Junping Zhao, Mohamed Sohail
-
Patent number: 12154025
Abstract: Systems and methods are provided for optimizing GPU memory allocation for high-performance applications such as deep learning (DL) computing. For example, a DL task is executed using GPU resources (GPU device and GPU memory) to process a DL model having functional layers that are processed in a predefined sequence. A current functional layer of the DL model is invoked and processed using the GPU device. In response to the invoking, a data compression operation is performed to compress data of a previous functional layer of the DL model, and store the compressed data in the GPU memory. Responsive to the invoking, compressed data of a next functional layer of the DL model is accessed from the GPU memory and a data decompression operation is performed to decompress the compressed data for subsequent processing of the next functional layer of the DL model by the GPU device.
Type: Grant
Filed: February 13, 2018
Date of Patent: November 26, 2024
Assignee: EMC IP Holding Company LLC
Inventors: Dragan Savic, Junping Zhao
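The compress-previous, decompress-next pattern in this abstract can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented method: `zlib` on host byte strings stands in for whatever GPU-side compression is actually used, and the function name `run_model` and the per-layer "work" (taking the data length) are placeholders.

```python
import zlib


def run_model(raw_layers):
    """Process layers in order, keeping only nearby layers uncompressed."""
    # Start with the first layer raw and all later layers compressed.
    memory = [raw_layers[0]] + [zlib.compress(data) for data in raw_layers[1:]]
    compressed = [False] + [True] * (len(raw_layers) - 1)
    processed = []

    for i in range(len(memory)):
        assert not compressed[i]           # the current layer must be raw
        processed.append(len(memory[i]))   # placeholder for real layer work

        if i > 0:                          # compress the previous layer's data
            memory[i - 1] = zlib.compress(memory[i - 1])
            compressed[i - 1] = True
        if i + 1 < len(memory):            # decompress the next layer's data
            memory[i + 1] = zlib.decompress(memory[i + 1])
            compressed[i + 1] = False

    return processed, compressed


layers = [b"a" * 100, b"b" * 200, b"c" * 300]
sizes, flags = run_model(layers)
```

At any point in the loop at most two layers are held uncompressed, which is the memory saving the sketch is meant to show.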
-
Publication number: 20240005446
Abstract: In response to a graphics memory allocation request generated during the running of a target task and for graphics memory needed during running of the target task, target data generated during running of each sub-task of multiple sub-tasks is classified, where a type of the target data comprises at least first data, and where the first data is not used by a subsequent sub-task. Multiple target graphics memory pools are allocated to the multiple sub-tasks. Each target graphics memory pool of the multiple target graphics memory pools is divided into at least one graphics memory block based on a type of the target data, where the at least one graphics memory block includes at least a first graphics memory block corresponding to the first data, and where multiple first graphics memory blocks corresponding to the multiple sub-tasks are mapped to a same target physical memory address.
Type: Application
Filed: June 29, 2023
Publication date: January 4, 2024
Applicant: Alipay (Hangzhou) Information Technology Co., Ltd.
Inventors: Xiaofeng Mei, Yao Zhang, Junping Zhao
-
Patent number: 11663050
Abstract: A resource management method comprises: in response to receiving, from an application operating on a client, a resource allocation request indicating an amount of dedicated processing resources required by the application, acquiring a mapping between a group of physical dedicated processing resources provided by a group of servers and a group of logical dedicated processing resources, the group of physical dedicated processing resources being divided into the group of logical dedicated processing resources; determining allocation statuses of the group of logical dedicated processing resources; determining, based at least on the mapping and the allocation statuses, a first amount of logical dedicated processing resources to be allocated to the application from the group of logical dedicated processing resources; and indicating the first amount of logical dedicated processing resources to the application, to allow the application to utilize physical dedicated processing resources provided by at least one of the
Type: Grant
Filed: April 15, 2019
Date of Patent: May 30, 2023
Assignee: EMC IP Holding Company LLC
Inventors: Layne Lin Peng, Junping Zhao, Wei Cui
-
Publication number: 20220363671
Abstract: Described herein are glue degrader compounds, their various targets, their preparation, pharmaceutical compositions comprising them, and their use in the treatment or prevention of conditions, diseases, and disorders mediated by various target proteins.
Type: Application
Filed: September 16, 2020
Publication date: November 17, 2022
Inventors: Jake AXFORD, Rohan Eric John BECKWITH, Simone BONAZZI, Nicole BUSCHMANN, Artiom CERNIJENKO, Janetta DEWHURST, Aleem FAZAL, Matthew James HESSE, Lauren HOLDER, Viktor HORNAK, Hidetomo IMASE, Rama JAIN, Xianming JIN, John Ryan KERRIGAN, Julie LACHAL, Fupeng MA, Hasnain Ahmed MALIK, James R. MANNING, Daniel MCKAY, Robert Joseph MOREAU, Pierre NIMSGERN, Gary O'BRIEN, Anna VULPETTI, Ken YAMADA, Junping ZHAO
-
Patent number: 11442779
Abstract: Embodiments of the present disclosure relate to a method, device and computer program product for determining a resource amount of dedicated processing resources. The method comprises obtaining a structural representation of a neural network for deep learning processing, the structural representation indicating a layer attribute of the neural network that is associated with the dedicated processing resources; and determining the resource amount of the dedicated processing resources required for the deep learning processing based on the structural representation. In this manner, the resource amount of the dedicated processing resources required by the deep learning processing may be better estimated to improve the performance and resource utilization rate of the dedicated processing resource scheduling.
Type: Grant
Filed: January 4, 2019
Date of Patent: September 13, 2022
Assignee: Dell Products L.P.
Inventors: Junping Zhao, Sanping Li
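As a rough illustration of estimating a resource amount from a structural representation, the Python sketch below sums parameter and activation counts for fully connected layers. The dictionary layout, the function name `estimate_memory_bytes`, the restriction to dense layers, and the 4-byte value size are all assumptions; the abstract does not state which layer attributes or formulas are used.

```python
def estimate_memory_bytes(layers, batch_size, bytes_per_value=4):
    """Estimate memory for weights, biases, and output activations.

    `layers` is a hypothetical structural representation: a list of
    dicts giving the input and output width of each dense layer.
    """
    total_values = 0
    for layer in layers:
        weights = layer["in"] * layer["out"]
        biases = layer["out"]
        activations = batch_size * layer["out"]
        total_values += weights + biases + activations
    return total_values * bytes_per_value


# Assumed example network: 784 -> 256 -> 10, batch size 32, float32 values.
network = [
    {"in": 784, "out": 256},
    {"in": 256, "out": 10},
]
required = estimate_memory_bytes(network, batch_size=32)
```

For this example network the estimate is 848,168 bytes: 203,530 parameters plus 8,512 activation values, at 4 bytes each. The estimate needs only the layer shapes, so it can be computed before any task is actually run.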
-
Patent number: 11438413
Abstract: Systems and methods are provided for implementing an intelligent data management system for data storage and data management in a cloud computing environment. For example, a system includes an application server, a distributed data storage system, and an intelligent data management system. The application server is configured to host a data processing application. The distributed data storage system is configured to store data generated by a network of devices associated with the data processing application. The intelligent data management system is configured to manage data storage operations for storing the data generated by the network of devices in the distributed data storage system. For example, the intelligent data management system is configured to determine one or more data types of the data generated by the network of devices and select one of a plurality of repositories within the distributed data storage system to store the data based on the determined data types.
Type: Grant
Filed: April 29, 2019
Date of Patent: September 6, 2022
Assignee: EMC IP Holding Company LLC
Inventors: Mohamed Sohail, Junping Zhao
-
Patent number: 11429902
Abstract: Embodiments of the present disclosure relate to a method, device and computer program product for deploying a machine learning model. The method comprises: receiving an intermediate representation indicating processing of a machine learning model, learning parameters of the machine learning model, and a computing resource requirement for executing the machine learning model, the intermediate representation, the learning parameters, and the computing resource requirement being determined based on an original code of the machine learning model, the intermediate representation being irrelevant to a programming language of the original code; determining, at least based on the computing resource requirement, a computing node and a parameter storage node for executing the machine learning model; storing the learning parameters in the parameter storage node; and sending the intermediate representation to the computing node for executing the machine learning model with the stored learning parameters.
Type: Grant
Filed: May 20, 2019
Date of Patent: August 30, 2022
Assignee: EMC IP Holding Company LLC
Inventors: Jinpeng Liu, Pengfei Wu, Junping Zhao, Kun Wang
-
Patent number: 11354159
Abstract: A method comprises: compiling the code segment with a compiler; and determining, based on an intermediate result of the compiling, a resource associated with a dedicated processing unit and for executing the code segment. As such, the resource required for executing a code segment may be determined quickly without actually executing the code segment and allocating or releasing the resource, which helps subsequent resource allocation and further brings about a better user experience.
Type: Grant
Filed: August 14, 2019
Date of Patent: June 7, 2022
Assignee: EMC IP Holding Company LLC
Inventors: Jinpeng Liu, Pengfei Wu, Junping Zhao, Kun Wang
-
Patent number: 11336580
Abstract: Embodiments of the present disclosure provide methods, apparatuses and computer program products for transmitting data. A method comprises determining, at a source node, a traffic type of a packet to be sent to a destination node, the source node and the destination node having therebetween a plurality of network paths for different traffic types. The method further comprises including a mark indicating the traffic type in the packet. In addition, the method further comprises sending the packet including the mark to the destination node such that the packet is forwarded along one of the plurality of network paths specific to the traffic type. Embodiments of the present disclosure can transmit data using different network paths based on different traffic types of data so as to optimize network performance for different network requirements.
Type: Grant
Filed: April 9, 2019
Date of Patent: May 17, 2022
Assignee: EMC IP Holding Company LLC
Inventors: Zhi Ying, Junping Zhao, Kun Wang
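The marking-and-forwarding flow in this abstract can be pictured with a small Python sketch. The traffic type names, the node names in `PATHS`, and the dictionary packet format are hypothetical; the abstract does not say how the mark is encoded or how paths are configured.

```python
# Hypothetical per-traffic-type paths between a source and a destination node.
PATHS = {
    "latency_sensitive": ["src", "switch_a", "dst"],
    "bulk_transfer": ["src", "switch_b", "switch_c", "dst"],
}


def mark_packet(payload, traffic_type):
    """Attach a mark for the traffic type determined at the source node."""
    if traffic_type not in PATHS:
        raise ValueError(f"unknown traffic type: {traffic_type}")
    return {"mark": traffic_type, "payload": payload}


def select_path(packet):
    """Choose the network path that is specific to the packet's mark."""
    return PATHS[packet["mark"]]


packet = mark_packet(b"gradient shard", "bulk_transfer")
route = select_path(packet)
```

Forwarding nodes in this sketch only read the mark, so they do not need to inspect the payload to pick a path.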
-
Patent number: 11315013
Abstract: Techniques are provided for implementing a parameter server within a networking infrastructure of a computing system to reduce the communication bandwidth and latency for performing communication synchronization operations of the parameter server. For example, a method includes executing a distributed deep learning (DL) model training process to train model parameters of a DL model using a plurality of worker nodes executing on one or more server nodes of a computing system, and executing a parameter server within a networking infrastructure of the computing system to aggregate local model parameters computed by the plurality of worker nodes and to distribute aggregated model parameters to the plurality of worker nodes using the networking infrastructure of the computing system.
Type: Grant
Filed: April 23, 2018
Date of Patent: April 26, 2022
Assignee: EMC IP Holding Company LLC
Inventors: Dragan Savic, Junping Zhao
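The aggregate-then-distribute role of a parameter server can be shown with a short Python sketch. Element-wise averaging is an assumption here, since the abstract says only that local parameters are aggregated, and the sketch does not model the placement of the server inside the networking infrastructure, which is the focus of the patent. The function names are illustrative.

```python
def aggregate(local_params):
    """Average each parameter position across all workers (assumed rule)."""
    num_workers = len(local_params)
    return [sum(values) / num_workers for values in zip(*local_params)]


def distribute(aggregated, num_workers):
    """Give every worker its own copy of the aggregated parameters."""
    return [list(aggregated) for _ in range(num_workers)]


# Two workers, each holding a local copy of a two-parameter model.
worker_params = [
    [1.0, 2.0],
    [3.0, 4.0],
]
global_params = aggregate(worker_params)
synced = distribute(global_params, num_workers=len(worker_params))
```

After `distribute`, every worker starts its next training step from the same parameter values.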
-
Patent number: 11314557
Abstract: A method for processing a computing task comprises: dividing multiple computing resources into multiple groups on the basis of topology information describing a connection relationship between the multiple computing resources; selecting at least one computing resource from at least one group of the multiple groups; determining processing performance of processing the computing task with the selected at least one computing resource; and allocating the at least one computing resource on the basis of the processing performance to process the computing task. Accordingly, the multiple computing resources can be utilized sufficiently, so that the computing task can be processed with better processing performance.
Type: Grant
Filed: April 30, 2019
Date of Patent: April 26, 2022
Assignee: EMC IP Holding Company LLC
Inventors: Junping Zhao, Kun Wang
-
Patent number: 11281384
Abstract: A method comprises determining, in a process of storing data for a computing task of a first dedicated processing resource of a set of dedicated processing resources to the first dedicated processing resource, a size of an available space of a memory of the first dedicated processing resource; in response to the size of the available space of the memory of the first dedicated processing resource being lower than a predetermined threshold value, determining a second dedicated processing resource of the set of dedicated processing resources, a size of an available space of a memory of the second dedicated processing resource is greater than the predetermined threshold value; and causing at least one portion of the data not stored on the memory of the first dedicated processing resource to be stored on the memory of the second dedicated processing resource.
Type: Grant
Filed: April 26, 2019
Date of Patent: March 22, 2022
Assignee: EMC IP Holding Company LLC
Inventors: Junping Zhao, Kun Wang
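The threshold-based spill to a second device in this abstract can be sketched in Python as below. The function name `store_chunks`, the device names, and the chunk-by-chunk placement loop are assumptions for illustration; the abstract does not describe how data is divided or how the second device is chosen among several candidates.

```python
def store_chunks(chunk_sizes, free_space, first, threshold):
    """Place each chunk on `first` until its free space drops below `threshold`.

    `free_space` maps a device name to its available memory and is
    updated in place. Returns the device chosen for each chunk.
    """
    placement = []
    for size in chunk_sizes:
        target = first
        if free_space[first] < threshold:
            # Look for another device whose free space exceeds the threshold.
            candidates = [
                name for name, free in free_space.items()
                if name != first and free > threshold
            ]
            if not candidates:
                raise MemoryError("no device has enough available space")
            target = candidates[0]
        free_space[target] -= size
        placement.append(target)
    return placement


free = {"gpu0": 10, "gpu1": 20}
result = store_chunks([4, 4, 4, 4], free, first="gpu0", threshold=4)
```

In this example the first two chunks land on `gpu0`; once its free space falls to 2, below the threshold of 4, the remaining chunks go to `gpu1`.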