Patents by Inventor Yonggang TU

Yonggang TU has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

High-performance data lake system and data storage method

Patent number: 11789899

Abstract: The present disclosure provides a high-performance data lake system and a data storage method. The data storage method includes the following steps: S1: converting a file into a file stream; S2: converting the file stream into an array in which multiple subarrays are nested; and S3: converting the array into a resilient distributed dataset (RDD), and storing the RDD to a storage layer of a data lake. The present disclosure provides a nested field structure, which lays the foundation for parallel processing in reading, and effectively improves read performance. Furthermore, the present disclosure flexibly generates a number of nested subarrays according to hardware cores, such that the data lake achieves better extension performance, and can keep optimal writing efficiency for different users.
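The three steps named in the abstract (S1 to S3) can be illustrated with a minimal Python sketch. This is not the patented implementation. The function names, the byte-range splitting rule, the use of `os.cpu_count()` as the "hardware cores" figure, and the pickle-file output are all assumptions made for illustration. The S3 step assumes PySpark is installed and is only imported when that function is called.

```python
import io
import os


def file_to_stream(path):
    """S1 (hypothetical helper): open a file as a binary stream."""
    return open(path, "rb")


def stream_to_nested_array(stream, num_subarrays=None):
    """S2 (hypothetical helper): split the stream's bytes into nested subarrays.

    Assumption: the target count defaults to the number of CPU cores, and the
    data is cut into contiguous byte ranges. Short inputs may yield fewer
    subarrays than requested.
    """
    data = stream.read()
    n = max(1, num_subarrays or os.cpu_count() or 1)
    chunk = max(1, -(-len(data) // n))  # ceiling division
    return [data[i:i + chunk] for i in range(0, len(data), chunk)] or [b""]


def store_as_rdd(nested, output_path):
    """S3 (hypothetical helper): build an RDD from the subarrays and save it.

    Assumption: one partition per subarray, written as a pickle file.
    """
    from pyspark.sql import SparkSession  # imported lazily; requires PySpark

    spark = SparkSession.builder.appName("nested-array-sketch").getOrCreate()
    indexed = list(enumerate(nested))  # keep order so the file can be rebuilt
    rdd = spark.sparkContext.parallelize(indexed, numSlices=max(1, len(nested)))
    rdd.saveAsPickleFile(output_path)


if __name__ == "__main__":
    # In-memory stand-in for a file stream, split into at most 4 subarrays.
    nested = stream_to_nested_array(io.BytesIO(b"abcdefghij"), num_subarrays=4)
    print(nested)
```

Indexing each subarray before parallelizing is a design choice of this sketch: it lets a reader process partitions independently and still reassemble the original byte order.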

Type: Grant

Filed: November 17, 2022

Date of Patent: October 17, 2023

Assignees: Nanhu Laboratory, Advanced Institute of Big Data, Beijing

Inventors: Hao Liu, Zhiling Chen, Tao Zhang, Peng Wang, Qiuye Wang, Chenxi Yu, Wei Chen, Yinlong Liu, Zhefeng Liu, Yonggang Tu

HIGH-PERFORMANCE DATA LAKE SYSTEM AND DATA STORAGE METHOD

Publication number: 20230153267

Abstract: The present disclosure provides a high-performance data lake system and a data storage method. The data storage method includes the following steps: S1: converting a file into a file stream; S2: converting the file stream into an array in which multiple subarrays are nested; and S3: converting the array into a resilient distributed dataset (RDD), and storing the RDD to a storage layer of a data lake. The present disclosure provides a nested field structure, which lays the foundation for parallel processing in reading, and effectively improves read performance. Furthermore, the present disclosure flexibly generates a number of nested subarrays according to hardware cores, such that the data lake achieves better extension performance, and can keep optimal writing efficiency for different users.

Type: Application

Filed: November 17, 2022

Publication date: May 18, 2023

Applicants: Nanhu Laboratory, Advanced Institute of Big Data, Beijing

Inventors: Hao LIU, Zhiling CHEN, Tao ZHANG, Peng WANG, Qiuye WANG, Chenxi YU, Wei CHEN, Yinlong LIU, Zhefeng LIU, Yonggang TU
