Patents Assigned to Advanced Institute of Big Data, Beijing

High-performance data lake system and data storage method

Patent number: 11789899

Abstract: The present disclosure provides a high-performance data lake system and a data storage method. The data storage method includes the following steps: S1: converting a file into a file stream; S2: converting the file stream into an array in which multiple subarrays are nested; and S3: converting the array into a resilient distributed dataset (RDD), and storing the RDD to a storage layer of a data lake. The present disclosure provides a nested field structure, which lays the foundation for parallel processing in reading, and effectively improves read performance. Furthermore, the present disclosure flexibly generates a number of nested subarrays according to hardware cores, such that the data lake achieves better extension performance, and can keep optimal writing efficiency for different users.
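The steps S1 and S2 named in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function names `file_to_stream` and `stream_to_nested_array` are hypothetical, the even byte-wise split is an assumption, and using `os.cpu_count()` as the "hardware cores" figure is also an assumption. Step S3 is only indicated in a comment, since it requires a running Spark cluster.

```python
import io
import os


def file_to_stream(path):
    """S1 (sketch): read a file into an in-memory byte stream."""
    with open(path, "rb") as f:
        return io.BytesIO(f.read())


def stream_to_nested_array(stream, n_subarrays=None):
    """S2 (sketch): split the stream into an array of nested subarrays.

    n_subarrays defaults to the logical core count of this machine,
    which is an assumption; the abstract only says the number of
    subarrays is generated "according to hardware cores".
    """
    data = stream.read()
    if not data:
        return []
    n = n_subarrays or os.cpu_count() or 1
    n = max(1, min(n, len(data)))  # no more subarrays than bytes
    base, extra = divmod(len(data), n)
    nested, start = [], 0
    for i in range(n):
        # The first `extra` subarrays each take one additional byte.
        end = start + base + (1 if i < extra else 0)
        nested.append(data[start:end])
        start = end
    return nested


# S3 (not executed here): with PySpark, the nested array could be turned
# into an RDD with one partition per subarray, for example
#   rdd = sc.parallelize(nested, numSlices=len(nested))
# and then written to the storage layer. How the patent stores the RDD
# is not described in the abstract.
```

Because each subarray is an independent slice, a reader could process the slices in parallel and concatenate the results, which is the property the abstract credits for the improved read performance.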

Type: Grant

Filed: November 17, 2022

Date of Patent: October 17, 2023

Assignees: Nanhu Laboratory, Advanced Institute of Big Data, Beijing

Inventors: Hao Liu, Zhiling Chen, Tao Zhang, Peng Wang, Qiuye Wang, Chenxi Yu, Wei Chen, Yinlong Liu, Zhefeng Liu, Yonggang Tu

HIGH-PERFORMANCE DATA LAKE SYSTEM AND DATA STORAGE METHOD

Publication number: 20230153267

Abstract: The present disclosure provides a high-performance data lake system and a data storage method. The data storage method includes the following steps: S1: converting a file into a file stream; S2: converting the file stream into an array in which multiple subarrays are nested; and S3: converting the array into a resilient distributed dataset (RDD), and storing the RDD to a storage layer of a data lake. The present disclosure provides a nested field structure, which lays the foundation for parallel processing in reading, and effectively improves read performance. Furthermore, the present disclosure flexibly generates a number of nested subarrays according to hardware cores, such that the data lake achieves better extension performance, and can keep optimal writing efficiency for different users.

Type: Application

Filed: November 17, 2022

Publication date: May 18, 2023

Applicants: Nanhu Laboratory, Advanced Institute of Big Data, Beijing

Inventors: Hao LIU, Zhiling CHEN, Tao ZHANG, Peng WANG, Qiuye WANG, Chenxi YU, Wei CHEN, Yinlong LIU, Zhefeng LIU, Yonggang TU
