Welcome to Vigges Developer Community: Open, Learning, Share
Welcome to ask or share your answers for others


How does the total file size change with different input data when compressing with Parquet?

Hello developers, I am writing Parquet-compressed data to memory and then streaming it to the cloud. I want to minimize memory usage and compress the data to the smallest possible total output size. In testing, I found that row groups of about 1,000 records each were optimal for processing the file. The data has 12 columns, split evenly between numbers and strings.

Would the size of the Parquet footer grow linearly as the number of records increases from 100 to 1,000 to 10,000, for a fixed number of row groups and columns? The footer metadata does not appear to hold any per-record data, just offsets and fixed elements such as the size of each column in a row group.
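To test this empirically, the footer size can be read straight from the file bytes. In the Parquet file format, a file ends with the Thrift-encoded footer (`FileMetaData`), then a 4-byte little-endian footer length, then the magic bytes `PAR1`. A minimal stdlib-only sketch, assuming the complete file is held in memory as `bytes`; `parquet_footer_length` is a hypothetical helper name:

```python
import struct


def parquet_footer_length(data: bytes) -> int:
    """Return the length in bytes of the Parquet footer (FileMetaData).

    Tail layout of a Parquet file: <footer> <4-byte little-endian length> b"PAR1".
    """
    # A valid file is at least: b"PAR1" + footer + 4-byte length + b"PAR1".
    if len(data) < 12 or data[:4] != b"PAR1" or data[-4:] != b"PAR1":
        raise ValueError("not a complete Parquet file")
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    if footer_len > len(data) - 12:
        raise ValueError("footer length exceeds file size")
    return footer_len
```

Calling it on files holding 100, 1,000 and 10,000 records with the same row-group and column counts gives the footer sizes to compare. One caveat when reading the numbers: writers such as pyarrow store per-column-chunk statistics (including min/max values) in the footer by default, and for string columns their size depends on the actual values, so the footer can vary with the data even when those counts are fixed.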

The only parameters I can tune are the number of records per row group (larger row groups compress better) and the number of row groups that make up a file.

Any thoughts?

Thanks, Marc


