To Developers,
I am writing Parquet-compressed data to memory and then streaming it to the cloud. I want to optimize my memory usage and compress the data to the smallest possible total output size. In testing, I found that row groups of about 1,000 records were optimal for processing the file. The file has 12 columns, split evenly between numeric and string types.
Would the size of the Parquet footer grow linearly as the record count increases from 100 to 1,000 to 10,000, for a fixed number of row groups and columns?
The metadata in the footer does not appear to hold any per-record data, just offsets and fixed elements such as the sizes of each column chunk in a row group.
The only things I can adjust are the number of records in a row group (for better compression) and the number of row groups that make up a file.
Any thoughts?
Thanks,
Marc
He who fights dragons for too long becomes a dragon himself; gaze too long into the abyss, and the abyss gazes back into you…