Presentation description
Most research institutions and funding agencies have stringent rules about data storage and retention. In fields such as nanoscale biology, where multicolor super-resolution videos can exceed 1 TB per video, many researchers use commercial video compression algorithms such as Advanced Video Coding (H.264), High Efficiency Video Coding (H.265), and AOMedia Video 1 (AV1) to store imaging data. However, little research has investigated the effects of such lossy algorithms on data quality. Using simulated imaging data mimicking typical biological datasets, we investigated methods of preserving data quality while maximizing storage efficiency. The background noise in a dataset often carries valuable statistical information for meta-analyses. We found that newer, more efficient compression algorithms such as AV1 disproportionately target (i.e., remove or blur) background noise in order to achieve higher compression ratios. We further investigated the effect of compression on signal-to-noise ratio and gain calculations and found that, above a certain compression threshold, all three codecs (H.264, H.265, and AV1) distort the data so much that metadata recovery becomes questionable or impossible. A point source imaged at the diffraction limit is well approximated by a 2D Gaussian peak. We confirmed prior sporadic reports that most video compression algorithms allow accurate tracking of the 2D Gaussian peak location. However, for a given compression ratio, older codecs such as H.264 degrade the performance of motion-tracking algorithms more than newer codecs do. Interestingly, we also found that under certain conditions, compression with newer codecs such as AV1 actually improves motion-tracking performance. At high compression ratios, however, the peaks in compressed videos are no longer accurately modeled as Gaussian.
Our research provides the first clear guidelines on the use of modern lossy codecs for scientific data retention and highlights that codecs optimized for consumer data are not necessarily optimal for scientific data.
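As a rough illustration of two of the quantities discussed above (peak location and background-noise statistics), the following minimal Python sketch simulates a single 2D Gaussian peak on a noisy background and compares it with a smoothed copy. Everything in it is an assumption made for illustration and is not taken from the study: the image size, peak parameters, additive Gaussian noise model, and function names are hypothetical, a 3x3 mean filter is only a crude stand-in for the noise-blurring behavior of a lossy codec (it is not H.264, H.265, or AV1), and a windowed centroid stands in for a full Gaussian fit.

```python
import math
import random


def simulate_spot(size, x0, y0, sigma, amplitude, background, noise_sd, rng):
    """Simulate a 2D Gaussian peak on a flat background with additive Gaussian noise."""
    image = []
    for y in range(size):
        row = []
        for x in range(size):
            r2 = (x - x0) ** 2 + (y - y0) ** 2
            signal = amplitude * math.exp(-r2 / (2.0 * sigma ** 2))
            row.append(background + signal + rng.gauss(0.0, noise_sd))
        image.append(row)
    return image


def box_blur(image):
    """3x3 mean filter; edge pixels average over the neighbours that exist.

    Assumption: this is only a toy substitute for codec-induced smoothing.
    """
    size_y, size_x = len(image), len(image[0])
    out = []
    for y in range(size_y):
        row = []
        for x in range(size_x):
            vals = [image[j][i]
                    for j in range(max(0, y - 1), min(size_y, y + 2))
                    for i in range(max(0, x - 1), min(size_x, x + 2))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def localize(image, background, half_width=4):
    """Background-subtracted centroid in a window around the brightest pixel."""
    size_y, size_x = len(image), len(image[0])
    _, py, px = max((v, y, x) for y, row in enumerate(image) for x, v in enumerate(row))
    total = sum_x = sum_y = 0.0
    for y in range(max(0, py - half_width), min(size_y, py + half_width + 1)):
        for x in range(max(0, px - half_width), min(size_x, px + half_width + 1)):
            w = image[y][x] - background
            total += w
            sum_x += w * x
            sum_y += w * y
    return sum_x / total, sum_y / total


def background_variance(image, rows=3):
    """Sample variance of the top `rows` rows, assumed to be far from the peak."""
    vals = [v for row in image[:rows] for v in row]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / (len(vals) - 1)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
raw = simulate_spot(size=21, x0=10.3, y0=9.6, sigma=1.5, amplitude=200.0,
                    background=10.0, noise_sd=2.0, rng=rng)
smoothed = box_blur(raw)  # hypothetical stand-in for a lossy round trip

for name, img in (("raw", raw), ("smoothed", smoothed)):
    cx, cy = localize(img, background=10.0)
    var = background_variance(img)
    print(f"{name}: centroid=({cx:.2f}, {cy:.2f}), background variance={var:.2f}")
```

In this toy setting the estimated peak position stays close to the simulated one after smoothing, while the background variance drops sharply. That is the kind of asymmetry described above, although a mean filter cannot reproduce the behavior of a real codec, and a real evaluation would encode and decode the frames with the codecs themselves.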