Our paper “Where is My Training Bottleneck? Hidden Trade-Offs in Deep Learning Preprocessing Pipelines” has been accepted for publication at the SIGMOD 2022 conference. In this paper, we provide an in-depth analysis of data preprocessing pipelines from four different machine learning domains. We introduce a new perspective on efficiently preparing datasets for end-to-end deep learning pipelines and identify the individual trade-offs involved in optimizing throughput, preprocessing time, and storage consumption.
More information will follow soon.