OmniZoom: A Universal Plug-and-Play Paradigm for Cross-Device Smooth Zoom Interpolation

Figure 1: Overview of OmniZoom and its plug-and-play integration.

Abstract

Dual-camera smartphones suffer from geometric and photometric inconsistencies during zoom transitions, primarily due to disparities in intrinsic/extrinsic parameters and divergent image processing pipelines between the two cameras. Existing interpolation methods struggle to address this issue effectively, constrained by the lack of ground-truth datasets and by motion ambiguity in dynamic scenes. To overcome these challenges, we propose OmniZoom, a universal plug-and-play paradigm for cross-device smooth zoom interpolation. Specifically, we present a novel cross-device virtual data generation method based on 3D Gaussian Splatting. It tackles data scarcity by decoupling geometric features via spatial transition modeling and correcting photometric variations with dynamic color adaptation, and is further enhanced by cross-domain consistency learning for device-agnostic semantic alignment. In addition, we introduce a plug-and-play 3D Trajectory Progress Ratio (3D-TPR) framework that overcomes the limitations of 2D spatial reasoning. Within this framework, a texture-focus strategy preserves high-frequency detail, while mask penalty constraints suppress interpolation artifacts. Our pipeline is broadly compatible with diverse interpolation methods and achieves strong performance across multiple public benchmarks. Real-world evaluations on various smartphone platforms also show significant quality improvements after fine-tuning on our synthetic data, underscoring the robustness and practical effectiveness of our approach for cross-device zoom applications.
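The dynamic color adaptation step mentioned above can be illustrated with a minimal sketch. Assuming matched pixel pairs from the overlapping field of view of the two cameras, a per-channel affine map (gain and offset) is fitted by least squares and then applied to harmonize the photometric response. The function names and the affine model here are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def fit_color_adaptation(src, ref):
    """Fit a per-channel affine map (gain, offset) taking src colors
    toward ref colors, via least squares on matched pixel pairs.

    src, ref: float arrays of shape (N, 3), corresponding pixels from the
    two cameras' overlapping field of view (illustrative assumption).
    """
    gains, offsets = np.empty(3), np.empty(3)
    for c in range(3):
        # Solve min ||a * src[:, c] + b - ref[:, c]||^2 for (a, b).
        A = np.stack([src[:, c], np.ones(len(src))], axis=1)
        (a, b), *_ = np.linalg.lstsq(A, ref[:, c], rcond=None)
        gains[c], offsets[c] = a, b
    return gains, offsets

def apply_color_adaptation(img, gains, offsets):
    """Apply the fitted per-channel map to an (H, W, 3) image in [0, 1]."""
    return np.clip(img * gains + offsets, 0.0, 1.0)
```

In practice such a fit would be computed on the registered overlap region of a wide/tele image pair; richer models (e.g. polynomial or spatially varying maps) follow the same pattern.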

Results

We evaluate OmniZoom on diverse real-world zoom interpolation (ZI) benchmarks across multiple devices. Our method outperforms existing approaches in both structural fidelity and color consistency, demonstrating robustness under complex domain shifts.

Figure 2: Visual comparison of 2D and 3D-TPR frame interpolation (FI) results across networks. Each row shows interpolation results at the same timestep.

Figure 3: Qualitative results on real-world data across four FI networks at timestep \(t=1/2\). The subscript \(_f\) denotes models finetuned on our ZI dataset.

These results demonstrate the advantages of our 3D-TPR interpolation approach, especially under temporal uncertainty. Notably, the fine-tuned models yield significantly improved sharpness and color consistency.
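The core idea behind a trajectory-based progress ratio can be sketched as follows: instead of treating the interpolation timestep as a uniform 2D blend, each intermediate camera pose is assigned a ratio by its cumulative arc length along the 3D camera trajectory. This is a minimal sketch under assumed inputs (an array of camera centers), not the paper's full 3D-TPR formulation.

```python
import numpy as np

def trajectory_progress_ratios(centers):
    """Map camera centers along a 3D trajectory to progress ratios in
    [0, 1] by cumulative arc length, rather than by a uniform timestep.

    centers: (K, 3) array of camera centers from the start pose to the
    end pose, including any intermediate virtual poses (assumed input).
    """
    steps = np.linalg.norm(np.diff(centers, axis=0), axis=1)
    cum = np.concatenate([[0.0], np.cumsum(steps)])
    total = cum[-1]
    if total == 0.0:  # degenerate case: no camera motion at all
        return np.linspace(0.0, 1.0, len(centers))
    return cum / total
```

For a trajectory with non-uniform spacing between poses, the resulting ratios deviate from a uniform schedule, which is what lets a 3D-aware interpolator place intermediate frames consistently with the actual camera motion.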

BibTeX

BibTex Code Here