Why Plenoxels?

Several projects reduce the training time of neural fields, including DVGO-v1, DVGO-v2, Plenoxels, Instant Neural Graphics Primitives (INGP), TensoRF, and PointNeRF. Here, we compare these recent fast-training NeRF models and explain why Plenoxels is a suitable representation for perception datasets.

Reason 1: Plenoxels is a fully explicit representation

According to the DVGO-v2 paper, Improved Direct Voxel Grid Optimization, Plenoxels is the only representation that relies solely on explicit features. In other words, Plenoxels directly stores a density volume and represents view-dependent colors with spherical harmonics coefficients.

Reference: DVGO-v2

Method	Data structure	Density	Color	Training time
DVGO	Dense Grid	Explicit	Hybrid	< 30 min
DVGO-v2	Dense Grid	Explicit	Hybrid	< 20 min
Plenoxels	Sparse Grid	Explicit	Explicit	< 30 min
INGP	Multi-level Hash	Hybrid	Hybrid	< 5 min
TensoRF	Dense Grid	Explicit	Hybrid	< 30 min
PointNeRF	Point Cloud	Explicit	Explicit	> 1 day
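
To make "fully explicit" concrete, here is a minimal sketch of how a voxel that stores a density value and spherical harmonics coefficients can be queried without any neural network. It is an illustration, not the Plenoxels implementation: `ToyVoxelGrid` and `sh_basis_deg2` are hypothetical names, the grid is dense rather than sparse, no trilinear interpolation is performed, and the choice of degree-2 harmonics (9 coefficients per color channel) with a `+ 0.5` offset and clipping is an assumption made for this example.

```python
import numpy as np

# Constants of the real spherical harmonics basis up to degree 2.
C0 = 0.28209479177387814
C1 = 0.4886025119029199
C2 = (1.0925484305920792, -1.0925484305920792, 0.31539156525252005,
      -1.0925484305920792, 0.5462742152960396)


def sh_basis_deg2(view_dir):
    """Evaluate the 9 real SH basis functions for a viewing direction."""
    x, y, z = view_dir / np.linalg.norm(view_dir)
    return np.array([
        C0,
        -C1 * y, C1 * z, -C1 * x,
        C2[0] * x * y, C2[1] * y * z, C2[2] * (2 * z * z - x * x - y * y),
        C2[3] * x * z, C2[4] * (x * x - y * y),
    ])


class ToyVoxelGrid:
    """Hypothetical dense grid: every voxel holds 1 density + 3x9 SH values."""

    def __init__(self, resolution):
        shape = (resolution, resolution, resolution)
        self.density = np.zeros(shape, dtype=np.float32)
        self.sh = np.zeros(shape + (3, 9), dtype=np.float32)

    def query(self, ijk, view_dir):
        """Return (density, rgb) of one voxel seen from view_dir."""
        i, j, k = ijk
        sigma = max(float(self.density[i, j, k]), 0.0)  # keep density non-negative
        # (3, 9) coefficients times (9,) basis values gives one value per channel.
        rgb = self.sh[i, j, k] @ sh_basis_deg2(view_dir)
        # Assumption for this sketch: shift by 0.5 and clip to [0, 1].
        return sigma, np.clip(rgb + 0.5, 0.0, 1.0)


grid = ToyVoxelGrid(resolution=4)
grid.density[1, 2, 3] = 2.0
grid.sh[1, 2, 3, 0, 2] = 0.5  # red channel varies with the z component

sigma, rgb_up = grid.query((1, 2, 3), np.array([0.0, 0.0, 1.0]))
_, rgb_down = grid.query((1, 2, 3), np.array([0.0, 0.0, -1.0]))
print(sigma, rgb_up, rgb_down)
```

Because every quantity is read straight from the stored arrays, the red channel in this toy example changes with the viewing direction while the density stays fixed, which is the property the table labels as "Explicit" for both density and color.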

Reason 2: Great reconstruction quality

Plenoxels reconstructs scenes well compared to the other methods in both indoor and outdoor scenarios. We randomly pick 5 sequences each from CO3D and ScanNet and report the rendering quality and training time of each method. We compare Plenoxels with DVGO-v2 because it has shown comparable performance on outdoor scenes. We could not adopt the other methods as our data format because 1) INGP encodes geometry implicitly and does not handle unbounded scenes, 2) TensoRF and DVGO-v1 have no representation for backgrounds, and 3) PointNeRF takes a long time to optimize. For DVGO-v2, we follow the Tanks and Temples setup.

We will soon add experiments on the reconstruction abilities of recent methods.