CG3D: Compositional Generation for Text-to-3D via Gaussian Splatting

University of California, Los Angeles

Abstract

With the onset of diffusion-based generative models and their ability to generate text-conditioned images, content generation has received a massive invigoration. Recently, these models have been shown to provide useful guidance for the generation of 3D graphics assets. However, existing work in text-conditioned 3D generation faces fundamental constraints: (i) inability to generate detailed, multi-object scenes, (ii) inability to textually control multi-object configurations, and (iii) inability to produce physically realistic scene compositions. In this work, we propose CG3D, a method for compositionally generating scalable 3D assets that resolves these constraints. We find that explicit Gaussian radiance fields, parameterized to allow for compositions of objects, make semantically and physically consistent scenes possible. By utilizing a guidance framework built around this explicit representation, we show state-of-the-art results, capable of even exceeding the guiding diffusion model in terms of object combinations and physics accuracy.

Video

Scene Editing

The explicit nature of the compositional radiance field enables scene editing. We show a scene of "A table with a banana and plant in front of a couch that is next to a lamp on a stool", which we first edit to remove the lamp on the stool, then rearrange by moving the potted plant from the table to the stool, and lastly edit so that the couch becomes "a red leather couch".

Original Scene

Deleting

Rearrange

Editing
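
The three edits above (delete, rearrange, re-prompt) can be illustrated with a minimal sketch of a compositional scene in which each object keeps its own Gaussians and its own placement. This is not the CG3D implementation: the names `SceneObject`, `Scene`, `delete`, `move`, and `reprompt` are hypothetical, rotation is omitted, and `reprompt` only records the new text (re-optimizing the object's Gaussians under the new prompt is assumed to happen elsewhere).

```python
from dataclasses import dataclass, field

Vec3 = tuple[float, float, float]


@dataclass
class SceneObject:
    """Hypothetical per-object record: its Gaussians plus a placement."""
    prompt: str                      # text prompt associated with the object
    means: list[Vec3]                # Gaussian centers in the object's local frame
    translation: Vec3 = (0.0, 0.0, 0.0)
    scale: float = 1.0

    def world_means(self) -> list[Vec3]:
        # Apply uniform scale, then translation (rotation omitted in this sketch).
        tx, ty, tz = self.translation
        s = self.scale
        return [(s * x + tx, s * y + ty, s * z + tz) for x, y, z in self.means]


@dataclass
class Scene:
    """Hypothetical composition: a named collection of independent objects."""
    objects: dict[str, SceneObject] = field(default_factory=dict)

    def delete(self, name: str) -> None:
        # Removing an object removes exactly its Gaussians and nothing else.
        del self.objects[name]

    def move(self, name: str, translation: Vec3) -> None:
        # Rearranging only changes the object's placement, not its Gaussians.
        self.objects[name].translation = translation

    def reprompt(self, name: str, prompt: str) -> None:
        # Records the new prompt only; regeneration is outside this sketch.
        self.objects[name].prompt = prompt

    def all_means(self) -> list[Vec3]:
        # The full scene is the union of every object's placed Gaussians.
        return [p for obj in self.objects.values() for p in obj.world_means()]


# One placeholder Gaussian per object; coordinates are made up for illustration.
origin = [(0.0, 0.0, 0.0)]
scene = Scene({
    "couch": SceneObject("a couch", list(origin), translation=(0.0, 0.0, -1.0)),
    "table": SceneObject("a table", list(origin)),
    "plant": SceneObject("a potted plant", list(origin), translation=(0.0, 0.8, 0.0)),
    "stool": SceneObject("a stool", list(origin), translation=(2.0, 0.0, -1.0)),
    "lamp": SceneObject("a lamp", list(origin), translation=(2.0, 0.5, -1.0)),
})

scene.delete("lamp")                              # Deleting
scene.move("plant", (2.0, 0.5, -1.0))             # Rearrange: table -> stool
scene.reprompt("couch", "a red leather couch")    # Editing
```

Because every object owns a disjoint set of Gaussians, each edit touches one entry of the composition and leaves the others unchanged, which is the property the explicit representation is credited with above.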

Distillation


Gaussian radiance field generation for text-to-3D necessitates frequent duplication of Gaussians to enable expressiveness. However, this results in a large number of Gaussians, many of which are redundant. We propose a distillation method to reduce the number of Gaussians while maintaining the quality of the generated scene. This enables better scalability for compositions with many objects. We show an example of a generated 3D pear that has been distilled (for visualization, we also include a rendering with the Gaussians scaled down to 1%).
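
To make the idea of shrinking a Gaussian set to a fixed budget concrete, here is a small sketch. The page does not describe how the distillation works, so this is explicitly not the authors' method: it swaps in a generic importance-based pruning, where the score (opacity times the product of the axis scales), the dictionary layout, and the function names `importance` and `prune_to_budget` are all assumptions made for illustration. The input is random synthetic data sized to match the pear example.

```python
import random


def importance(g):
    """Assumed heuristic score: opacity times the product of the axis scales."""
    sx, sy, sz = g["scale"]
    return g["opacity"] * sx * sy * sz


def prune_to_budget(gaussians, budget):
    """Keep the `budget` highest-scoring Gaussians (simple pruning, not distillation)."""
    return sorted(gaussians, key=importance, reverse=True)[:budget]


# Synthetic stand-in for a generated object; values are random, not real data.
rng = random.Random(0)
cloud = [
    {
        "mean": tuple(rng.uniform(-1.0, 1.0) for _ in range(3)),
        "scale": tuple(rng.uniform(0.01, 0.1) for _ in range(3)),
        "opacity": rng.random(),
    }
    for _ in range(30_360)
]

distilled = prune_to_budget(cloud, 4_096)
print(len(cloud), "->", len(distilled))
```

Pruning alone only discards Gaussians. A distillation procedure in the usual sense would presumably also refit the remaining ones so that renderings stay close to the original, which this sketch does not attempt.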

Original: 30,360 Gaussians
Distilled: 4,096 Gaussians

BibTeX

@article{vilesov23cg3d,
  author    = {Vilesov, Alexander and Chari, Pradyumna and Kadambi, Achuta},
  title     = {CG3D: Compositional Generation for Text-to-3D via Gaussian Splatting},
  journal   = {arXiv},
  year      = {2023},
}