Introduction

Deep generative AI, a field of artificial intelligence that focuses on generating new data similar to its training data, is having an impact not only on text and image generation but also on 3D model generation in the design industry. In architectural design, especially in the initial design phase, generative AI can serve as a useful tool for examining many design options.

 
Physical model examination by humans
Shou Sugi Ban / BYTR Architects


Leveraging the Deep Signed Distance Functions model (DeepSDF) with latent vectors, this project aims to build an algorithm that can synthesize a virtually unlimited number of skyscrapers similar to the training data. These vectors, mapped within a high-dimensional latent space, serve as the DNA for synthesizing potential skyscrapers.

Through a system that manipulates the latent vectors expressing the shape of the buildings, architectural designers can rapidly generate and examine a diverse array of design options.

By manipulating two or more latent vectors (through interpolation and arithmetic operations), the model gives us a virtually infinite set of design options. This method not only provides a brand-new design process but also lets us explore novel architectural forms that were previously unattainable through conventional design methodologies.


Understanding Signed Distance Functions

Wikipedia defines the signed distance function (SDF) as follows:

In mathematics and its applications, the signed distance function (or oriented distance function) is the orthogonal distance of a given point x to the boundary of a set Ω in a metric space, with the sign determined by whether or not x is in the interior of Ω. The function has positive values at points x inside Ω, it decreases in value as x approaches the boundary of Ω where the signed distance function is zero, and it takes negative values outside of Ω. However, the alternative convention is also sometimes taken instead (i.e., negative inside Ω and positive outside).

SDF representation applied to the Stanford Bunny
(a) Way to decide sign: If the point is on the surface, SDF = 0
(b) 2D cross-section of the signed distance field
(c) Rendered 3D surface recovered from SDF = 0


At the core of SDFs lies their simplicity and power in describing complex geometries. Unlike traditional mesh representations, which rely on vertices, edges, and faces to define forms, an SDF describes a continuous surface implicitly. To construct a 3D mesh from it, we only need XYZ coordinates on a regular 3D grid and their corresponding SDF values.

Let's take a look at the following example, the CCTV headquarters by OMA recovered from SDF = 0. To obtain the SDF values, the entire space around the CCTV headquarters model is first sampled on a regular grid. (In this example, resolution indicates the number of grid points per axis, i.e., resolution 8 means an 8x8x8 grid.) At each grid point, the SDF gives the point's distance to the closest surface of the model. Inside the model, these values are negative (or positive, depending on the convention), and outside, they are positive (or negative).
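
The grid sampling described above can be sketched with NumPy alone. This is a minimal illustration, not the project's code: the tall box standing in for a tower, its dimensions, and the helper names box_sdf and sample_grid are all assumptions, and the sign convention chosen here is negative inside, positive outside.

```python
import numpy as np


def box_sdf(points: np.ndarray, half_extents: np.ndarray) -> np.ndarray:
    """Signed distance to an axis-aligned box centered at the origin.

    Convention used here: negative inside, zero on the surface, positive outside.
    """
    q = np.abs(points) - half_extents
    outside = np.linalg.norm(np.maximum(q, 0.0), axis=1)
    inside = np.minimum(q.max(axis=1), 0.0)
    return outside + inside


def sample_grid(resolution: int, lower: float = -1.0, upper: float = 1.0) -> np.ndarray:
    """Return (resolution**3, 3) coordinates of a regular grid in [lower, upper]^3."""
    axis = np.linspace(lower, upper, resolution)
    xs, ys, zs = np.meshgrid(axis, axis, axis, indexing="ij")
    return np.stack([xs, ys, zs], axis=-1).reshape(-1, 3)


# A tall, thin box standing in for a tower (hypothetical dimensions).
tower_half_extents = np.array([0.2, 0.2, 0.8])

coords = sample_grid(resolution=8)        # 8x8x8 grid points
sdf = box_sdf(coords, tower_half_extents)  # one SDF value per grid point

print(coords.shape, sdf.shape)
print(int((sdf < 0).sum()), "grid points lie inside the box")
```

For a real building there is no closed-form SDF, which is why the values have to be computed from the mesh itself.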

As the figure below shows, more grid points result in more detailed and accurate 3D models. The numbers of grid points used in the examples are 8x8x8 (=512), 16x16x16 (=4096), 32x32x32 (=32768), 64x64x64 (=262144), and 128x128x128 (=2097152), respectively. To recover meshes from the grid points and their SDF values, the Marching Cubes algorithm is used.
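
The effect of resolution can also be checked numerically. The sketch below does not run Marching Cubes; as a simpler proxy, it estimates the volume of a shape by counting grid cells whose center has a negative SDF and compares the result with the exact volume. The sphere of radius 0.5 and the function names are assumptions chosen only because the exact answer is known.

```python
import numpy as np


def sphere_sdf(points: np.ndarray, radius: float = 0.5) -> np.ndarray:
    """Signed distance to a sphere at the origin (negative inside)."""
    return np.linalg.norm(points, axis=1) - radius


def estimate_volume(resolution: int) -> float:
    """Estimate the sphere volume from the SDF sign at the cell centers
    of a resolution^3 grid covering the cube [-1, 1]^3."""
    step = 2.0 / resolution
    axis = -1.0 + step * (np.arange(resolution) + 0.5)
    xs, ys, zs = np.meshgrid(axis, axis, axis, indexing="ij")
    points = np.stack([xs, ys, zs], axis=-1).reshape(-1, 3)
    inside = sphere_sdf(points) < 0
    return float(inside.sum() * step**3)


exact_volume = 4.0 / 3.0 * np.pi * 0.5**3

for resolution in [8, 16, 32, 64]:
    volume = estimate_volume(resolution)
    print(f"resolution {resolution:3d}: volume {volume:.4f}, "
          f"error {abs(volume - exact_volume):.4f}")
```

The number of points grows with the cube of the resolution, so doubling the resolution costs about eight times as many SDF evaluations.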

Recovered CCTV headquarters from the SDFs
top 3, original 3d model · resolution 8 · resolution 16
bottom 3, resolution 32 · resolution 64 · resolution 128


Because each SDF value needs a sign that tells whether the point is inside or outside the model, the mesh must be watertight, i.e., fully closed. Using the code below, I examined the meshes recovered from the SDF values at each grid resolution. The check_watertight parameter is set to True, so the code checks whether the mesh is watertight and, if it is not fully closed, converts it to a watertight mesh using pcu.

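    import numpy as np
    import point_cloud_utils as pcu
    import torch

    # DataCreatorHelper and ReconstructorHelper are helper classes defined in this project.
    mesh = DataCreatorHelper.load_mesh(
        path=r"deepSDF\data\raw-skyscrapers\cctv_headquarter.obj",
        normalize=True,
        map_z_to_y=True,
        check_watertight=True,
        translate_mode=DataCreatorHelper.CENTER_WITHOUT_Z
    )

    for resolution in [8, 16, 32, 64, 128]:
        # regular grid of query points for this resolution
        coords, grid_size_axis = ReconstructorHelper.get_volume_coords(resolution=resolution)

        # signed distance from every grid point to the mesh
        sdf, *_ = pcu.signed_distance_to_mesh(
            np.array(coords.cpu(), dtype=np.float32),
            mesh.vertices.astype(np.float32, order='F'),
            mesh.faces.astype(np.int32)
        )

        # recover the surface (SDF = 0) from the sampled values
        recovered_mesh = ReconstructorHelper.extract_mesh(
            grid_size_axis,
            torch.tensor(sdf),
        )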

Data preparation and processing

The first step in preparing data to train the DeepSDF model is to gather 3D models of skyscrapers. I downloaded the free 3D models shown in the figure below from 3D Warehouse. From the left: CCTV headquarters · Mahanakhon · Hearst Tower · Bank of China · Empire State Building · Transamerica Pyramid · The Shard · Gherkin London · Taipei 101 · Shanghai World Financial Center · One World Trade Center · Lotte Tower · Kingdom Centre · China Zun · Burj Al Arab. The 15 raw models I gathered are available at this link.

Skyscrapers


The next step consists of (1) normalizing all data to fit within a regular grid volume and (2) converting them into a consistent format.

(1) normalizing: Generally, when geometry data is used for learning, each object is normalized individually: its centroid is moved to the origin and it is scaled to the range 0 to 1, i.e., the farthest point of the model is set to 1. This general method does not reflect the relative heights of the skyscrapers. Therefore, in this project, all skyscrapers are scaled by the same factor so that the height of the tallest model is 1.
Normalized skyscrapers
From the left, the shortest building (Gherkin London) · the tallest building (One World Trade Center)
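
The shared-scale normalization can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: height is taken along z, each footprint is centered on the origin in x and y while the base is placed at z = 0, and the two box "towers" with their dimensions are made-up data.

```python
import numpy as np


def normalize_by_tallest(models: dict) -> dict:
    """Scale every model by one shared factor so the tallest has height 1."""
    # Height measured along z (assumption).
    tallest = max(v[:, 2].max() - v[:, 2].min() for v in models.values())

    normalized = {}
    for name, vertices in models.items():
        shifted = vertices.astype(np.float64).copy()
        # Center the footprint in x and y only; keep the base on z = 0.
        xy_center = (shifted[:, :2].max(axis=0) + shifted[:, :2].min(axis=0)) / 2
        shifted[:, :2] -= xy_center
        shifted[:, 2] -= shifted[:, 2].min()
        normalized[name] = shifted / tallest
    return normalized


def box_vertices(width, depth, height, origin=(0.0, 0.0, 0.0)) -> np.ndarray:
    """Eight corners of a box, used here as a stand-in for a building mesh."""
    ox, oy, oz = origin
    return np.array(
        [[ox + dx * width, oy + dy * depth, oz + dz * height]
         for dx in (0, 1) for dy in (0, 1) for dz in (0, 1)],
        dtype=np.float64,
    )


# Hypothetical towers with made-up dimensions.
models = {
    "short_tower": box_vertices(50, 50, 180, origin=(10, 20, 0)),
    "tall_tower": box_vertices(60, 60, 540),
}

normalized = normalize_by_tallest(models)
for name, vertices in normalized.items():
    print(name, "height:", round(float(vertices[:, 2].max()), 4))
```

Because one factor is shared by all models, the short tower keeps a third of the tall tower's height instead of being stretched to 1.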


(2) converting: The feed-forward network of the DeepSDF model has the following architecture. It is composed of 8 fully connected layers, denoted as "FC" in the diagram. As the figure below shows, apart from the latent vector, the input X has 3 dimensions: (x, y, z).

The feed-forward network for DeepSDF model


The data set \( X \) is composed of points \( \mathbf{x} = (x, y, z) \) and their corresponding labels \( s \): \( X := \{(\mathbf{x}, s) : SDF(\mathbf{x}) = s\} \). Additionally, a class number is required to assign a latent vector to each sample. As mentioned in the introduction, the latent vectors play the role of DNA in representing the shape of the buildings.

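    from typing import Tuple

    import torch
    from torch.utils.data import Dataset

    # Configuration is this project's settings class (SAVE_DATA_PATH, DEVICE, ...).
    class SDFdataset(Dataset, Configuration):
        def __init__(self, data_path: str = Configuration.SAVE_DATA_PATH):
            # _get_sdf_dataset is not shown in this excerpt
            self.sdf_dataset, self.cls_nums, self.cls_dict = self._get_sdf_dataset(data_path=data_path)

        def __len__(self) -> int:
            return len(self.sdf_dataset)

        def __getitem__(self, index: int) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
            # each row is laid out as [x, y, z, sdf, class number]
            xyz = self.sdf_dataset[index, :3]
            sdf = self.sdf_dataset[index, 3]
            cls = self.sdf_dataset[index, 4].long()

            return xyz.to(self.DEVICE), sdf.to(self.DEVICE), cls.to(self.DEVICE)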
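
The row layout that SDFdataset indexes, [x, y, z, sdf, class number], can be illustrated with a small NumPy sketch. How the project actually builds and stores this array is not shown in the article, so make_samples, the two toy spheres, and the resolution of 4 are assumptions for illustration only.

```python
import numpy as np


def make_sphere_sdf(radius: float):
    """Return an SDF function for a sphere at the origin (toy shape)."""
    def sdf(points: np.ndarray) -> np.ndarray:
        return np.linalg.norm(points, axis=1) - radius
    return sdf


def make_samples(sdf_fn, cls_index: int, resolution: int = 4) -> np.ndarray:
    """Build rows of [x, y, z, sdf, class number] for one shape."""
    axis = np.linspace(-1.0, 1.0, resolution)
    xs, ys, zs = np.meshgrid(axis, axis, axis, indexing="ij")
    xyz = np.stack([xs, ys, zs], axis=-1).reshape(-1, 3)
    s = sdf_fn(xyz).reshape(-1, 1)
    cls = np.full((xyz.shape[0], 1), cls_index, dtype=np.float64)
    return np.hstack([xyz, s, cls])


# Two toy shapes stand in for two buildings (class numbers 0 and 1).
shapes = [make_sphere_sdf(0.3), make_sphere_sdf(0.6)]
dataset = np.vstack([make_samples(fn, i) for i, fn in enumerate(shapes)])

# Same column slicing as in __getitem__ above.
row = dataset[70]
xyz, sdf_value, cls_number = row[:3], row[3], int(row[4])
print(dataset.shape, xyz, round(float(sdf_value), 4), cls_number)
```

The class number is what later selects the row of the latent-code table that belongs to the sample.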


Implementing and training of DeepSDF model

As shown in (2) converting above, the feed-forward network of the DeepSDF model is simple and can be implemented as follows.

    import torch
    import torch.nn as nn

    class SDFdecoder(nn.Module, Configuration):
        def __init__(self, cls_nums: int, latent_size: int = Configuration.LATENT_SIZE):
            super().__init__()

            self.main_1 = nn.Sequential(
                nn.Linear(latent_size + 3, 512),
                nn.ReLU(True),
                nn.Linear(512, 512),
                nn.ReLU(True),
                nn.Linear(512, 512),
                nn.ReLU(True),
                nn.Linear(512, 512),
                nn.ReLU(True),
                nn.Linear(512, 512),
            )

            self.main_2 = nn.Sequential(
                nn.Linear(latent_size + 3 + 512, 512),
                nn.ReLU(True),
                nn.Linear(512, 512),
                nn.ReLU(True),
                nn.Linear(512, 512),
                nn.ReLU(True),
                nn.Linear(512, 1),
                nn.Tanh(),
            )

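            # torch.FloatTensor(cls_nums, latent_size) allocates uninitialized memory, so the
            # latent codes are drawn from a small Gaussian instead (std 0.01 is an assumed value)
            self.latent_codes = nn.Parameter(torch.randn(cls_nums, latent_size) * 0.01)
            # self.to() moves every parameter, including latent_codes, to the device
            self.to(self.DEVICE)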

        def forward(self, i, xyz, cxyz_1=None):
            if cxyz_1 is None:
                cxyz_1 = torch.cat((self.latent_codes[i], xyz), dim=1)

            x1 = self.main_1(cxyz_1)

            # skip connection
            cxyz_2 = torch.cat((x1, cxyz_1), dim=1)
            x2 = self.main_2(cxyz_2)

            return x2

The SDFdecoder class has the following arguments as inputs:
  • cls_nums is the number of skyscrapers
  • latent_size is the dimension of the latent vector

In this project, cls_nums and latent_size were set to 15 and 128, respectively. Therefore, the latent vectors stored in the instance variable self.latent_codes have size torch.Size([15, 128]).

The skip connection technique used in forward propagation enables the model to learn complex functions representing the SDF by combining low-level information (XYZ coordinates) with the high-level features learned by the network.
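
To make the tensor shapes in forward concrete, here is a NumPy-only sketch of the two concatenations. It only traces shapes: the random array x1 is a stand-in for the output of main_1, and the batch of four samples with its class numbers is made up.

```python
import numpy as np

rng = np.random.default_rng(0)

cls_nums, latent_size, hidden = 15, 128, 512
latent_codes = rng.normal(0.0, 0.01, size=(cls_nums, latent_size))

# A made-up batch of 4 samples: class numbers and query points.
cls = np.array([0, 3, 3, 14])
xyz = rng.uniform(-1.0, 1.0, size=(4, 3))

# Input of main_1: latent vector of each sample followed by its (x, y, z).
cxyz_1 = np.concatenate([latent_codes[cls], xyz], axis=1)
print(cxyz_1.shape)  # (4, 131) = latent_size + 3

# Stand-in for the 512 features produced by main_1.
x1 = rng.normal(size=(4, hidden))

# Skip connection: the original input is appended to the learned features.
cxyz_2 = np.concatenate([x1, cxyz_1], axis=1)
print(cxyz_2.shape)  # (4, 643) = 512 + latent_size + 3
```

This is why the first layer of main_2 is declared with latent_size + 3 + 512 input features.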

Now let's look at the training process. The model was trained for 150 epochs, and the total number of data points is 64x64x64x15 (=3932160). These are split in an 8:2 ratio for training and evaluation. Each epoch took about 1000 seconds on average. At the end of each epoch, I added code that reconstructs a skyscraper to evaluate the model qualitatively.
Training process for 150 epochs
From the top, reconstructed skyscraper · losses
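
For reference, the 8:2 split works out as below. Whether the project shuffles the points before splitting is not stated in the article, so the random permutation and the seed here are assumptions.

```python
import numpy as np

total_points = 64 ** 3 * 15  # 64x64x64 grid points for each of 15 skyscrapers

# Shuffle the sample indices, then cut them at the 80% mark (assumed strategy).
rng = np.random.default_rng(42)
indices = rng.permutation(total_points)

train_size = total_points * 8 // 10
train_idx, val_idx = indices[:train_size], indices[train_size:]

print(total_points, len(train_idx), len(val_idx))
```

With 3932160 points in total, this gives 3145728 points for training and 786432 for evaluation.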


After training the model for 150 epochs, I reconstructed the 3D models of the 15 skyscrapers by predicting, with each latent vector, the SDF value at every point of a regular grid. In the figure below, the buildings on the left are reconstructed by the model, and the buildings on the right are the original 3D models.



Comparing reconstructed skyscrapers vs. originals


The model cannot reconstruct the precise details of the original skyscrapers accurately, but it does generate skyscrapers that are reasonably similar to the originals. For this reconstruction task, I used a grid resolution of 384x384x384 (=56623104) points.


Synthesizing skyscrapers infinitely

Lastly, let's synthesize skyscrapers by interpolating latent vectors or applying arithmetic operations to them. With the following code, you can generate a virtually unlimited number of different shapes, starting from the latent vectors of the initial 15 buildings. This time, I used a grid resolution of 128x128x128 (=2097152) points for synthesis.

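    import os
    import random
    import time

    import numpy as np
    from IPython.display import clear_output

    def infinite_synthesis(
        sdf_decoder: SDFdecoder,
        save_dir: str,
        synthesis_count: float = np.inf,  # np.inf is a float; by default the loop runs until interrupted
        resolution: int = 128,
        map_z_to_y: bool = True,
        check_watertight: bool = True,
    ):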
        synthesizer = Synthesizer()
    
        synthesized_latent_codes_npz = "infinite_synthesized_latent_codes.npz"
        synthesized_latent_codes_path = os.path.join(save_dir, synthesized_latent_codes_npz)
    
        os.makedirs(save_dir, exist_ok=True)
    
        synthesized_latent_codes = {
            "data": [
                {
                    "name": i,
                    "index": i,
                    "synthesis_type": "initial",
                    "latent_code": list(latent_code.detach().cpu().numpy()),
                }
                for i, latent_code in enumerate(sdf_decoder.latent_codes)
            ]
        }
    
        if os.path.exists(synthesized_latent_codes_path):
            synthesized_latent_codes = {
                "data": list(np.load(synthesized_latent_codes_path, allow_pickle=True)["synthesized_data"])
            }
    
        while len(synthesized_latent_codes["data"]) < synthesis_count:
            print("synthesized data length:", len(synthesized_latent_codes["data"]))
    
            if random.Random(time.time()).random() < 0.5:
                selected_indices, synthesized_latent_code = synthesizer.random_arithmetic_operations_synthesis(
                    sdf_decoder=sdf_decoder, latent_codes_data=synthesized_latent_codes
                )
    
                synthesis_type = "arithmetic"
    
                name = f"{selected_indices}.obj"
                save_name = os.path.join(save_dir, name)
    
            else:
                (
                    selected_indices,
                    random_interpolation_factor,
                    synthesized_latent_code,
                ) = synthesizer.random_interpolation_synthesis(
                    sdf_decoder=sdf_decoder, latent_codes_data=synthesized_latent_codes
                )
    
                synthesis_type = "interpolation"
    
                name = f"{selected_indices}__{str(random_interpolation_factor).replace('.', '-')}.obj"
                save_name = os.path.join(save_dir, name)
    
            if os.path.exists(save_name):
                continue
    
            _ = synthesizer.synthesize(
                sdf_decoder=sdf_decoder,
                latent_code=synthesized_latent_code,
                resolution=resolution,
                save_name=save_name,
                map_z_to_y=map_z_to_y,
                check_watertight=check_watertight,
            )
    
            synthesized_data = {
                "name": name,
                "index": len(synthesized_latent_codes["data"]),
                "synthesis_type": synthesis_type,
                "latent_code": list(synthesized_latent_code.detach().cpu().numpy()),
            }
    
            synthesized_latent_codes["data"].append(synthesized_data)
    
            np.savez(
                synthesized_latent_codes_path,
                synthesized_data=np.array(synthesized_latent_codes["data"]),
            )
    
            clear_output(wait=False)    
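
The Synthesizer methods called above are not shown in the article. As a rough sketch of what interpolation and arithmetic on latent vectors can look like, consider the NumPy example below. The function names, the specific arithmetic (a + b - c), and the random vectors are assumptions, not the project's implementation.

```python
import numpy as np


def interpolate(code_a: np.ndarray, code_b: np.ndarray, factor: float) -> np.ndarray:
    """Linear interpolation: factor 0 returns code_a, factor 1 returns code_b."""
    return (1.0 - factor) * code_a + factor * code_b


def arithmetic(code_a: np.ndarray, code_b: np.ndarray, code_c: np.ndarray) -> np.ndarray:
    """One possible arithmetic combination (assumed): add b to a, subtract c."""
    return code_a + code_b - code_c


rng = np.random.default_rng(0)
latent_size = 128

# Random stand-ins for three trained latent vectors.
code_a, code_b, code_c = rng.normal(0.0, 0.01, size=(3, latent_size))

halfway = interpolate(code_a, code_b, 0.5)
combined = arithmetic(code_a, code_b, code_c)

print(halfway.shape, combined.shape)
```

Each new vector has the same size as the trained ones, so it can be passed to the decoder just like a trained latent vector and then stored as a parent for later syntheses.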


From 15 skyscrapers to 450 skyscrapers
The first row, marked with rectangles, shows the initial 15 skyscrapers


Tracking synthesized data

Since the function used above for synthesizing skyscrapers records its data, we can use that data to find the parents of each synthesized skyscraper. The process tracks back from any given synthesized design to its origins using graph-based analysis. This helps us understand how specific designs are derived and how the original models influence the synthesized outcomes. To track the synthesized skyscrapers, I used breadth-first search (BFS).
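
A minimal sketch of such a BFS trace is shown below. It assumes the recorded data has already been turned into a mapping from each synthesized index to the indices of its parents; that mapping, its example values, and the function name trace_ancestors are hypothetical.

```python
from collections import deque


def trace_ancestors(parents: dict, start: int) -> list:
    """Return all ancestors of `start` in breadth-first order."""
    visited = []
    seen = {start}
    queue = deque([start])

    while queue:
        node = queue.popleft()
        # Initial skyscrapers have no entry in `parents`, so the search stops there.
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                visited.append(parent)
                queue.append(parent)

    return visited


# Hypothetical records: indices 0-14 are the initial skyscrapers,
# higher indices were synthesized from the listed parents.
parents = {
    15: [0, 3],
    16: [15, 7],
    17: [16, 2],
}

print(trace_ancestors(parents, 17))  # [16, 2, 15, 7, 0, 3]
```

The ancestors with an index below 15 are the original models that influenced the design.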

The figures below demonstrate the application of these functions, showing the trace and visualization of synthesized skyscrapers starting from initial designs through various synthesis steps, culminating in complex structures. This illustrates the complex relationships and dependencies within the synthesized skyscrapers.


Tracking synthesized skyscrapers

Limitations and future works

While the project demonstrates the potential of deep generative AI in synthesizing skyscraper designs, it is not without limitations. The following points need to be improved:
  • Design Evaluation: Although the model can synthesize skyscraper designs, it currently lacks the capability to automatically evaluate the quality of these designs.

  • Detail Expression: One of the significant limitations of the current model is its inability to capture the intricate details of the skyscraper models accurately.

  • Computational Resources: The process of training the model, especially at high resolutions for detailed synthesis, requires substantial computational power and time.

  • Interactive Design Tools: Developing interactive tools that allow architects to manipulate latent vectors directly or specify constraints and preferences could make the technology more practical and appealing for real-world design applications.


References