The K-Means clustering algorithm is the algorithm that clusters to a K number when given data. And it operates as a way to minimize between each cluster about distance difference. This algorithm is a type of the Unsupervised-Learning and serves to label unlabeled input data.
The K-Means algorithm belongs to a partitioning method among clustring method. A partitioning method is a way of splitting that divides multiple partitions when given data. For example, let's assume that n data objects are input. That's when partitioning method divides the given data into K groups less than N, at this time, each group forms a cluster. That is, dividing a piece of data into one or more data objects. K-means clustering, divided by 10
Implementation
The operation flow of the K-Means clustering consists of 5 steps following:
Select up a K (the count of clusters) and enter data
Set initial centroids of clusters randomly
Assign the data to each cluster based on the nearest centroid
Recalculate the new centroids of clusters and re-execute step-4
Terminate if no longer locations of centroids aren't updated
Let's implement the K-Means algorithm based on the steps above. First, we define the KMeans object. As input, it receives the number of clusters(K) to divide, points cloud, and iteration_count which is the number of centroid update iterations
class KMeans(PointHelper):
def __init__(self, points=None, k=3, iteration_count=20, random_seed=0):
"""KMeansCluster simple implementation using Rhino Geometry
Args:
points (Rhino.Geometry.Point3d, optional): Points to classify. Defaults to None. if points is None, make random points
k (int, optional): Number to classify. Defaults to 3.
iteration_count (int, optional): Clusters candidates creation count. Defaults to 20.
random_seed (int, optional): Random seed number to fix. Defaults to 0.
"""
PointHelper.__init__(self)
self.points = points
self.k = k
self.iteration_count = iteration_count
self.threshold = 0.1
import random # pylint: disable=import-outside-toplevel
self.random = random
self.random.seed(random_seed)
Next, Initialize the centroids of clusters as much as the number of K by selecting the given points cloud as much as K randomly. If the initial centroid setting is done, calculate the distance between each centroid and the given points cloud, and assign the data at the cluster which is the closest distance. Now we should update all the centroids of clusters. We need to compute centroids of initial clusters(points cloud clusters) for that.
Finally, compute the distance between the updated centroid and the previous centroid. If this distance does not no longer changes, terminate. Otherwise, just iterate on key things which are explained above.
def kmeans(self, points, k, threshold):
"""Clusters by each iteration
Args:
points (Rhino.Geometry.Point3d): Initialized given points
k (int): Initialized given k
threshold (float): Initialized threshold
Returns:
Tuple[List[List[Rhino.Geometry.Point3d]], List[List[int]]]: Clusters by each iteration, Indices by each iteration
"""
centroids = self.random.sample(points, k)
while True:
clusters = [[] for _ in centroids]
indices = [[] for _ in centroids]
for pi, point in enumerate(points):
point_to_centroid_distance = [
point.DistanceTo(centroid) for centroid in centroids
]
nearest_centroid_index = point_to_centroid_distance.index(
min(point_to_centroid_distance)
)
clusters[nearest_centroid_index].append(point)
indices[nearest_centroid_index].append(pi)
shift_distance = 0.0
for ci, current_centroid in enumerate(centroids):
if len(clusters[ci]) == 0:
continue
updated_centroid = self.get_centroid(clusters[ci])
shift_distance = max(
updated_centroid.DistanceTo(current_centroid),
shift_distance,
)
centroids[ci] = updated_centroid
if shift_distance < threshold:
break
return clusters, indices
Now we can get the point clusters by setting the K, from the code above. And you can see the detailed code for the K-Means at this link. Implemented K-means clustering, divided by 10 From the left, Given Points · Result clusters
K-Rooms clusters
The K-Means algorithm is used to cluster as much as the number of K, given points, like described above. If you utilize it, you can divide a given architectural boundary as much as the number of K. It means that K number of rooms can be obtained.
Before writing the code, set the following order.
As input, it receives the building exterior wall line (closed line) and how many rooms to divide into
Create an oriented bounding box and create a grid
Insert the grid center points into the K-Means clustering algorithm, implemented above
Grids are merged based on the indices of the clustered grid center points
Search for the shortest path from the core to each room and create a corridor
Now let's implement the K-Rooms cluster algorithm. Define the KRoomsClusters class as follows and take as input what you defined in step 1. And inherit the KMeans class.
class KRoomsCluster(KMeans, PointHelper, LineHelper, ConstsCollection):
"""
To use the inherited moduels, refer the link below.
https://github.com/PARKCHEOLHEE-lab/GhPythonUtils
"""
def __init__(self, floor, core, hall, target_area, axis=None):
self.floor = floor
self.core = core
self.hall = hall
self.sorted_hall_segments = self.get_sorted_segment(self.hall)
self.target_area = target_area
self.axis = axis
KMeans.__init__(self)
PointHelper.__init__(self)
LineHelper.__init__(self)
ConstsCollection.__init__(self)
Next, we create an OBB(oriented bounding box) to create the grid. OBB is for defining the grid xy axes. (See this link to see the OBB creation algorithm) Then, create grid and grid centroid, extract the clustering indices by putting the center points of the grid and the K value, and then merge all the grids that exist in the same cluster. K-Rooms clustering key process, divided by 5 From the left, Grid and centroids creation · Divided rooms
def _gen_grid(self):
self.base_rectangles = [
self.get_2d_offset_polygon(seg, self.grid_size)
for seg in self.sorted_hall_segments[:2]
]
counts = []
planes = []
for ri, base_rectangle in enumerate(self.base_rectangles):
x_vector = (
rg.AreaMassProperties.Compute(base_rectangle).Centroid
- rg.AreaMassProperties.Compute(self.hall).Centroid
)
y_vector = copy.copy(x_vector)
y_transform = rg.Transform.Rotation(
math.pi * 0.5,
rg.AreaMassProperties.Compute(base_rectangle).Centroid,
)
y_vector.Transform(y_transform)
base_rectangle.Translate(
(
self.sorted_hall_segments[0].PointAtStart
- self.sorted_hall_segments[0].PointAtEnd
)
/ 2
)
base_rectangle.Translate(
self.get_normalized_vector(x_vector) * -self.grid_size / 2
)
anchor = rg.AreaMassProperties.Compute(base_rectangle).Centroid
plane = rg.Plane(
origin=anchor,
xDirection=x_vector,
yDirection=y_vector,
)
x_proj = self.get_projected_point_on_curve(
anchor, plane.XAxis, self.obb
)
x_count = (
int(math.ceil(x_proj.DistanceTo(anchor) / self.grid_size)) + 1
)
y_projs = [
self.get_projected_point_on_curve(
anchor, plane.YAxis, self.obb
),
self.get_projected_point_on_curve(
anchor, -plane.YAxis, self.obb
),
]
y_count = [
int(math.ceil(y_proj.DistanceTo(anchor) / self.grid_size)) + 1
for y_proj in y_projs
]
planes.append(plane)
counts.append([x_count] + y_count)
x_grid = []
for base_rectangle, count, plane in zip(
self.base_rectangles, counts, planes
):
xc, _, _ = count
for x in range(xc):
copied_rectangle = copy.copy(base_rectangle)
vector = plane.XAxis * self.grid_size * x
copied_rectangle.Translate(vector)
x_grid.append(copied_rectangle)
y_vectors = [planes[0].YAxis, -planes[0].YAxis]
y_counts = counts[0][1:]
all_grid = [] + x_grid
for rectangle in x_grid:
for y_count, y_vector in zip(y_counts, y_vectors):
for yc in range(1, y_count):
copied_rectangle = copy.copy(rectangle)
vector = y_vector * self.grid_size * yc
copied_rectangle.Translate(vector)
all_grid.append(copied_rectangle)
union_all_grid = rg.Curve.CreateBooleanUnion(all_grid, self.TOLERANCE)
for y_count, y_vector in zip(y_counts, y_vectors):
for yc in range(1, y_count):
copied_hall = copy.copy(self.hall)
copied_hall.Translate(
(
self.sorted_hall_segments[0].PointAtStart
- self.sorted_hall_segments[0].PointAtEnd
)
/ 2
)
vector = y_vector * self.grid_size * yc
copied_hall.Translate(vector)
all_grid.extend(
rg.Curve.CreateBooleanDifference(
copied_hall, union_all_grid
)
)
self.grid = []
for grid in all_grid:
for boundary in self.boundaries:
tidied_grid = list(
rg.Curve.CreateBooleanIntersection(
boundary.boundary, grid, self.TOLERANCE
)
)
self.grid.extend(tidied_grid)
self.grid_centroids = [
rg.AreaMassProperties.Compute(g).Centroid for g in self.grid
]
Finally, connect each room with a hallway and you're done! (I implemented Dijkstra algorithm for the shortest-path algorithm. You can check the implementation through the following link. I will do a post on this at the next opportunity.) Corridor creation
Limitation of K-Rooms clusters
The K-Rooms Clusters algorithm we implemented is still incomplete. The images shown above are valid results when the K value is small. When the K value increases and the number of rooms increases, a problem in which an architecturally appropriate shape cannot be derived. For example below: Improper shapes, divided by 13 From the left, K-Rooms · Result corridor All we need to do further to solve this problem is to define the appropriate shape and create the logic to mitigate it with post-processing. And you can see the whole code of K-Rooms clusters at this link.