SOM Analysis in R | Using the som and class packages

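Several R packages implement the self-organizing map (SOM) algorithm, including som, class, and kohonen. This article shows how to use som and class.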
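For reference, the core of the algorithm these packages implement fits in a few lines of base R. The function below, simple_som(), is a hypothetical helper written only for this illustration and is not part of som or class. It assumes a rectangular grid, a Gaussian neighbourhood, and a learning rate and radius that shrink linearly. These are common textbook choices, not the exact defaults of either package. Each step picks one observation, finds its best-matching unit (the code vector closest in Euclidean distance), and pulls that unit and its grid neighbours toward the observation.

```r
# Hypothetical minimal online SOM (illustration only; not the som or class internals).
# Assumptions: rectangular grid, Gaussian neighbourhood, linear decay of alpha and radius.
simple_som <- function(data, xdim = 10, ydim = 10, rlen = 5000,
                       alpha = c(0.05, 0.01), radius = c(3, 0.5)) {
  data <- as.matrix(data)
  # Unit coordinates on a 1-based rectangular grid
  pts <- as.matrix(expand.grid(x = seq_len(xdim), y = seq_len(ydim)))
  # Initialise code vectors with randomly chosen observations
  codes <- data[sample(nrow(data), nrow(pts), replace = nrow(data) < nrow(pts)), ,
                drop = FALSE]
  for (step in seq_len(rlen)) {
    frac <- (step - 1) / rlen
    a <- alpha[1] + (alpha[2] - alpha[1]) * frac      # current learning rate
    r <- radius[1] + (radius[2] - radius[1]) * frac   # current neighbourhood radius
    v <- data[sample(nrow(data), 1), ]                # one random observation
    # Best-matching unit: the code vector with the smallest Euclidean distance to v
    bmu <- which.min(colSums((t(codes) - v)^2))
    # Gaussian neighbourhood weight from each unit's grid distance to the BMU
    d2 <- colSums((t(pts) - pts[bmu, ])^2)
    h <- exp(-d2 / (2 * r^2))
    # Move every code vector toward v by a fraction a * h (row i is scaled by h[i])
    target <- matrix(v, nrow(codes), ncol(codes), byrow = TRUE)
    codes <- codes + a * h * (target - codes)
  }
  list(grid = pts, codes = codes)
}

set.seed(1)
X <- as.matrix(iris[, 1:4])
fit <- simple_som(X, xdim = 10, ydim = 10)

# Map each observation to its best-matching unit (1..100)
bmu <- apply(X, 1, function(v) which.min(colSums((t(fit$codes) - v)^2)))
table(bmu, iris$Species)[1:5, ]
```

Because each update moves a unit and its grid neighbours together, neighbouring units end up with similar code vectors, which is why the plotted maps below read as a smooth layout.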

Using the som package

The analysis function in the som package is som(). You pass it the data to be analyzed as a data frame. The sample data used here is iris, a dataset that ships with R.


library(som)

# Load the sample data
data(iris)
val <- iris[, 1:4]            # numeric data
lab <- iris[, 5]              # class labels (ground truth)
tag <- as.numeric(lab)        # numeric tag used to mark the true class in plots

head(val)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width
## 1          5.1         3.5          1.4         0.2
## 2          4.9         3.0          1.4         0.2
## 3          4.7         3.2          1.3         0.2
## 4          4.6         3.1          1.5         0.2
## 5          5.0         3.6          1.4         0.2
## 6          5.4         3.9          1.7         0.4

head(tag)
## [1] 1 1 1 1 1 1

# Build a 10 x 10 output layer (map) from the numeric data
model <- som(val, xdim = 10, ydim = 10)

# Plot the output layer
plot(model)


# Show which unit each observation is mapped to
# points(model$visual$x, model$visual$y, pch = tag, col = tag, cex = 2)

# Plotted as above, observations on the same unit overlap, so shift each point by a small random offset
x.rand <- runif(nrow(iris), -0.2, 0.2)
y.rand <- runif(nrow(iris), -0.2, 0.2)
points(model$visual$x + x.rand, model$visual$y + y.rand, pch = tag, col = tag, cex = 2)
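The offsets above are uniform noise on [-0.2, 0.2], which is what base R's jitter(x, amount = a) adds. The sketch below checks this on simulated integer coordinates: grid.x is a hypothetical stand-in for model$visual$x, so the example runs without the som package. The commented lines at the end show how the points() call above could be written with jitter().

```r
# Hypothetical stand-in for model$visual$x: integer grid coordinates 0..9
set.seed(123)
grid.x <- sample(0:9, 150, replace = TRUE)

# jitter(x, amount = a) adds runif(length(x), -a, a) to x
jx <- jitter(grid.x, amount = 0.2)

# Largest offset actually applied (always below 0.2)
max(abs(jx - grid.x))

# With the som model from above, the points() call could be written as:
# points(jitter(model$visual$x, amount = 0.2), jitter(model$visual$y, amount = 0.2),
#        pch = tag, col = tag, cex = 2)
```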

To plot only the data points, without the map itself, use:

plot(model$visual$x + x.rand, model$visual$y + y.rand, pch = tag, col = tag, cex = 2)

Using the class package

The self-organizing map function in the class package is SOM(). The sample data is again iris, which ships with R. (Its usage is almost the same as in the kohonen package.)

library(class)

# Load the sample data
data(iris)
val <- iris[, 1:4]            # numeric data
lab <- iris[, 5]              # class labels (ground truth)
tag <- as.numeric(lab)        # numeric tag used to mark the true class in plots


# Build a 10 x 10 output layer (map) from the numeric data
layer <- somgrid(xdim = 10, ydim = 10)
model <- SOM(val, grid = layer)


# Plot the output layer
plot(model)
#abline(v = seq(0.5, 10.5, by = 1), h = seq(0.5, 10.5, by = 1)) # draw the borders between units

head(model$codes)
##      Sepal.Length Sepal.Width Petal.Length Petal.Width
## [1,]     5.843914    3.079311     3.704857    1.173350
## [2,]     5.843914    3.079311     3.704857    1.173350
## [3,]     5.949995    3.015165     4.002057    1.321238
## [4,]     5.783853    3.059938     3.683415    1.177109
## [5,]     7.640000    2.620000     6.830000    2.275000
## [6,]     4.400000    3.000000     1.300000    0.200000

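The plotted output layer shows a 10 × 10 array of quadrilaterals, one per unit. Each one is a star glyph of that unit's code vector in model$codes: starting from the rightward horizontal direction and going counterclockwise, its vertices represent Sepal.Length, Sepal.Width, Petal.Length, and Petal.Width.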
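Assuming plot() on a class SOM object draws a base-R star plot of model$codes at the grid positions (which the quadrilaterals suggest), you can redraw it with graphics::stars() and add a key showing which direction belongs to which variable. The sketch refits the map with set.seed() so it is self-contained, so its code vectors will differ from the output shown above. The key position (key.loc) and the widened xlim are arbitrary choices made for this example.

```r
library(class)

set.seed(1)  # SOM() initialises its code vectors from randomly chosen rows
model <- SOM(iris[, 1:4], grid = somgrid(xdim = 10, ydim = 10))

# Star plot of the code vectors at their grid positions, plus a key on the right.
# By default stars() rescales each variable to [0, 1], so a vertex's distance from
# the centre shows a unit's value relative to the other units.
stars(model$codes, locations = model$grid$pts, labels = NULL, len = 0.5,
      key.loc = c(12, 5.5), xlim = c(0, 13), ylim = c(0, 11))
```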

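To plot which unit each observation was assigned to, do the following. Unlike som() in the som package, the result of SOM() in the class package does not contain the map coordinates of each observation, so you have to compute them yourself. Here knn() is given the code vectors as the training set and the unit numbers 1:100 as their labels, so with k = 1 it returns, for each observation, the number of the nearest unit.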

# Find the unit (and hence the map coordinates) of each observation
classes <- as.numeric(knn(model$codes, val, 1:100))

# Random offsets so that observations on the same unit do not overlap
x.rand <- runif(nrow(iris), -0.2, 0.2)
y.rand <- runif(nrow(iris), -0.2, 0.2)

# Plot (bottom-left figure)
plot(model)
points(
  model$grid$pts[classes, "x"] + x.rand,   # x coordinate
  model$grid$pts[classes, "y"] + y.rand,   # y coordinate
  pch = tag, col = tag, cex = 2            # marker shape, colour, and size
)
abline(v = seq(0.5, 10.5, by = 1), h = seq(0.5, 10.5, by = 1)) # draw the borders between units


# Plot (bottom-right figure)
plot(
  model$grid$pts[classes, "x"] + x.rand,   # x coordinate
  model$grid$pts[classes, "y"] + y.rand,   # y coordinate
  pch = tag, col = tag, cex = 2            # marker shape, colour, and size
)
abline(v = seq(0.5, 10.5, by = 1), h = seq(0.5, 10.5, by = 1)) # draw the borders between units
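knn() with k = 1 is used here only to find each observation's nearest code vector. The same best-matching units can be computed directly from squared Euclidean distances, which makes the step explicit and avoids knn()'s random tie-breaking. Ties do occur when code vectors are identical, as with the first two rows of head(model$codes) above. The sketch refits the model with a fixed seed so that it runs on its own.

```r
library(class)

set.seed(1)
val <- iris[, 1:4]
model <- SOM(val, grid = somgrid(xdim = 10, ydim = 10))

# Squared Euclidean distance from every observation (rows) to every unit (columns)
X <- as.matrix(val)
d2 <- outer(rowSums(X^2), rowSums(model$codes^2), "+") - 2 * X %*% t(model$codes)

# Best-matching unit of each observation; which.min() resolves ties to the lowest index
bmu <- apply(d2, 1, which.min)

# The knn() version from above, for comparison (ties broken at random)
classes <- as.numeric(knn(model$codes, val, 1:100))

# Grid coordinates of the first few observations' units
head(model$grid$pts[bmu, ])
```

Where the two methods pick different units, the distances to those units are equal, so either choice is a valid best-matching unit.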

References

R. Wehrens and L.M.C. Buydens. Self- and Super-organising Maps in R: the kohonen package. J. Stat. Softw. 21(5), 2007.
T. Kohonen. Self-Organizing Maps. Springer, 1997.