Semantic Network Analysis #3: Co-occurrence Network (2)

59doit

Text Mining

yul_S2 2022. 12. 26. 17:21

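Showing degree centrality and communities

A network graph contains many word nodes, so it is hard to decide which nodes to focus on when interpreting it.
Showing degree centrality and communities on the graph makes the relationships between words clearer.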

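- Degree centrality

A value that indicates how closely a node is connected to other nodes, measured by the number of edges attached to it

Scaling node size by degree centrality makes it easier to see which words deserve attention

- Community

A group of nodes that are frequently connected to one another because the words are closely related

Coloring nodes by community makes the network structure easier to understand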
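
As a minimal sketch of what degree centrality measures, the toy example below counts connections on a hypothetical four-word graph. The words and pairs are made up for illustration and are not from the comment data; only as_tbl_graph() and centrality_degree(), which this post uses, come from the source.

```r
library(dplyr)
library(tidygraph)

# Hypothetical toy word pairs (not from the post's data)
toy_pair <- tibble(
  item1 = c("a", "a", "a", "b"),
  item2 = c("b", "c", "d", "c")
)

# Build an undirected graph and count each node's connections
toy_nodes <- toy_pair %>%
  as_tbl_graph(directed = FALSE) %>%
  mutate(centrality = centrality_degree()) %>%
  as_tibble()

toy_nodes
# "a" has 3 edges, "b" and "c" have 2 each, "d" has 1
```

The node with the most edges ("a" here) is the one that would be drawn largest once node size is mapped to centrality.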

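#1 Adding degree centrality and community variables to the network graph data

- Create the network graph data: as_tbl_graph()

directed = F: makes the graph undirected

Co-occurrence pairs have no direction, and group_infomap() is applied here to the undirected network graph data

- Add the degree centrality variable: centrality_degree()

- Add the community variable: group_infomap()

group_infomap() returns the community as an integer, so mapping it to color directly would draw the nodes on a gradient

as.factor(): converts it to a factor so that each node group gets its own distinct color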
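
Before running this on the comment data, the integer-versus-factor point can be checked on a small graph. The sketch below uses hypothetical edges (two tightly connected clusters joined by one bridge) that are not part of the post's data. How many communities the infomap algorithm finds on such a small graph may vary, so no specific grouping is claimed here.

```r
library(dplyr)
library(tidygraph)

# Hypothetical toy edges: clusters a-b-c-d and w-x-y-z, joined by the bridge d-w
toy_edges <- tibble(
  from = c("a", "a", "a", "b", "b", "c", "w", "w", "w", "x", "x", "y", "d"),
  to   = c("b", "c", "d", "c", "d", "d", "x", "y", "z", "y", "z", "z", "w")
)

set.seed(1234)
toy_comm <- toy_edges %>%
  as_tbl_graph(directed = FALSE) %>%
  mutate(group_int = group_infomap(),        # community id as a number
         group     = as.factor(group_int)) %>%  # same id as a categorical value
  as_tibble()

class(toy_comm$group_int)   # numeric type: a color scale treats it as continuous
class(toy_comm$group)       # factor: a color scale treats it as discrete
```

Keeping both columns side by side makes it easy to see that as.factor() changes only the type of the variable, not the community assignment itself.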

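library(dplyr)       # filter(), mutate(), %>%
library(tidygraph)   # as_tbl_graph(), centrality_degree(), group_infomap()

# pair is assumed to be the word-pair count table built earlier in this series
set.seed(1234)
graph_comment <- pair %>%
  filter(n >= 25) %>%
  as_tbl_graph(directed = F) %>%
  mutate(centrality = centrality_degree(),     # degree centrality
         group = as.factor(group_infomap()))   # community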

graph_comment

## # A tbl_graph: 36 nodes and 156 edges
## #
## # An undirected multigraph with 1 component
## #
## # Node Data: 36 x 3 (active)
##   name       centrality group
##   <chr>           <dbl> <fct>
## 1 봉준호             64 1
## 2 축하               34 2
## 3 영화               28 3
## 4 블랙리스트          6 4
## 5 기생충             26 5
## 6 대한민국           10 5
## # … with 30 more rows
## #
## # Edge Data: 156 x 3
##    from    to     n
##   <int> <int> <dbl>
## 1     1     2   198
## 2     1     2   198
## 3     1     3   119
## # … with 153 more rows
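
The edge data above lists the same 1-2 row twice, and the graph is printed as a multigraph. One possible explanation, assumed here and not confirmed by the post, is that pair holds every word pair in both orders (a-b and b-a). Under that assumption each connection is counted twice, which would also explain why every centrality value shown is even. The sketch below illustrates the effect on hypothetical toy pairs; the column names item1 and item2 are assumptions as well.

```r
library(dplyr)
library(tidygraph)

# Hypothetical pairs listed in both orders, mimicking the duplicated rows above
toy_both <- tibble(
  item1 = c("a", "b", "a", "c"),
  item2 = c("b", "a", "c", "a"),
  n     = c(10, 10, 5, 5)
)

# Helper (hypothetical): degree centrality of every node in a pair table
degree_of <- function(pairs) {
  pairs %>%
    as_tbl_graph(directed = FALSE) %>%
    mutate(centrality = centrality_degree()) %>%
    as_tibble()
}

deg_both <- degree_of(toy_both)                         # both orders kept
deg_once <- degree_of(filter(toy_both, item1 < item2))  # one row per pair

deg_both$centrality   # 4 2 2
deg_once$centrality   # 2 1 1
```

Since every value doubles, the ranking of the words stays the same, so the graph is read the same way with or without the duplicate rows.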

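#2 Showing degree centrality and communities on the network graph

geom_node_point(aes())

size = centrality: sets node size by degree centrality

color = group: gives each community a different node color

geom_node_point(show.legend = F): removes the legend

scale_size(range = c(5, 15)): keeps node sizes within the range 5 to 15

Nodes that are too large or too small make the graph hard to read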

library(ggraph)

set.seed(1234)
ggraph(graph_comment, layout = "fr") +        # layout
  geom_edge_link(color = "gray50",            # edge color
                 alpha = 0.5) +               # edge transparency
  geom_node_point(aes(size = centrality,      # node size
                      color = group),         # node color
                  show.legend = F) +          # remove the legend
  scale_size(range = c(5, 15)) +              # node size range
  geom_node_text(aes(label = name),           # show the word as text
                 repel = T,                   # place labels outside the nodes
                 size = 5,                    # text size
                 family = "nanumgothic") +    # font (assumes it is already registered)
  theme_graph()                               # remove the background
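
For readers who want to try the plotting code without the comment data or the nanumgothic font, here is a minimal sketch on a hypothetical toy graph. The edges are made up, the font argument is dropped, and the plot is stored in p and left unprinted; otherwise the layers mirror the ones used above.

```r
library(dplyr)
library(tidygraph)
library(ggplot2)
library(ggraph)

# Hypothetical toy edges (not from the post's data)
toy_edges <- tibble(
  from = c("a", "a", "a", "b", "w", "w", "x", "c"),
  to   = c("b", "c", "d", "c", "x", "y", "y", "w")
)

set.seed(1234)
toy_graph <- toy_edges %>%
  as_tbl_graph(directed = FALSE) %>%
  mutate(centrality = centrality_degree(),
         group      = as.factor(group_infomap()))

set.seed(1234)
p <- ggraph(toy_graph, layout = "fr") +
  geom_edge_link(color = "gray50", alpha = 0.5) +
  geom_node_point(aes(size = centrality, color = group),
                  show.legend = FALSE) +
  scale_size(range = c(5, 15)) +     # smallest node drawn at 5, largest at 15
  geom_node_text(aes(label = name), repel = TRUE, size = 5) +
  theme_graph()
```

Printing p draws the graph. Changing the range in scale_size() is a quick way to see how the upper and lower bounds affect readability.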

#3 Examining the key words in the network

3-1 Checking the community of a key word

graph_comment %>%
  filter(name == "봉준호")

# # A tbl_graph: 1 nodes and 0 edges
# #
# # An unrooted tree
# #
# # Node Data: 1 x 3 (active)
#   name   centrality group
#   <chr>       <dbl> <fct>
# 1 봉준호         64 1
# #
# # Edge Data: 0 x 3
# # ... with 3 variables: from <int>, to <int>, n <dbl>

3-2 Checking the words classified into the same community

graph_comment %>%
  filter(group == 4) %>%
  arrange(-centrality) %>%
  data.frame()

#         name centrality group
# 1 블랙리스트          6     4
# 2     올리다          4     4
# 3     박근혜          4     4

3-3 Checking the key words with high degree centrality

graph_comment %>%
  arrange(-centrality)

# # A tbl_graph: 36 nodes and 156 edges
# #
# # An undirected multigraph with 1 component
# #
# # Node Data: 36 x 3 (active)
#   name     centrality group
#   <chr>         <dbl> <fct>
# 1 봉준호           64 1
# 2 축하             34 2
# 3 영화             28 3
# 4 기생충           26 5
# 5 작품상           14 6
# 6 대한민국         10 5
# # ... with 30 more rows
# #
# # Edge Data: 156 x 3
#    from    to     n
#   <int> <int> <dbl>
# 1     1     2   198
# 2     1     2   198
# 3     1     3   119
# # ... with 153 more rows

3-4 Checking the words in the same community as a high-centrality word

graph_comment %>%
  filter(group == 2) %>%
  arrange(-centrality) %>%
  data.frame()

# name centrality group
# 1     축하         34     2
# 2 아카데미         10     2
# 3     좋다          8     2
# 4     자랑          6     2
# 5     진심          4     2
# 6     대박          4     2
# 7     수상          4     2
# 8   멋지다          4     2
# 9   기쁘다          2     2

#4 Examining the original comments that use the key words

library(stringr)   # str_detect()

# Comments that contain both words of each pair
news_comment %>%
  filter(str_detect(reply, "봉준호") & str_detect(reply, "대박")) %>%
  select(reply)

news_comment %>%
  filter(str_detect(reply, "박근혜") & str_detect(reply, "블랙리스트")) %>%
  select(reply)

news_comment %>%
  filter(str_detect(reply, "기생충") & str_detect(reply, "조국")) %>%
  select(reply)
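
The three queries above share one pattern, so they could be wrapped in a small helper. The function find_both() below is hypothetical (it is not part of the post or of any package), and the toy comments are made up. It assumes the text column is named reply, as in news_comment, and uses fixed() so that the words are matched literally instead of as regular expressions.

```r
library(dplyr)
library(stringr)

# Hypothetical helper: rows whose reply contains both words
find_both <- function(data, word1, word2) {
  data %>%
    filter(str_detect(reply, fixed(word1)) &
             str_detect(reply, fixed(word2))) %>%
    select(reply)
}

# Made-up toy comments standing in for news_comment
toy_comment <- tibble(
  reply = c("great film, congrats to the director",
            "congrats on the award",
            "the director made a great film")
)

both <- find_both(toy_comment, "congrats", "director")
both
# only the first comment contains both words
```

With the real data, a call such as find_both(news_comment, "봉준호", "대박") would be expected to return the same rows as the first query above.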
