Frequency Analysis with a Morphological Analyzer: Exercise

59doit

Text Mining

yul_S2 2022. 12. 17. 12:05

Attachment: speech_park.txt (0.01MB)

Use speech_park.txt, which contains former President Park Geun-hye's presidential candidacy declaration, to solve the problems below.

Q1. Load speech_park.txt, preprocess it so that it is suitable for analysis, and then extract the nouns from the speech.

# Load the text (readLines opens and closes the file itself, so no connection is left open)
speech <- readLines("C:/speech_park.txt", encoding = "UTF-8")
speech

# Preprocess: keep only Hangul syllables, collapse repeated whitespace, convert to a tibble
library(dplyr)
library(stringr)
speech_df <- speech %>%
  str_replace_all("[^가-힣]", " ") %>%
  str_squish() %>%
  as_tibble()
speech_df
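To make the effect of the two preprocessing calls concrete, here is a minimal sketch on a single made-up line. The sample sentence is hypothetical and is not taken from speech_park.txt.

```r
library(stringr)

# Hypothetical sample line (not from speech_park.txt)
sample_line <- "국민 여러분!  2012년, 새로운 변화를 시작합니다."

# Step 1: replace every character that is not a Hangul syllable with a space
hangul_only <- str_replace_all(sample_line, "[^가-힣]", " ")

# Step 2: trim the ends and collapse runs of whitespace into single spaces
cleaned <- str_squish(hangul_only)

cleaned
# "국민 여러분 년 새로운 변화를 시작합니다"
```

Note that removing the digits leaves the stray single syllable "년" behind. Fragments like this are one reason single-character words are filtered out when counting frequencies.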

# Extract nouns from the speech
library(tidytext)
library(KoNLP)
word_noun <- speech_df %>%
  unnest_tokens(input = value, output = word, token = extractNoun)
word_noun

Q2. Extract the 20 most frequently used words.

# Count each word, drop single-character words, and keep the 20 most frequent
top20 <- word_noun %>%
  count(word, sort = TRUE) %>%
  filter(str_count(word) > 1) %>%
  head(20)
top20
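The order of the steps matters: single-character words are dropped before head() is applied, so they cannot take up slots in the final list. A small sketch with a made-up word list (the `toy_words` data is hypothetical) shows the same pipeline without needing KoNLP:

```r
library(dplyr)
library(stringr)

# Hypothetical noun list standing in for word_noun
toy_words <- tibble(
  word = c("것", "국민", "것", "경제", "국민", "것", "행복", "국민", "경제", "것")
)

toy_top <- toy_words %>%
  count(word, sort = TRUE) %>%      # frequency per word, most frequent first
  filter(str_count(word) > 1) %>%   # drop one-character words such as "것"
  head(2)                           # keep the top 2 of what remains

toy_top
```

Here "것" is the most frequent token, but it is removed by the filter, so the result holds "국민" (3 occurrences) and "경제" (2 occurrences).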

Q3. Draw a bar chart showing the frequencies of the 20 most frequently used words.

library(ggplot2)
# Horizontal bar chart ordered by frequency, with the count printed next to each bar
ggplot(top20, aes(x = reorder(word, n), y = n)) +
  geom_col() +
  coord_flip() +
  geom_text(aes(label = n), hjust = -0.3) +
  labs(x = NULL)

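Q4. Remove consecutive whitespace from the unpreprocessed speech, convert it to a tibble, and then tokenize it by sentence. (The raw text is used here because sentence tokenization relies on punctuation, which the Q1 preprocessing removes.)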

# Tokenize the raw speech into sentences
sentences_speech <- speech %>%
  str_squish() %>%
  as_tibble() %>%
  unnest_tokens(input = value, output = sentence, token = "sentences")
sentences_speech

Q5. Print the sentences in the speech that contain "경제" (economy).

# Keep only the sentences that contain "경제"
sentences_speech %>%
  filter(str_detect(sentence, "경제"))
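The same filter can be tried on a few made-up sentences to see what str_detect() matches. This is only a sketch: the `toy_sentences` data is hypothetical and is not text from the speech.

```r
library(dplyr)
library(stringr)

# Hypothetical sentences standing in for sentences_speech
toy_sentences <- tibble(
  sentence = c(
    "경제 민주화를 실현하겠습니다",
    "국민 행복이 최우선입니다",
    "일자리와 경제 성장을 챙기겠습니다"
  )
)

# str_detect() returns TRUE wherever the pattern occurs anywhere in the sentence
hits <- toy_sentences %>%
  filter(str_detect(sentence, "경제"))

hits
```

Two of the three sentences contain "경제", so `hits` has two rows. Because str_detect() matches substrings, compounds that merely contain "경제" would be returned as well.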
