Text & Sentiment Analysis Example TEST(9)


59doit


yul_S2 2022. 12. 16. 14:08

Using the words that occur at least twice in the provided data, visualize them as a word cloud.

(1) Load the text data

Zelenskydata <- file("C:/젤렌스키_연설문_20220219.txt", encoding = "UTF-8")
zelensky <- readLines(Zelenskydata)
head(zelensky)

(2) Preprocess the extracted words

(2-1) Required libraries

library(multilinguer)
library(KoNLP)
useNIADic()
library(tm)
library(wordcloud)

(2-2) Define a user function for noun extraction and extract the nouns

exNouns <- function(x) {paste(extractNoun(as.character(x)), collapse = " ") }

zelensky_nouns <- sapply(zelensky, exNouns)

(2-3) Preprocess the extracted words

# Create a corpus from the extracted words
zelcorpus <- Corpus(VectorSource(zelensky_nouns))

# Preprocess the data
zelcorpusPrepro <- tm_map(zelcorpus, removePunctuation) # remove punctuation
zelcorpusPrepro <- tm_map(zelcorpusPrepro, removeNumbers) # remove numbers
zelcorpusPrepro <- tm_map(zelcorpusPrepro, content_transformer(tolower)) # convert to lowercase
zelcorpusPrepro <- tm_map(zelcorpusPrepro, removeWords, stopwords('english')) # remove English stopwords

# Check the preprocessing result
inspect(zelcorpusPrepro[1:5])

(2-4) Select words

# Select words of 1 to 8 syllables from the preprocessed result
# (wordLengths sets the minimum and maximum term length)
zelcorpusPrepro_term <-
  TermDocumentMatrix(zelcorpusPrepro,
                     control = list(wordLengths = c(2, 16)))

# Convert the matrix structure to a data.frame
zelTerm_df <- as.data.frame(as.matrix(zelcorpusPrepro_term))

(3) Compute word frequencies

wordfrequency <- sort(rowSums(zelTerm_df), decreasing = TRUE)

# Keep only words that occur at least twice, as the task requires
wordfrequency <- wordfrequency[wordfrequency >= 2]

wordfrequency[1:10]
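
The frequency cut-off from the task statement (words appearing at least twice) can be sketched on a toy vector in base R. The vector `toy_nouns` and its contents are made up for illustration and are not taken from the speech.

```r
# Hypothetical toy data standing in for the extracted nouns
toy_nouns <- c("peace", "war", "peace", "people", "war", "peace", "future")

# Count occurrences of each word and sort in decreasing order
toy_freq <- sort(table(toy_nouns), decreasing = TRUE)

# Keep only words that occur two or more times
toy_freq2 <- toy_freq[toy_freq >= 2]

# Build a data frame with word and freq columns, the layout wordcloud2() takes
toy_df <- data.frame(word = names(toy_freq2),
                     freq = as.integer(toy_freq2))
print(toy_df)
```

Only "peace" (3 occurrences) and "war" (2 occurrences) survive the filter here. The words that appear once are dropped before plotting.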

(4) Apply styling to the word cloud (using the wordcloud2 package)

wordname <- names(wordfrequency)
word.df <- data.frame(word = wordname, freq = wordfrequency)

pal <- brewer.pal(8,"Dark2")

(5) Produce the word cloud with the wordcloud2 package

install.packages("wordcloud2")
library(wordcloud2)

wordcloud2(word.df)

Perform sentiment analysis on the following text.

Itaewon_text1.txt


(1) Tokenize the text by word

1. Load the file

Itaewon_file <- file("C:/Itaewon_text1.txt", encoding = "UTF-8")
Itaewon_text <- readLines(Itaewon_file)
Itaewon_text

2. Check the structure, then convert to a tibble

str(Itaewon_text)
Itaewon_text <- dplyr::as_tibble(Itaewon_text) # namespaced because dplyr is only attached in step 3
Itaewon_text

3. Required packages and libraries

# install.packages("textclean")
library(textclean)
# install.packages("tidytext")
library(tidytext)
library(dplyr)
library(readr)
library(stringr)
library(tidyr)

4. Tokenize by word

Itaewon_word <-  Itaewon_text %>%
  unnest_tokens(input = value,
                output = word,
                token = "words")
Itaewon_word

# # A tibble: 51 x 1
#    word
#    <chr>
#  1 시민들은
#  2 국가가
#  3 참사를
#  4 방치했다고
#  5 입을
#  6 모았다
#  7 집회
#  8 시작
#  9 30
# 10 분
# # ... with 41 more rows

(2) Compute a sentiment score for each sentence.

1. Load the file

Itaewon_file <- file("C:/Itaewon_text1.txt", encoding = "UTF-8")
Itaewon_text <- readLines(Itaewon_file)
Itaewon_text

2. Check the structure, then convert to a tibble

str(Itaewon_text)
Itaewon_text <- dplyr::as_tibble(Itaewon_text) # namespaced because dplyr is only attached in step 3
Itaewon_text

3. Required packages and libraries

# install.packages("textclean")
library(textclean)
# install.packages("tidytext")
library(tidytext)
library(dplyr)
library(readr)
library(stringr)
library(tidyr)

4. Tokenize by sentence

Itaewon_text <- Itaewon_text %>%
  unnest_tokens(input = value,
                output = sentences,
                token = "sentences")
Itaewon_text

Itaewon_df <- Itaewon_text %>%
  unnest_tokens(input = sentences,
                output = word,
                token = "words",
                drop = FALSE)
Itaewon_df
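
The two-stage tokenization above (sentences first, then words within each sentence) can be tried on a small made-up text. This is a minimal sketch: `toy_text` and its English sentences are hypothetical, chosen only so the row counts are easy to check by hand.

```r
library(dplyr)
library(tidytext)

# Hypothetical two-sentence text, used only for illustration
toy_text <- tibble(value = "The rally started early. Citizens gathered in silence.")

# Stage 1: split the text into sentences (one row per sentence)
toy_sentences <- toy_text %>%
  unnest_tokens(input = value,
                output = sentences,
                token = "sentences")

# Stage 2: split each sentence into words; drop = FALSE keeps the
# sentences column so every word row still carries its source sentence
toy_words <- toy_sentences %>%
  unnest_tokens(input = sentences,
                output = word,
                token = "words",
                drop = FALSE)

print(toy_words)
```

Keeping the `sentences` column is what makes the later `group_by(sentences)` step possible, since each word row can be traced back to its sentence.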

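5. Assign sentiment scores per sentence

# `dic` must be a sentiment dictionary with `word` and `polarity` columns,
# loaded beforehand (the post does not show that step)
Itaewon_df <- Itaewon_df %>%
  left_join(dic, by = "word") %>%
  mutate(polarity = ifelse(is.na(polarity), 0, polarity))
Itaewon_df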

Itaewon_score <- Itaewon_df %>%
  group_by(sentences) %>%
  summarise(score = sum(polarity))

Itaewon_score

# # A tibble: 3 x 2
#   sentences                                                                    score
#   <chr>                                                                        <dbl>
# 1 20대 아들을 키우는 입장에서 이 참사를 보고 가만히 있을 수 없었다”며 “안~        -1
# 2 시민들은 국가가 참사를 방치했다고 입을 모았다.                                   0
# 3 집회 시작 30분 전부터 눈물을 흘리고 있던 이용신씨(55)는 “세월호에 이어 또~       0
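
Since the post never shows how `dic` is built, the scoring step can be illustrated with a hand-made dictionary. Everything here is an assumption for illustration: `toy_dic`, `toy_tokens`, the English words, and the polarity values are invented, and a real analysis would load a proper sentiment lexicon instead.

```r
library(dplyr)

# Hypothetical mini-dictionary with the two columns the join relies on
toy_dic <- tibble(word     = c("good", "happy", "bad", "sad"),
                  polarity = c(1, 2, -1, -2))

# Hypothetical word-level tokens, one row per (sentence, word) pair
toy_tokens <- tibble(
  sentences = c("s1", "s1", "s1", "s2", "s2"),
  word      = c("a", "good", "day", "sad", "news")
)

toy_score <- toy_tokens %>%
  left_join(toy_dic, by = "word") %>%                       # attach polarity where the word is in the dictionary
  mutate(polarity = ifelse(is.na(polarity), 0, polarity)) %>% # words not in the dictionary count as neutral
  group_by(sentences) %>%
  summarise(score = sum(polarity))                          # sentence score = sum of word polarities

print(toy_score)
```

A score of 0 can therefore mean either a neutral sentence or one whose words are simply missing from the dictionary. This is worth keeping in mind when reading the two zero scores in the output above.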
