[ R ] Association Analysis #2: Visualization

59doit

Statistics-Based Data Analysis

yul_S2 2022. 12. 4. 14:10

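(3) Visualizing association rules

Using Adult, a built-in data set from the arules package, we generate association rules and visualize related rules as a network.

The arules package bundles the functions and data sets used for association analysis.

ex) Load the Adult data set

library(arules)
data(Adult)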
Adult
# transactions in sparse format with
# 48842 transactions (rows) and
# 115 items (columns)
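For intuition about the "sparse format" in the printout above, here is a minimal sketch that builds a tiny transactions object from a hand-made list. The basket contents are made up for illustration and are not part of Adult.

```r
library(arules)

# Hypothetical baskets, invented for illustration only
baskets <- list(
  c("milk", "bread"),
  c("milk", "butter", "bread"),
  c("butter")
)

# Coerce the list to the sparse transactions class used by arules
toy <- as(baskets, "transactions")

# Dense logical view: one row per transaction, one column per item
toy_matrix <- as(toy, "matrix")
print(toy_matrix)
```

Each TRUE cell marks an item that occurs in that transaction. Adult stores the same kind of information for 48,842 rows and 115 item columns.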

ex) Inspect the AdultUCI data set

data("AdultUCI")
str(AdultUCI)

ex) View summary statistics for the Adult data set

#1 View as a data.frame

adult <- as(Adult, "data.frame")
str(adult)
head(adult)

#2 Summary statistics

summary(Adult)

ex) Mine association rules with support 10% and confidence 80%: 6,137 rules found

ar <- apriori(Adult, parameter = list(supp = 0.1, conf = 0.8))

# Apriori
#
# Parameter specification:
#   confidence minval smax arem  aval originalSupport maxtime support minlen maxlen target  ext
# 0.8    0.1    1 none FALSE            TRUE       5     0.1      1     10  rules TRUE
#
# Algorithmic control:
#   filter tree heap memopt load sort verbose
# 0.1 TRUE TRUE  FALSE TRUE    2    TRUE
#
# Absolute minimum support count: 4884
#
# set item appearances ...[0 item(s)] done [0.00s].
# set transactions ...[115 item(s), 48842 transaction(s)] done [0.04s].
# sorting and recoding items ... [31 item(s)] done [0.01s].
# creating transaction tree ... done [0.03s].
# checking subsets of size 1 2 3 4 5 6 7 8 9 done [0.11s].
# writing ... [6137 rule(s)] done [0.01s].
# creating S4 object  ... done [0.01s].
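The support and confidence thresholds passed to apriori() can be computed by hand on a small example. The sketch below uses base R only. The five transactions and the rule {bread} => {milk} are invented for illustration.

```r
# Hypothetical transactions, invented for illustration only
tx <- list(
  c("bread", "milk"),
  c("bread", "milk", "butter"),
  c("bread"),
  c("milk"),
  c("bread", "milk")
)

# TRUE for every transaction that contains all of the given items
has <- function(items) {
  vapply(tx, function(basket) all(items %in% basket), logical(1))
}

supp_lhs  <- mean(has("bread"))             # share of transactions with the LHS
supp_rhs  <- mean(has("milk"))              # share of transactions with the RHS
supp_rule <- mean(has(c("bread", "milk")))  # share of transactions with both

confidence <- supp_rule / supp_lhs          # P(RHS | LHS)
lift       <- confidence / supp_rhs         # above 1 means positive association

print(c(support = supp_rule, confidence = confidence, lift = lift))
```

With supp = 0.1 and conf = 0.8, apriori() keeps only rules whose support is at least 0.1 and whose confidence is at least 0.8.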

ex) Examples with different support and confidence thresholds

#1 Raising support to 20% (confidence left at its default of 0.8): 1,306 rules found

ar1 <- apriori(Adult, parameter = list(supp = 0.2))

# Parameter specification:
#   confidence minval smax arem  aval originalSupport maxtime support minlen maxlen target
# 0.8    0.1    1 none FALSE            TRUE       5     0.2      1     10  rules
# ext
# TRUE
#
# Algorithmic control:
#   filter tree heap memopt load sort verbose
# 0.1 TRUE TRUE  FALSE TRUE    2    TRUE
#
# Absolute minimum support count: 9768
#
# set item appearances ...[0 item(s)] done [0.00s].
# set transactions ...[115 item(s), 48842 transaction(s)] done [0.04s].
# sorting and recoding items ... [18 item(s)] done [0.01s].
# creating transaction tree ... done [0.02s].
# checking subsets of size 1 2 3 4 5 6 7 done [0.01s].
# writing ... [1306 rule(s)] done [0.00s].
# creating S4 object  ... done [0.00s].

#2 Support 20%, confidence raised to 95%: 348 rules found (higher confidence than #1)

ar2 <- apriori(Adult, parameter = list(supp = 0.2, conf = 0.95))

# Apriori
#
# Parameter specification:
#   confidence minval smax arem  aval originalSupport maxtime support minlen maxlen target
# 0.95    0.1    1 none FALSE            TRUE       5     0.2      1     10  rules
# ext
# TRUE
#
# Algorithmic control:
#   filter tree heap memopt load sort verbose
# 0.1 TRUE TRUE  FALSE TRUE    2    TRUE
#
# Absolute minimum support count: 9768
#
# set item appearances ...[0 item(s)] done [0.00s].
# set transactions ...[115 item(s), 48842 transaction(s)] done [0.04s].
# sorting and recoding items ... [18 item(s)] done [0.01s].
# creating transaction tree ... done [0.03s].
# checking subsets of size 1 2 3 4 5 6 7 done [0.01s].
# writing ... [348 rule(s)] done [0.00s].
# creating S4 object  ... done [0.00s].

#3 Support raised to 30%, confidence 95%: 124 rules found (higher support than #2)

ar3 <- apriori(Adult, parameter = list(supp = 0.3, conf = 0.95))

# Apriori
#
# Parameter specification:
#   confidence minval smax arem  aval originalSupport maxtime support minlen maxlen target
# 0.95    0.1    1 none FALSE            TRUE       5     0.3      1     10  rules
# ext
# TRUE
#
# Algorithmic control:
#   filter tree heap memopt load sort verbose
# 0.1 TRUE TRUE  FALSE TRUE    2    TRUE
#
# Absolute minimum support count: 14652
#
# set item appearances ...[0 item(s)] done [0.00s].
# set transactions ...[115 item(s), 48842 transaction(s)] done [0.04s].
# sorting and recoding items ... [14 item(s)] done [0.01s].
# creating transaction tree ... done [0.03s].
# checking subsets of size 1 2 3 4 5 6 done [0.00s].
# writing ... [124 rule(s)] done [0.00s].
# creating S4 object  ... done [0.00s].

#4 Support raised to 35%, confidence 95%: 67 rules found (higher support than #3)

ar4 <- apriori(Adult, parameter = list(supp = 0.35, conf = 0.95))

# Apriori
#
# Parameter specification:
#   confidence minval smax arem  aval originalSupport maxtime support minlen maxlen target
# 0.95    0.1    1 none FALSE            TRUE       5    0.35      1     10  rules
# ext
# TRUE
#
# Algorithmic control:
#   filter tree heap memopt load sort verbose
# 0.1 TRUE TRUE  FALSE TRUE    2    TRUE
#
# Absolute minimum support count: 17094
#
# set item appearances ...[0 item(s)] done [0.00s].
# set transactions ...[115 item(s), 48842 transaction(s)] done [0.04s].
# sorting and recoding items ... [11 item(s)] done [0.01s].
# creating transaction tree ... done [0.03s].
# checking subsets of size 1 2 3 4 5 done [0.00s].
# writing ... [67 rule(s)] done [0.00s].
# creating S4 object  ... done [0.00s].

#5 Support raised to 40%, confidence 95%: 36 rules found (higher support than #4)

ar5 <- apriori(Adult, parameter = list(supp = 0.4, conf = 0.95))

# Apriori
#
# Parameter specification:
#   confidence minval smax arem  aval originalSupport maxtime support minlen maxlen target
# 0.95    0.1    1 none FALSE            TRUE       5     0.4      1     10  rules
# ext
# TRUE
#
# Algorithmic control:
#   filter tree heap memopt load sort verbose
# 0.1 TRUE TRUE  FALSE TRUE    2    TRUE
#
# Absolute minimum support count: 19536
#
# set item appearances ...[0 item(s)] done [0.00s].
# set transactions ...[115 item(s), 48842 transaction(s)] done [0.04s].
# sorting and recoding items ... [11 item(s)] done [0.01s].
# creating transaction tree ... done [0.03s].
# checking subsets of size 1 2 3 4 5 done [0.00s].
# writing ... [36 rule(s)] done [0.00s].
# creating S4 object  ... done [0.00s].
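The "Absolute minimum support count" lines in the logs above are consistent with multiplying the relative support by the 48,842 transactions and dropping the fractional part. The check below reproduces the five printed counts. The use of floor() is inferred from those numbers, not taken from the arules source.

```r
n_transactions <- 48842
supports <- c(0.1, 0.2, 0.3, 0.35, 0.4)

# Relative support times number of transactions, fractional part dropped
min_counts <- floor(supports * n_transactions)

print(data.frame(support = supports, min_count = min_counts))
```

An itemset has to appear in at least this many transactions to be considered frequent, which is why the number of surviving items and rules shrinks as support rises.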

ex) Inspect the resulting rules

#1 View the first 6 rules

inspect(head(ar5))
#     lhs                       rhs                                 support   confidence coverage  lift     count
# [1] {}                     => {capital-loss=None}                 0.9532779 0.9532779  1.0000000 1.000000 46560
# [2] {relationship=Husband} => {marital-status=Married-civ-spouse} 0.4034233 0.9993914  0.4036690 2.181164 19704
# [3] {relationship=Husband} => {sex=Male}                          0.4036485 0.9999493  0.4036690 1.495851 19715
# [4] {age=Middle-aged}      => {capital-loss=None}                 0.4800786 0.9504276  0.5051185 0.997010 23448
# [5] {income=small}         => {capital-gain=None}                 0.4849310 0.9581311  0.5061218 1.044414 23685
# [6] {income=small}         => {capital-loss=None}                 0.4908480 0.9698220  0.5061218 1.017355 23974

Each rule shows which condition (LHS) leads to which outcome (RHS).
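The quality columns in this output are tied together by simple arithmetic: confidence = support / coverage, lift = confidence / support of the RHS, and count = support × number of transactions. The sketch below checks these relations against the printed values for rules [2] and [4]. The RHS support of capital-loss=None is read off rule [1], whose LHS is empty.

```r
n <- 48842  # number of transactions in Adult

# Rule [2]: {relationship=Husband} => {marital-status=Married-civ-spouse}
support2  <- 0.4034233
coverage2 <- 0.4036690
conf2  <- support2 / coverage2  # compare with the printed confidence 0.9993914
count2 <- round(support2 * n)   # compare with the printed count 19704

# Rule [4]: {age=Middle-aged} => {capital-loss=None}
conf4    <- 0.9504276
supp_rhs <- 0.9532779           # support of {capital-loss=None}, from rule [1]
lift4 <- conf4 / supp_rhs       # compare with the printed lift 0.997010

print(c(conf2 = conf2, count2 = count2, lift4 = lift4))
```

Rule [4] has high confidence but a lift just below 1, because almost every record has capital-loss=None regardless of age. Confidence alone can therefore overstate how informative a rule is.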

#2 Sort by confidence in descending order and print the top 6

inspect(head(sort(ar5, decreasing = TRUE, by = "confidence")))

#     lhs                                                          rhs                                 support   confidence coverage  lift     count
# [1] {relationship=Husband}                                    => {sex=Male}                          0.4036485 0.9999493  0.4036690 1.495851 19715
# [2] {marital-status=Married-civ-spouse, relationship=Husband} => {sex=Male}                          0.4034028 0.9999492  0.4034233 1.495851 19703
# [3] {relationship=Husband}                                    => {marital-status=Married-civ-spouse} 0.4034233 0.9993914  0.4036690 2.181164 19704
# [4] {relationship=Husband, sex=Male}                          => {marital-status=Married-civ-spouse} 0.4034028 0.9993913  0.4036485 2.181164 19703
# [5] {marital-status=Married-civ-spouse, sex=Male}             => {relationship=Husband}              0.4034028 0.9901503  0.4074157 2.452877 19703
# [6] {income=small}                                            => {capital-loss=None}                 0.4908480 0.9698220  0.5061218 1.017355 23974

decreasing = TRUE sorts the rules in descending order.

#3 Sort by lift in descending order and print the top 6

inspect(head(sort(ar5, by = "lift")))

#     lhs                                                          rhs
# [1] {marital-status=Married-civ-spouse, sex=Male}             => {relationship=Husband}
# [2] {relationship=Husband}                                    => {marital-status=Married-civ-spouse}
# [3] {relationship=Husband, sex=Male}                          => {marital-status=Married-civ-spouse}
# [4] {relationship=Husband}                                    => {sex=Male}
# [5] {marital-status=Married-civ-spouse, relationship=Husband} => {sex=Male}
# [6] {income=small}                                            => {capital-gain=None}
#     support   confidence coverage  lift     count
# [1] 0.4034028 0.9901503  0.4074157 2.452877 19703
# [2] 0.4034233 0.9993914  0.4036690 2.181164 19704
# [3] 0.4034028 0.9993913  0.4036485 2.181164 19703
# [4] 0.4036485 0.9999493  0.4036690 1.495851 19715
# [5] 0.4034028 0.9999492  0.4034233 1.495851 19703
# [6] 0.4849310 0.9581311  0.5061218 1.044414 23685

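The result is in descending order even without decreasing = TRUE, because sort() for arules rules defaults to decreasing = TRUE. This applies to any measure, not only lift.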
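A quick way to confirm the ordering is to sort the same rules both ways and compare the lift values. This sketch mines a few rules from a small hand-made transaction list (invented for illustration) so that it runs quickly.

```r
library(arules)

# Hypothetical baskets, invented for illustration only
baskets <- list(
  c("bread", "milk"),
  c("bread", "milk", "butter"),
  c("bread", "butter"),
  c("milk"),
  c("bread", "milk")
)
toy <- as(baskets, "transactions")

# Low thresholds so that the toy data yields several rules
toy_rules <- apriori(toy, parameter = list(supp = 0.2, conf = 0.5),
                     control = list(verbose = FALSE))

# Same rules, sorted by lift without and with an explicit direction
lift_default <- quality(sort(toy_rules, by = "lift"))$lift
lift_asc     <- quality(sort(toy_rules, by = "lift", decreasing = FALSE))$lift

print(lift_default)  # largest lift first
print(lift_asc)      # smallest lift first
```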

ex) Visualize the association rules

#1 Install the packages

install.packages("arulesViz")
library(arulesViz)

install.packages("ggraph",type="binary")
library(ggraph)

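## ERROR: compilation failed for package ‘ggraph’

Fix

If installing ggraph fails with the error above, search Google for "ERROR: compilation failed for package ‘ggraph’" and open the matching result on https://stackoverflow.com/.

The answer there says, in essence: sometimes CRAN has a newer version that has not been compiled yet, and you are asked whether you want to install from source. The default seems to be "yes", but that is not the best choice. Compiling packages can be messy, and using the precompiled binaries is easier.

The suggested fix is therefore to try install.packages("ggraph", type="binary").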
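To avoid reinstalling on every run, one option is to install only the packages that are missing. The helper names below (missing_pkgs, install_if_missing) are hypothetical, and type = "binary" assumes Windows or macOS, where CRAN provides precompiled binaries.

```r
# Hypothetical helper: names of the packages that are not installed yet
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical helper: install only what is missing, from precompiled binaries
install_if_missing <- function(pkgs) {
  todo <- missing_pkgs(pkgs)
  if (length(todo) > 0) install.packages(todo, type = "binary")
  invisible(todo)
}

# Usage (not run here): install_if_missing(c("arulesViz", "ggraph"))
```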

#2 Plot the association rules

plot(ar3, method = "graph", control = list(type = "items"))

ex) Association analysis with the Groceries data set

#1 Load the Groceries data set

data("Groceries")
str(Groceries)

Groceries
# transactions in sparse format with
# 9835 transactions (rows) and
# 169 items (columns)

#2 Convert to a data frame

Groceries.df <- as(Groceries, "data.frame")
head(Groceries.df)
# items
# 1              {citrus fruit,semi-finished bread,margarine,ready soups}
# 2                                        {tropical fruit,yogurt,coffee}
# 3                                                          {whole milk}
# 4                         {pip fruit,yogurt,cream cheese ,meat spreads}
# 5 {other vegetables,whole milk,condensed milk,long life bakery product}
# 6                      {whole milk,butter,yogurt,rice,abrasive cleaner}

#3 Mine rules with support 0.001 and confidence 0.8

rules <- apriori(Groceries, parameter = list(supp = 0.001, conf = 0.8))
# Apriori
#
# Parameter specification:
#   confidence minval smax arem  aval originalSupport maxtime support minlen maxlen
# 0.8    0.1    1 none FALSE            TRUE       5   0.001      1     10
# target  ext
# rules TRUE
#
# Algorithmic control:
#   filter tree heap memopt load sort verbose
# 0.1 TRUE TRUE  FALSE TRUE    2    TRUE
#
# Absolute minimum support count: 9
#
# set item appearances ...[0 item(s)] done [0.00s].
# set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
# sorting and recoding items ... [157 item(s)] done [0.00s].
# creating transaction tree ... done [0.00s].
# checking subsets of size 1 2 3 4 5 6 done [0.01s].
# writing ... [410 rule(s)] done [0.00s].
# creating S4 object  ... done [0.00s].

#4 View the item frequencies on the left-hand side (LHS) and right-hand side (RHS) of the rules. A rule is written as A (LHS) → B (RHS).

plot(rules, method = "grouped")
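As a numeric complement to the grouped plot, the consequents can be tabulated directly. This is a minimal sketch that re-mines the same rules and counts how many of them point to each RHS itemset. The variable name rhs_counts is chosen here for illustration.

```r
library(arules)
data("Groceries")

rules <- apriori(Groceries, parameter = list(supp = 0.001, conf = 0.8),
                 control = list(verbose = FALSE))

# Count how many rules point to each consequent (RHS) itemset
rhs_counts <- sort(table(labels(rhs(rules))), decreasing = TRUE)
print(head(rhs_counts))
```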

ex) Generate rules with a maximum length of 3

rules <- apriori(Groceries, parameter = list(supp = 0.001, conf = 0.80, maxlen = 3))

This generates only rules whose combined LHS and RHS length is at most 3.
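The effect of maxlen can be checked by adding up the number of items on each side of every rule. A minimal sketch, using the hypothetical name short_rules so that it does not overwrite rules:

```r
library(arules)
data("Groceries")

short_rules <- apriori(Groceries,
                       parameter = list(supp = 0.001, conf = 0.80, maxlen = 3),
                       control = list(verbose = FALSE))

# Rule length = number of items on the LHS plus number of items on the RHS
rule_len <- size(lhs(short_rules)) + size(rhs(short_rules))
print(table(rule_len))
```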

ex) Sort the rules by confidence in descending order

rules <- sort(rules, decreasing = TRUE, by = "confidence")
inspect(rules)
# lhs                                             rhs                support     confidence coverage    lift      count
# [1]  {rice, sugar}                                => {whole milk}       0.001220132 1.0000000  0.001220132  3.913649 12
# [2]  {canned fish, hygiene articles}              => {whole milk}       0.001118454 1.0000000  0.001118454  3.913649 11
# [3]  {whipped/sour cream, house keeping products} => {whole milk}       0.001220132 0.9230769  0.001321810  3.612599 12
# [4]  {rice, bottled water}                        => {whole milk}       0.001220132 0.9230769  0.001321810  3.612599 12
# [5]  {soups, bottled beer}                        => {whole milk}       0.001118454 0.9166667  0.001220132  3.587512 11
# [6]  {grapes, onions}                             => {other vegetables} 0.001118454 0.9166667  0.001220132  4.737476 11
# [7]  {hard cheese, oil}                           => {other vegetables} 0.001118454 0.9166667  0.001220132  4.737476 11
# [8]  {curd, cereals}                              => {whole milk}       0.001016777 0.9090909  0.001118454  3.557863 10
# [9]  {pastry, sweet spreads}                      => {whole milk}       0.001016777 0.9090909  0.001118454  3.557863 10
# [10] {liquor, red/blush wine}                     => {bottled beer}     0.001931876 0.9047619  0.002135231 11.235269 19
# [11] {oil, mustard}                               => {whole milk}       0.001220132 0.8571429  0.001423488  3.354556 12
# [12] {pickled vegetables, chocolate}              => {whole milk}       0.001220132 0.8571429  0.001423488  3.354556 12
# [13] {pork, butter milk}                          => {other vegetables} 0.001830198 0.8571429  0.002135231  4.429848 18
# [14] {meat, margarine}                            => {other vegetables} 0.001728521 0.8500000  0.002033554  4.392932 17
# [15] {domestic eggs, rice}                        => {whole milk}       0.001118454 0.8461538  0.001321810  3.311549 11
# [16] {butter, jam}                                => {whole milk}       0.001016777 0.8333333  0.001220132  3.261374 10
# [17] {butter, rice}                               => {whole milk}       0.001525165 0.8333333  0.001830198  3.261374 15
# [18] {yogurt, rice}                               => {other vegetables} 0.001931876 0.8260870  0.002338587  4.269346 19
# [19] {herbs, shopping bags}                       => {other vegetables} 0.001931876 0.8260870  0.002338587  4.269346 19
# [20] {tropical fruit, herbs}                      => {whole milk}       0.002338587 0.8214286  0.002846975  3.214783 23
# [21] {napkins, house keeping products}            => {whole milk}       0.001321810 0.8125000  0.001626843  3.179840 13
# [22] {onions, butter milk}                        => {other vegetables} 0.001321810 0.8125000  0.001626843  4.199126 13
# [23] {yogurt, cereals}                            => {whole milk}       0.001728521 0.8095238  0.002135231  3.168192 17
# [24] {hamburger meat, bottled beer}               => {whole milk}       0.001728521 0.8095238  0.002135231  3.168192 17
# [25] {hamburger meat, curd}                       => {whole milk}       0.002541942 0.8064516  0.003152008  3.156169 25
# [26] {turkey, curd}                               => {other vegetables} 0.001220132 0.8000000  0.001525165  4.134524 12
# [27] {herbs, fruit/vegetable juice}               => {other vegetables} 0.001220132 0.8000000  0.001525165  4.134524 12
# [28] {herbs, rolls/buns}                          => {whole milk}       0.002440264 0.8000000  0.003050330  3.130919 24
# [29] {onions, waffles}                            => {other vegetables} 0.001220132 0.8000000  0.001525165  4.134524 12

ex) Visualize the discovered rules

library(arulesViz)
plot(rules, method = "graph")

ex) Subset and visualize rules for specific items

#1 Subset only the rules whose RHS item is whole milk

wmilk <- subset(rules, rhs %in% 'whole milk')
wmilk
# set of 18 rules

inspect(wmilk)
#      lhs                                             rhs          support     confidence coverage    lift     count
# [1]  {curd, cereals}                              => {whole milk} 0.001016777 0.9090909  0.001118454 3.557863 10
# [2]  {yogurt, cereals}                            => {whole milk} 0.001728521 0.8095238  0.002135231 3.168192 17
# [3]  {butter, jam}                                => {whole milk} 0.001016777 0.8333333  0.001220132 3.261374 10
# [4]  {soups, bottled beer}                        => {whole milk} 0.001118454 0.9166667  0.001220132 3.587512 11
# [5]  {napkins, house keeping products}            => {whole milk} 0.001321810 0.8125000  0.001626843 3.179840 13
# [6]  {whipped/sour cream, house keeping products} => {whole milk} 0.001220132 0.9230769  0.001321810 3.612599 12
# [7]  {pastry, sweet spreads}                      => {whole milk} 0.001016777 0.9090909  0.001118454 3.557863 10
# [8]  {rice, sugar}                                => {whole milk} 0.001220132 1.0000000  0.001220132 3.913649 12
# [9]  {butter, rice}                               => {whole milk} 0.001525165 0.8333333  0.001830198 3.261374 15
# [10] {domestic eggs, rice}                        => {whole milk} 0.001118454 0.8461538  0.001321810 3.311549 11
# [11] {rice, bottled water}                        => {whole milk} 0.001220132 0.9230769  0.001321810 3.612599 12
# [12] {oil, mustard}                               => {whole milk} 0.001220132 0.8571429  0.001423488 3.354556 12
# [13] {canned fish, hygiene articles}              => {whole milk} 0.001118454 1.0000000  0.001118454 3.913649 11
# [14] {tropical fruit, herbs}                      => {whole milk} 0.002338587 0.8214286  0.002846975 3.214783 23
# [15] {herbs, rolls/buns}                          => {whole milk} 0.002440264 0.8000000  0.003050330 3.130919 24
# [16] {pickled vegetables, chocolate}              => {whole milk} 0.001220132 0.8571429  0.001423488 3.354556 12
# [17] {hamburger meat, curd}                       => {whole milk} 0.002541942 0.8064516  0.003152008 3.156169 25
# [18] {hamburger meat, bottled beer}               => {whole milk} 0.001728521 0.8095238  0.002135231 3.168192 17

plot(wmilk, method = "graph")

#2 Subset only the rules whose RHS item is other vegetables

oveg <- subset(rules, rhs %in% 'other vegetables')
oveg
# set of 10 rules

inspect(oveg)
#      lhs                               rhs                support     confidence coverage    lift     count
# [1]  {turkey, curd}                 => {other vegetables} 0.001220132 0.8000000  0.001525165 4.134524 12
# [2]  {yogurt, rice}                 => {other vegetables} 0.001931876 0.8260870  0.002338587 4.269346 19
# [3]  {herbs, fruit/vegetable juice} => {other vegetables} 0.001220132 0.8000000  0.001525165 4.134524 12
# [4]  {herbs, shopping bags}         => {other vegetables} 0.001931876 0.8260870  0.002338587 4.269346 19
# [5]  {grapes, onions}               => {other vegetables} 0.001118454 0.9166667  0.001220132 4.737476 11
# [6]  {meat, margarine}              => {other vegetables} 0.001728521 0.8500000  0.002033554 4.392932 17
# [7]  {hard cheese, oil}             => {other vegetables} 0.001118454 0.9166667  0.001220132 4.737476 11
# [8]  {onions, butter milk}          => {other vegetables} 0.001321810 0.8125000  0.001626843 4.199126 13
# [9]  {pork, butter milk}            => {other vegetables} 0.001830198 0.8571429  0.002135231 4.429848 18
# [10] {onions, waffles}              => {other vegetables} 0.001220132 0.8000000  0.001525165 4.134524 12

plot(oveg, method = "graph")

#3 Subset only the rules whose RHS item name contains the word vegetables (%pin% does partial matching)

oveg <- subset(rules, rhs %pin% 'vegetables')
oveg
# set of 10 rules

inspect(oveg)
#      lhs                               rhs                support     confidence coverage    lift     count
# [1]  {turkey, curd}                 => {other vegetables} 0.001220132 0.8000000  0.001525165 4.134524 12
# [2]  {yogurt, rice}                 => {other vegetables} 0.001931876 0.8260870  0.002338587 4.269346 19
# [3]  {herbs, fruit/vegetable juice} => {other vegetables} 0.001220132 0.8000000  0.001525165 4.134524 12
# [4]  {herbs, shopping bags}         => {other vegetables} 0.001931876 0.8260870  0.002338587 4.269346 19
# [5]  {grapes, onions}               => {other vegetables} 0.001118454 0.9166667  0.001220132 4.737476 11
# [6]  {meat, margarine}              => {other vegetables} 0.001728521 0.8500000  0.002033554 4.392932 17
# [7]  {hard cheese, oil}             => {other vegetables} 0.001118454 0.9166667  0.001220132 4.737476 11
# [8]  {onions, butter milk}          => {other vegetables} 0.001321810 0.8125000  0.001626843 4.199126 13
# [9]  {pork, butter milk}            => {other vegetables} 0.001830198 0.8571429  0.002135231 4.429848 18
# [10] {onions, waffles}              => {other vegetables} 0.001220132 0.8000000  0.001525165 4.134524 12

plot(oveg, method = "graph")
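%pin% matches any item whose label contains the given string, so in general it can select more items than an exact match with %in%. In this rule set the two subsets coincide, presumably because other vegetables is the only matching item that appears on the RHS. To see which items a partial match on "vegetables" could reach, one can list the matching labels. The variable name veg_items is chosen here for illustration.

```r
library(arules)
data("Groceries")

# All item labels in Groceries that contain the substring "vegetables"
veg_items <- grep("vegetables", itemLabels(Groceries),
                  value = TRUE, fixed = TRUE)
print(veg_items)
```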

#4 Subset only the rules whose LHS contains butter or yogurt

butter_yogurt <- subset(rules, lhs %in% c('butter', 'yogurt'))
butter_yogurt
# set of 4 rules

inspect(butter_yogurt)
#     lhs                  rhs                support     confidence coverage    lift     count
# [1] {yogurt, cereals} => {whole milk}       0.001728521 0.8095238  0.002135231 3.168192 17
# [2] {butter, jam}     => {whole milk}       0.001016777 0.8333333  0.001220132 3.261374 10
# [3] {butter, rice}    => {whole milk}       0.001525165 0.8333333  0.001830198 3.261374 15
# [4] {yogurt, rice}    => {other vegetables} 0.001931876 0.8260870  0.002338587 4.269346 19

plot(butter_yogurt, method = "graph")
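With %in%, a rule is selected when its LHS contains at least one of the listed items. When every listed item must be present, arules also provides %ain%. The sketch below contrasts the two on the same rule set. The variable names either and both, and the choice of butter and rice for the second subset, are illustrative.

```r
library(arules)
data("Groceries")

rules <- apriori(Groceries,
                 parameter = list(supp = 0.001, conf = 0.80, maxlen = 3),
                 control = list(verbose = FALSE))

# %in%: the LHS contains butter OR yogurt (at least one of them)
either <- subset(rules, lhs %in% c("butter", "yogurt"))

# %ain%: the LHS contains butter AND rice (all listed items)
both <- subset(rules, lhs %ain% c("butter", "rice"))

inspect(both)
```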

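In the association network graph, the size of each circle represents support (how often the combination occurs), the color represents lift (the strength of the association), and the arrows show the relationships between items.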
