59doit

[Python-Pandas] data index

yul_S2 2022. 11. 10. 14:01

 

import numpy as np
import pandas as pd
pd.options.display.max_rows = 20
np.random.seed(12345)
import matplotlib.pyplot as plt
plt.rc('figure', figsize=(10, 6))
np.set_printoptions(precision=4, suppress=True)

 

data = pd.Series(np.random.randn(9),
                 index=[['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd', 'd'],
                        [1, 2, 3, 1, 3, 1, 2, 2, 3]])
data
# <Output>
# a  1   -0.204708
#    2    0.478943
#    3   -0.519439
# b  1   -0.555730
#    3    1.965781
# c  1    1.393406
#    2    0.092908
# d  2    0.281746
#    3    0.769023
# dtype: float64

 

data index

data.index
# <Output>
# MultiIndex([('a', 1),
#             ('a', 2),
#             ('a', 3),
#             ('b', 1),
#             ('b', 3),
#             ('c', 1),
#             ('c', 2),
#             ('d', 2),
#             ('d', 3)],
#            )

 

data['b']
# <Output>
# 1   -0.555730
# 3    1.965781
# dtype: float64


data['b':'c']
# <Output>
# b  1   -0.555730
#    3    1.965781
# c  1    1.393406
#    2    0.092908
# dtype: float64


data.loc[['b','d']]
# <Output>
# b  1   -0.555730
#    3    1.965781
# d  2    0.281746
#    3    0.769023
# dtype: float64


data.loc[:,2]
# <Output>
# a    0.478943
# c    0.092908
# d    0.281746
# dtype: float64

# data.loc[:, 2]: selects the entries whose second-level label is 2, regardless of the first-level label
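
The inner-level selection above can be sketched with fixed values, which are easier to check than random draws. The Series `s` and its numbers are made up for illustration; `xs` with `level=1` is shown as what should be an equivalent spelling of `loc[:, 2]`.

```python
import pandas as pd

# Hypothetical data: fixed values instead of the random draws used above.
s = pd.Series(
    [10, 20, 30, 40, 50],
    index=[["a", "a", "b", "c", "c"], [1, 2, 1, 2, 3]],
)

# Leave the outer level unconstrained and pick inner label 2.
by_loc = s.loc[:, 2]

# Same selection, naming the level explicitly.
by_xs = s.xs(2, level=1)

print(by_loc)
print(by_xs)
```

Outer label 'b' has no inner label 2, so it does not appear in either result.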

 

 

unstack()

data.unstack()
# <Output>
#           1         2         3
# a -0.204708  0.478943 -0.519439
# b -0.555730       NaN  1.965781
# c  1.393406  0.092908       NaN
# d       NaN  0.281746  0.769023

 

stack(): the inverse of unstack() is performed with the stack method

data.unstack().stack()
# <Output>
# a  1   -0.204708
#    2    0.478943
#    3   -0.519439
# b  1   -0.555730
#    3    1.965781
# c  1    1.393406
#    2    0.092908
# d  2    0.281746
#    3    0.769023
# dtype: float64
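
A minimal sketch of the reshape round trip on a small hand-written Series (the values are illustrative, not from the post). `unstack()` moves the inner level into the columns by default, and `level=0` moves the outer level instead. The `dropna()` after `stack()` is a defensive assumption, since how `stack` treats the NaN gaps may differ between pandas versions.

```python
import pandas as pd

# Hypothetical data with gaps: ('a', 3) and ('b', 2) do not exist.
s = pd.Series(
    [1.0, 2.0, 3.0, 4.0],
    index=[["a", "a", "b", "b"], [1, 2, 1, 3]],
)

wide = s.unstack()               # inner level -> columns, gaps become NaN
wide_outer = s.unstack(level=0)  # outer level -> columns instead
missing = int(wide.isna().sum().sum())

# Back to long form; dropna() removes any NaN rows that stack() may keep.
restored = wide.stack().dropna()

print(wide)
print(wide_outer)
print(restored)
```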

 

 

In a DataFrame, either axis can have a hierarchical index.

frame = pd.DataFrame(np.arange(12).reshape((4, 3)),
                     index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
                     columns=[['Ohio', 'Ohio', 'Colorado'],
                              ['Green', 'Red', 'Green']])
frame
# <Output>
#      Ohio     Colorado
#     Green Red    Green
# a 1     0   1        2
#   2     3   4        5
# b 1     6   7        8
#   2     9  10       11

frame.index.names = ['key1', 'key2']
frame.columns.names = ['state', 'color']
frame
# <Output>
# state      Ohio     Colorado
# color     Green Red    Green
# key1 key2
# a    1        0   1        2
#      2        3   4        5
# b    1        6   7        8
#      2        9  10       11

 

Partial indexing also works on the columns: selecting an outer column label returns that group of columns, just as partial indexing on the rows does.

frame['Ohio']
# <Output>
# color      Green  Red
# key1 key2
# a    1         0    1
#      2         3    4
# b    1         6    7
#      2         9   10
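
Besides selecting by the outer column label, a column group can also be picked by the inner level or by a full tuple. The sketch below rebuilds the same frame so it runs on its own; using `xs` with `axis=1` for the inner-level selection is one possible approach, not something the post itself shows.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(12).reshape((4, 3)),
    index=[["a", "a", "b", "b"], [1, 2, 1, 2]],
    columns=[["Ohio", "Ohio", "Colorado"], ["Green", "Red", "Green"]],
)
frame.index.names = ["key1", "key2"]
frame.columns.names = ["state", "color"]

# Outer column label: every column under 'Ohio'.
ohio = frame["Ohio"]

# Inner column label: every 'Green' column, whatever the state.
green = frame.xs("Green", axis=1, level="color")

# Full tuple: exactly one column, returned as a Series.
ohio_red = frame.loc[:, ("Ohio", "Red")]

print(green)
print(ohio_red)
```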

 

A MultiIndex can be created on its own and then reused.

from pandas import MultiIndex

 

The column index of the DataFrame above, including its level names, can be created like this:

MultiIndex.from_arrays([['Ohio', 'Ohio', 'Colorado'], ['Green', 'Red', 'Green']], names=['state', 'color'])
# <Output>
# MultiIndex([(    'Ohio', 'Green'),
#             (    'Ohio',   'Red'),
#             ('Colorado', 'Green')],
#            names=['state', 'color'])
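
To illustrate the "create once, reuse" point, here is a small sketch that builds the column index a single time and attaches it to two different frames. The frames `first` and `second` and their values are hypothetical.

```python
import numpy as np
import pandas as pd

# Build the column MultiIndex once.
columns = pd.MultiIndex.from_arrays(
    [["Ohio", "Ohio", "Colorado"], ["Green", "Red", "Green"]],
    names=["state", "color"],
)

# Reuse the same index object for two unrelated frames.
first = pd.DataFrame(np.arange(6).reshape(2, 3), columns=columns)
second = pd.DataFrame(np.ones((2, 3)), columns=columns)

same_columns = first.columns.equals(second.columns)
print(same_columns)
print(first["Ohio"])
```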

 

swaplevel

Takes two level numbers or names and returns a new object with those two levels interchanged (the data itself is unchanged).

frame.swaplevel('key1', 'key2')
# <Output>
# state      Ohio     Colorado
# color     Green Red    Green
# key2 key1
# 1    a        0   1        2
# 2    a        3   4        5
# 1    b        6   7        8
# 2    b        9  10       11
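
A short sketch, on a hypothetical one-column frame, of what "returns a new object" means in practice: the original keeps its level order, the values stay in the same row positions, and levels can be given by name or by number.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a two-level row index.
df = pd.DataFrame(
    {"v": np.arange(4)},
    index=pd.MultiIndex.from_arrays(
        [["a", "a", "b", "b"], [1, 2, 1, 2]], names=["key1", "key2"]
    ),
)

swapped = df.swaplevel("key1", "key2")  # by name
swapped_by_number = df.swaplevel(0, 1)  # by position

print(list(df.index.names))       # original is untouched
print(list(swapped.index.names))  # levels interchanged
print(swapped)
```

Note that only the labels are rearranged; the rows are not re-sorted.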

 

 

 

level=1: sort the rows by the values of the second index level (key2)

frame.sort_index(level=1)
# <Output>
# state      Ohio     Colorado
# color     Green Red    Green
# key1 key2
# a    1        0   1        2
# b    1        6   7        8
# a    2        3   4        5
# b    2        9  10       11

 

 

level=0: sort by the outermost index level, reading the index from left to right; after swaplevel(0, 1) that level is key2

frame.swaplevel(0, 1).sort_index(level=0)
# <Output>
# state      Ohio     Colorado
# color     Green Red    Green
# key2 key1
# 1    a        0   1        2
#      b        6   7        8
# 2    a        3   4        5
#      b        9  10       11
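
The effect of sorting after a swap can be checked with `is_monotonic_increasing`. This is a sketch on a hypothetical one-column frame: swapping alone leaves the index out of lexicographic order, and `sort_index(level=0)` restores it.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a two-level row index.
df = pd.DataFrame(
    {"v": np.arange(4)},
    index=pd.MultiIndex.from_arrays(
        [["a", "a", "b", "b"], [1, 2, 1, 2]], names=["key1", "key2"]
    ),
)

swapped = df.swaplevel(0, 1)             # key2 is now the outer level
sorted_ = swapped.sort_index(level=0)    # sort by the new outer level
by_inner = df.sort_index(level=1)        # sort the original by key2

print(swapped.index.is_monotonic_increasing)
print(sorted_.index.is_monotonic_increasing)
print(sorted_)
```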

sum

 

 


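# frame.sum(level='key2') was deprecated in pandas 1.3 and removed in 2.0; group by the level instead
frame.groupby(level='key2').sum()
# <Output>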
# state  Ohio     Colorado
# color Green Red    Green
# key2
# 1         6   8       10
# 2        12  14       16

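# frame.sum(level='color', axis=1) no longer works in pandas 2.0+; transpose, group by the column level, transpose back
frame.T.groupby(level='color').sum().T
# <Output>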
# color      Green  Red
# key1 key2
# a    1         2    1
#      2         8    4
# b    1        14    7
#      2        20   10
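
Aggregating by level is not limited to sums. As a sketch, the same frame can be grouped on an index level and reduced with other statistics; `groupby(level=...)` is used here, and the choice of `mean` and `max` is only for illustration.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(12).reshape((4, 3)),
    index=[["a", "a", "b", "b"], [1, 2, 1, 2]],
    columns=[["Ohio", "Ohio", "Colorado"], ["Green", "Red", "Green"]],
)
frame.index.names = ["key1", "key2"]
frame.columns.names = ["state", "color"]

# Mean of the rows that share the same key2 value.
by_key2_mean = frame.groupby(level="key2").mean()

# Column-wise maximum within each key1 group.
by_key1_max = frame.groupby(level="key1").max()

print(by_key2_mean)
print(by_key1_max)
```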

 

 

index

frame = pd.DataFrame({'a': range(7), 'b': range(7, 0, -1),
                      'c': ['one', 'one', 'one', 'two', 'two', 'two', 'two'],
                      'd': [0, 1, 2, 0, 1, 2, 3]})
frame
# <Output>
#    a  b    c  d
# 0  0  7  one  0
# 1  1  6  one  1
# 2  2  5  one  2
# 3  3  4  two  0
# 4  4  3  two  1
# 5  5  2  two  2
# 6  6  1  two  3

DataFrame's set_index() function creates a new DataFrame that uses one or more of its columns as the index.

frame2 = frame.set_index(['c', 'd'])
frame2
# <Output>
#        a  b
# c   d
# one 0  0  7
#     1  1  6
#     2  2  5
# two 0  3  4
#     1  4  3
#     2  5  2
#     3  6  1

 

 

By default the columns moved into the index are removed from the DataFrame; pass drop=False to keep them explicitly.

frame.set_index(['c', 'd'], drop=False)
# <Output>
#        a  b    c  d
# c   d
# one 0  0  7  one  0
#     1  1  6  one  1
#     2  2  5  one  2
# two 0  3  4  two  0
#     1  4  3  two  1
#     2  5  2  two  2
#     3  6  1  two  3

 

reset_index() does the opposite of set_index(): the hierarchical index levels are moved back into columns.

frame2.reset_index()
# <Output>
#      c  d  a  b
# 0  one  0  0  7
# 1  one  1  1  6
# 2  one  2  2  5
# 3  two  0  3  4
# 4  two  1  4  3
# 5  two  2  5  2
# 6  two  3  6  1
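
The set_index()/reset_index() pair can be checked as a round trip. The sketch below uses a shortened, hypothetical version of the frame. After the round trip the index columns come first, so the columns are reordered before comparing. Passing `level=` to move only one index level back is an extra illustration not shown in the post.

```python
import pandas as pd

# Hypothetical, shortened version of the frame above.
frame = pd.DataFrame({
    "a": range(4),
    "b": range(4, 0, -1),
    "c": ["one", "one", "two", "two"],
    "d": [0, 1, 0, 1],
})

indexed = frame.set_index(["c", "d"])   # c and d leave the columns
restored = indexed.reset_index()        # c and d come back, placed first

# Same data once the original column order is restored.
round_trip_ok = restored[list(frame.columns)].equals(frame)

# Move only the inner level 'd' back into the columns.
partial = indexed.reset_index(level="d")

print(list(restored.columns))
print(round_trip_ok)
print(partial)
```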

 
