[Pandas] Random numbers with NumPy

59doit

yul_S2 2022. 11. 8. 14:01

import pandas as pd
import numpy as np

▷

Random numbers generated with NumPy

frame = pd.DataFrame(np.random.randn(4, 3), columns=list('bde'),
        index=['Utah', 'Ohio', 'Texas', 'Oregon'])
frame
# <Output>
#                b         d         e
# Utah    0.274992  0.228913  1.352917
# Ohio    0.886429 -2.001637 -0.371843
# Texas   1.669025 -0.438570 -0.539741
# Oregon  0.476985  3.248944 -1.021228
np.abs(frame)
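
Because np.random.randn draws new values on every run, the numbers above will differ each time. A minimal sketch of a repeatable version follows, assuming NumPy's Generator API (np.random.default_rng) is acceptable in place of np.random.randn; the seed value 0 is an arbitrary choice for illustration, not something from the original post.

```python
import numpy as np
import pandas as pd

# Assumption: seed 0 is used only so the example is repeatable.
rng = np.random.default_rng(0)

frame = pd.DataFrame(rng.standard_normal((4, 3)),
                     columns=list('bde'),
                     index=['Utah', 'Ohio', 'Texas', 'Oregon'])

# NumPy ufuncs such as np.abs work element-wise on a DataFrame
# and keep the row and column labels.
abs_frame = np.abs(frame)
print(abs_frame)
```

Running this twice should print the same table, since the generator starts from the same seed.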

▷

lambda

f = lambda x: x.max() - x.min()   # range (max - min) of each column
frame.apply(f)
# <Output>
# b    1.394034
# d    5.250581
# e    2.374144
# dtype: float64

▷

axis=1 (same as axis='columns'): apply the function across the columns (left to right), once per row

frame.apply(f, axis='columns')
# <Output>
# Utah      1.124004
# Ohio      2.888067
# Texas     2.208767
# Oregon    4.270171
# dtype: float64

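# axis=1: across the columns (left to right), one result per row
# axis=0: down the rows (top to bottom), one result per column
# axis: the axis along which the operation runs; in a DataFrame, 0 is the rows and 1 is the columns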
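
To make the two directions easier to check by hand, here is a small sketch with fixed integers instead of random numbers. The frame contents and the names spread, per_column and per_row are made up for illustration.

```python
import pandas as pd

frame = pd.DataFrame({'b': [1, 4], 'd': [10, 2]}, index=['Utah', 'Ohio'])
spread = lambda x: x.max() - x.min()

# axis=0 (default): the function receives each column, one result per column
per_column = frame.apply(spread)
print(per_column)   # b -> 4 - 1 = 3, d -> 10 - 2 = 8

# axis='columns' (axis=1): the function receives each row, one result per row
per_row = frame.apply(spread, axis='columns')
print(per_row)      # Utah -> 10 - 1 = 9, Ohio -> 4 - 2 = 2
```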

▷

Format to two decimal places: applymap works element-wise on the whole DataFrame, map on a single column

format = lambda x: '%.2f' % x    # '%.2f' = two decimal places ('%2f' would only set a minimum width of 2)
frame.applymap(format)           # note: pandas 2.1+ deprecates applymap in favor of DataFrame.map
# <Output>
#            b      d      e
# Utah    0.27   0.23   1.35
# Ohio    0.89  -2.00  -0.37
# Texas   1.67  -0.44  -0.54
# Oregon  0.48   3.25  -1.02

frame['e'].map(format)
# <Output>
# Utah       1.35
# Ohio      -0.37
# Texas     -0.54
# Oregon    -1.02
# Name: e, dtype: object
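
The position of the dot in the format string matters here. The sketch below compares '%2f' (minimum width 2, default six decimals) with '%.2f' (two decimals) using plain Python string formatting; the variable names are illustrative only.

```python
import pandas as pd

value = 1.352917

width_only = '%2f' % value      # width 2, precision defaults to 6
two_decimals = '%.2f' % value   # precision 2
f_string = f'{value:.2f}'       # same result with an f-string

print(width_only)     # 1.352917
print(two_decimals)   # 1.35
print(f_string)       # 1.35

# The same formatter applied to a Series returns strings (dtype: object)
formatted = pd.Series([1.352917, -0.371843]).map(lambda x: '%.2f' % x)
print(formatted.tolist())   # ['1.35', '-0.37']
```

If the goal is only to round the numbers while keeping them numeric, frame.round(2) may be a better fit than string formatting.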

sort

obj = pd.Series(range(4),index=['d','a','b','c'])
obj.sort_index()
# <Output>
# a    1
# b    2
# c    3
# d    0
# dtype: int64

▷

Sorting by the row or column labels

frame = pd.DataFrame(np.arange(8).reshape((2, 4)),
                     index=['three', 'one'], columns=['d', 'a', 'b', 'c'])
frame
# <Output>
#        d  a  b  c
# three  0  1  2  3
# one    4  5  6  7

▷

axis=0: sort by the row labels (top to bottom)

frame.sort_index(axis=0)
# <Output>
#        d  a  b  c
# one    4  5  6  7
# three  0  1  2  3

▷

axis=1: sort by the column labels (left to right)

frame.sort_index(axis=1)
# <Output>
#        a  b  c  d
# three  1  2  3  0
# one    5  6  7  4

▷

Sorting in descending order

frame.sort_index(axis=1, ascending=False)
# <Output>
#        d  c  b  a
# three  0  3  2  1
# one    4  7  6  5

▷

Sorting by values: use sort_values

obj = pd.Series([4, 7, -3, 2])
obj.sort_values()
# <Output>
# 2   -3
# 3    2
# 0    4
# 1    7
# dtype: int64

▷

Missing values (NaN) are placed at the end when sorting

obj = pd.Series([4, np.nan, 7, np.nan, -3, 2])
obj.sort_values()
# <Output>
# 4   -3.0
# 5    2.0
# 0    4.0
# 2    7.0
# 1    NaN
# 3    NaN
# dtype: float64
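
NaN values go last by default, even when sorting in descending order. A short sketch of how that can be changed with the na_position parameter of sort_values; the variable names nan_first and descending are illustrative.

```python
import numpy as np
import pandas as pd

obj = pd.Series([4, np.nan, 7, np.nan, -3, 2])

# Put missing values at the front instead of the back
nan_first = obj.sort_values(na_position='first')
print(nan_first)

# Descending order still keeps NaN at the end by default
descending = obj.sort_values(ascending=False)
print(descending)
```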

▷

Sorting by the values in one or more columns

frame = pd.DataFrame({'b': [4, 7, -3, 2], 'a': [0, 1, 0, 1]})
frame
# <Output>
#    b  a
# 0  4  0
# 1  7  1
# 2 -3  0
# 3  2  1

frame.sort_values(by='b')
# <Output>
#    b  a
# 2 -3  0
# 3  2  1
# 0  4  0
# 1  7  1

Sorting by multiple columns

frame.sort_values(by=['a', 'b'])   # sort by 'a' first, then by 'b' within equal 'a'
# <Output>
#    b  a
# 2 -3  0
# 0  4  0
# 3  2  1
# 1  7  1
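
When sorting by several columns, each column can have its own direction by passing a list to ascending. A small sketch on the same data; the name mixed is illustrative.

```python
import pandas as pd

frame = pd.DataFrame({'b': [4, 7, -3, 2], 'a': [0, 1, 0, 1]})

# 'a' ascending, then 'b' descending within each value of 'a'
mixed = frame.sort_values(by=['a', 'b'], ascending=[True, False])
print(mixed)
#    b  a
# 0  4  0
# 2 -3  0
# 1  7  1
# 3  2  1
```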

rank

obj = pd.Series([7, -5, 7, 4, 2, 0, 4])
obj
# <Output>
# 0    7
# 1   -5
# 2    7
# 3    4
# 4    2
# 5    0
# 6    4
# dtype: int64

▷

obj.rank()
# <Output>
# 0    6.5
# 1    1.0
# 2    6.5
# 3    4.5
# 4    3.0
# 5    2.0
# 6    4.5
# dtype: float64

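The result is not the average of the values but the average of the ranks that tied values occupy.
The two 4s occupy ranks 4 and 5, so each gets 4.5; the two 7s occupy ranks 6 and 7, so each gets 6.5.

method='first': ties are broken by the order in which the values appear in the data

obj.rank(method='first')
# <Output>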
# 0    6.0
# 1    1.0
# 2    7.0
# 3    4.0
# 4    3.0
# 5    2.0
# 6    5.0
# dtype: float64

▷

Ranking in descending order: obj.rank(ascending=False); with method='max', tied values all get the highest rank in their group

obj.rank(ascending=False, method='max')
# <Output>
# 0    2.0
# 1    7.0
# 2    2.0
# 3    4.0
# 4    5.0
# 5    6.0
# 6    4.0
# dtype: float64
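
The tie-breaking methods are easier to compare side by side. The sketch below builds one column per method for the same Series; the names methods and table are illustrative.

```python
import pandas as pd

obj = pd.Series([7, -5, 7, 4, 2, 0, 4])

methods = ['average', 'min', 'max', 'first', 'dense']
# One column per tie-breaking method, ascending ranks
table = pd.DataFrame({m: obj.rank(method=m) for m in methods})
table.insert(0, 'value', obj)
print(table)
#    value  average  min  max  first  dense
# 0      7      6.5  6.0  7.0    6.0    5.0
# 1     -5      1.0  1.0  1.0    1.0    1.0
# 2      7      6.5  6.0  7.0    7.0    5.0
# 3      4      4.5  4.0  5.0    4.0    4.0
# 4      2      3.0  3.0  3.0    3.0    3.0
# 5      0      2.0  2.0  2.0    2.0    2.0
# 6      4      4.5  4.0  5.0    5.0    4.0
```

'dense' behaves like 'min' except that ranks always increase by 1 between groups, so no rank numbers are skipped after a tie.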

▷

In a DataFrame, ranks can be computed over the rows or over the columns

frame = pd.DataFrame({'b': [4.3, 7, -3, 2],
                      'a': [0, 1, 0, 1],
                      'c': [-2, 5, 8, -2.5]})
frame
# <Output>
#      b  a    c
# 0  4.3  0 -2.0
# 1  7.0  1  5.0
# 2 -3.0  0  8.0
# 3  2.0  1 -2.5

frame.rank(axis='columns')   # rank the values within each row
# <Output>
#       b    a    c
# 0  3.0  2.0  1.0
# 1  3.0  1.0  2.0
# 2  1.0  2.0  3.0
# 3  3.0  2.0  1.0

▷

Duplicate index labels are allowed

obj = pd.Series(range(5), index=['a', 'a', 'b', 'b', 'c'])
obj
# <Output>
# a    0
# a    1
# b    2
# b    3
# c    4
# dtype: int64

is_unique is False when the index is not unique, that is, when it contains duplicate labels

obj.index.is_unique
# <Output> False

▷

Rows can still be selected when the index has duplicates

df = pd.DataFrame(np.random.randn(4, 3), index=['a', 'a', 'b', 'b'])
df
# <Output>
#           0         1         2
# a -1.265934  0.119827 -1.063512
# a  0.332883 -2.359419 -0.199543
# b -1.541996 -0.970736 -1.307030
# b  0.286350  0.377984 -0.753887

df.loc['b']
# <Output>
#           0         1         2
# b -1.541996 -0.970736 -1.307030
# b  0.286350  0.377984 -0.753887

# Labels 'a' and 'b' each appear on two rows;
# selecting a label returns every row that carries it
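
One consequence of duplicate labels is that the return type of a selection depends on the label. The sketch below uses the Series from the start of this section; the names dup, single and repeated are illustrative.

```python
import pandas as pd

obj = pd.Series(range(5), index=['a', 'a', 'b', 'b', 'c'])

dup = obj['a']       # label occurs twice -> a Series with both entries
single = obj['c']    # label occurs once  -> a scalar value

print(type(dup).__name__, dup.tolist())   # Series [0, 1]
print(single)                             # 4

# Boolean mask marking every repeat of a label after its first occurrence
repeated = obj.index.duplicated()
print(repeated.tolist())   # [False, True, False, True, False]
```

Code that indexes by label may therefore need to handle both a Series and a scalar when the index is not guaranteed to be unique.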
