Query methods for pandas DataFrames ([], loc, iloc, at, iat, ix)


Python Statistics - Scientific Computing 2013-09-12 23:02:37

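[] slicing
Square brackets slice a DataFrame much like slicing a Python list. Depending on what goes inside the brackets, they select rows, columns, or a block of both.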

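# Row selection
data[1:5]

# Column selection
data[['a', 'b']]

# Block selection
data[:7][['a', 'b']]
A slice such as 1:5 inside [] always selects rows, so it cannot be used to pick a range of columns. To select several columns, pass a list of column names.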
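The three forms above can be tried on a small frame. This is a minimal sketch: the original post does not show how `data` is built, so the sample DataFrame below is an assumption made purely for illustration.

```python
import pandas as pd

# Hypothetical sample data; the post does not show how `data` was created.
data = pd.DataFrame(
    {"a": range(10), "b": range(10, 20), "c": range(20, 30)}
)

rows = data[1:5]              # rows at positions 1..4 (the end is excluded)
cols = data[["a", "b"]]       # columns 'a' and 'b', all rows
block = data[:7][["a", "b"]]  # first 7 rows, then two columns

print(rows.shape, cols.shape, block.shape)  # (4, 3) (10, 2) (7, 2)
```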

loc
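loc selects rows and columns by index label.
data.loc[1:5]
Unlike the first method, a loc slice includes both endpoints: data.loc[1:5] also returns the row labeled 5, whereas data[1:5] stops at row 4.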
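The difference in endpoint handling is easy to check. A minimal sketch, assuming a frame with the default integer index 0..9 (the sample data is hypothetical):

```python
import pandas as pd

# Hypothetical sample data with the default integer index 0..9.
data = pd.DataFrame({"a": range(10), "b": range(10, 20)})

by_position = data[1:5]    # positions 1..4, end excluded
by_label = data.loc[1:5]   # labels 1..5, both ends included

print(len(by_position), len(by_label))  # 4 5
```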

data.loc[2:4, ['b', 'c']]
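loc can also select the data between two specific dates. Note that if the index is not sorted, both dates must be present in the index; with a sorted index, pandas also accepts endpoints that are missing from it.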
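A date-range selection might look like the following sketch. The daily index and the column name are assumptions chosen for illustration; both endpoints are present in the index here.

```python
import pandas as pd

# Hypothetical time series: 10 consecutive days starting 2013-09-01.
idx = pd.date_range("2013-09-01", periods=10, freq="D")
ts = pd.DataFrame({"a": range(10)}, index=idx)

# Both endpoints are included, as with any loc label slice.
between = ts.loc["2013-09-03":"2013-09-06"]

print(between["a"].tolist())  # [2, 3, 4, 5]
```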

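Unless you have a specific reason not to, prefer loc over []. When assigning new values into a DataFrame, a single loc call avoids the chained indexing problem, whereas [] is likely to make pandas emit a SettingWithCopy warning. See the official documentation for details: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy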
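As a rough illustration of the assignment case, the sketch below puts the row filter and the column label into one loc call. The sample data and the condition are hypothetical; the chained form is shown only as a comment because its behavior depends on the pandas version.

```python
import pandas as pd

# Hypothetical sample data.
data = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]})

# Chained form (avoid): data[data["a"] > 2]["b"] = 0
# It assigns into a temporary object, so `data` itself may not change.

# Single loc call: row mask and column label in one indexing operation.
data.loc[data["a"] > 2, "b"] = 0

print(data["b"].tolist())  # [10, 20, 0, 0]
```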

iloc
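Where loc selects by the value of the index label, iloc selects by position. iloc does not care what the index labels are, only where the rows and columns sit, so the brackets accept only integer positions (single integers, lists of integers, or integer slices), not labels.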

# Row selection
data.iloc[10:15]

# Column selection
data.iloc[:, [1, 2]].head()

# Block selection
data.iloc[[1, 12, 34], [0, 2]]
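To make the "position, not label" point concrete, here is a small sketch on a frame whose index labels deliberately differ from the row positions. The data and labels are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical data; the index labels 100..104 differ from positions 0..4.
data = pd.DataFrame(
    {"a": range(5), "b": range(5, 10), "c": range(10, 15)},
    index=[100, 101, 102, 103, 104],
)

rows = data.iloc[1:3]              # positions 1 and 2 (end excluded)
cols = data.iloc[:, [1, 2]]        # second and third columns
block = data.iloc[[0, 4], [0, 2]]  # rows 0 and 4, columns 0 and 2

print(rows.index.tolist())    # [101, 102]
print(cols.columns.tolist())  # ['b', 'c']
```

Note that an iloc slice excludes its end position, like a Python list slice and unlike a loc label slice.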

at
at is used much like loc, but it accesses data faster. It can only access a single element, not several at once.
data.at[1,'a']
timeit data.loc[1,'a']
The slowest run took 121.24 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 192 µs per loop
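Outside IPython, the same comparison can be sketched with the standard-library timeit module. The sample frame and the loop count are assumptions, and the timings depend on the machine and pandas version, so no numbers are quoted here.

```python
import timeit

import pandas as pd

# Hypothetical sample data.
data = pd.DataFrame({"a": range(100), "b": range(100)})

# at and loc return the same scalar for a single (row label, column label) pair.
assert data.at[1, "a"] == data.loc[1, "a"]

# Time 10,000 scalar lookups with each accessor.
t_at = timeit.timeit(lambda: data.at[1, "a"], number=10_000)
t_loc = timeit.timeit(lambda: data.loc[1, "a"], number=10_000)

print(f"at:  {t_at:.4f} s")
print(f"loc: {t_loc:.4f} s")
```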

iat
iat is to iloc what at is to loc: a faster, position-based way to select data. Like at, it can only access a single element.
data.iat[1,0]
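A short sketch of reading and writing one cell with iat follows. The frame and its string index are hypothetical, chosen so that positions and labels are clearly different things.

```python
import pandas as pd

# Hypothetical sample data with string labels.
data = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=["x", "y", "z"])

value = data.iat[1, 0]   # row position 1, column position 0
same = data.iloc[1, 0]   # iloc returns the same scalar

data.iat[1, 0] = 99      # iat can also set a single cell

print(value, same, data.at["y", "a"])  # 2 2 99
```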

ix
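All of the methods above require the queried label to be in the index, or the position to be within the length of the data, whereas ix lets you get at data that is not in the DataFrame's index. Note that ix was deprecated in pandas 0.20.0 and removed in pandas 1.0, so code written for current versions should use loc or iloc instead.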
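Since ix is unavailable in current pandas, mixed lookups (row by position, column by label) have to be written with loc or iloc. The sketch below shows two common ways to do that; the sample frame is hypothetical and this is only one possible replacement pattern.

```python
import pandas as pd

# Hypothetical sample data with string labels.
data = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=["x", "y", "z"])

# Option 1: turn the row position into a label, then use loc.
v1 = data.loc[data.index[1], "b"]

# Option 2: turn the column label into a position, then use iloc.
v2 = data.iloc[1, data.columns.get_loc("b")]

print(v1, v2)  # 5 5
```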
