run before you can walk


PyODPS Usage

Notes on connecting to ODPS and analyzing its data with Python.

Installing PyODPS

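Install PyODPS with pip (the `full` extra pulls in the optional dependencies; the quotes keep shells such as zsh from treating the brackets as a glob):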

```shell
pip install 'pyodps[full]'
```
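To confirm the install worked, one option is to ask the package metadata for the installed version. A minimal sketch using only the standard library `importlib.metadata` (Python 3.8+); the helper name `installed_version` is hypothetical, and it assumes the distribution is registered under the same name passed to pip, `pyodps`.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing.

    Hypothetical helper for illustration; it relies only on the stdlib.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("pyodps")
    if version is None:
        print("pyodps is not installed")
    else:
        print(f"pyodps {version} is installed")
```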

Basic Operations

Create a connection to the data source:

```python
from odps import ODPS

# Entry object for all later operations; replace the placeholders with your own values
o = ODPS('**your-access-id**', '**your-secret-access-key**',
         project='**your-project**', endpoint='**your-end-point**')
```
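Hard-coding the AccessKey pair in a script makes it easy to leak. A minimal sketch that reads the connection settings from environment variables instead. The variable names `ODPS_ACCESS_ID`, `ODPS_SECRET_ACCESS_KEY`, `ODPS_PROJECT` and `ODPS_ENDPOINT` are hypothetical (use whatever your environment defines), and the returned keys are assumed to match the `ODPS` constructor's keyword arguments `access_id`, `secret_access_key`, `project` and `endpoint`.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical environment variable names mapped to ODPS() keyword arguments
ENV_TO_KWARG = {
    "ODPS_ACCESS_ID": "access_id",
    "ODPS_SECRET_ACCESS_KEY": "secret_access_key",
    "ODPS_PROJECT": "project",
    "ODPS_ENDPOINT": "endpoint",
}


def odps_kwargs_from_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect ODPS connection settings from the environment.

    Raises KeyError naming every variable that is missing or empty, so a
    misconfigured job fails before it tries to connect.
    """
    env = os.environ if env is None else env
    missing = [name for name in ENV_TO_KWARG if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {kwarg: env[name] for name, kwarg in ENV_TO_KWARG.items()}
```

With the variables exported, the connection line becomes `o = ODPS(**odps_kwargs_from_env())`.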

Reading Table Data

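```python
from odps import ODPS

o = ODPS('**your-access-id**', '**your-secret-access-key**',
         project='**your-project**', endpoint='**your-end-point**')

# o.read_table() yields records one at a time and has no to_pandas(),
# so wrap the partition in a PyODPS DataFrame instead
df = o.get_table('test_table').get_partition('pt=test').to_df()

# Convert to a pandas DataFrame
pd_df = df.to_pandas()
```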
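The partition is addressed by a spec string such as `'pt=test'`; tables with several partition levels take comma-separated pairs, e.g. `'pt=test,sub=2015'`. When the values come from variables, a small helper keeps the string well-formed. A minimal sketch: `build_partition_spec` is a hypothetical name, and it assumes the columns are passed in the order the table declares them and that values never contain `,` or `=`.

```python
from typing import Mapping


def build_partition_spec(parts: Mapping[str, object]) -> str:
    """Join partition columns and values into a spec such as 'pt=test,sub=2015'.

    Hypothetical helper: the insertion order of `parts` is kept, so pass the
    columns in the order the table declares them.
    """
    if not parts:
        raise ValueError("at least one partition column is required")
    pieces = []
    for column, value in parts.items():
        text = str(value)
        # ',' and '=' would be read as separators, so reject them in values
        if not column or "," in text or "=" in text:
            raise ValueError(f"invalid partition pair: {column}={text}")
        pieces.append(f"{column}={text}")
    return ",".join(pieces)
```

For example, `o.get_table('test_table').get_partition(build_partition_spec({'pt': 'test'}))` selects the same partition as above.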

Speed up reading table data with multiple processes:

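```python
import multiprocessing

# One worker process per CPU core; `o` is the ODPS object created above
n_process = multiprocessing.cpu_count()
t = o.get_table('dual')
with t.open_reader(partition='pt=test') as reader:
    # Download the partition with n_process processes into a pandas DataFrame
    pd_df = reader.to_pandas(n_process=n_process)
```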
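The general idea behind this kind of speed-up is to split the row range into contiguous slices, fetch the slices in parallel, and concatenate them in order. The sketch below illustrates only that idea, with an in-memory list standing in for the table and a thread pool standing in for worker processes; it is not PyODPS's actual implementation, and `split_ranges` and `read_in_parallel` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_ranges(total: int, n_parts: int) -> List[Tuple[int, int]]:
    """Split [0, total) into at most n_parts contiguous (start, end) ranges.

    The first `total % n_parts` ranges get one extra row, so sizes differ by at most 1.
    """
    n_parts = max(1, min(n_parts, total)) if total > 0 else 1
    base, extra = divmod(total, n_parts)
    ranges = []
    start = 0
    for i in range(n_parts):
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def read_in_parallel(
    read_range: Callable[[int, int], Sequence[T]], total: int, n_workers: int
) -> List[T]:
    """Fetch every range concurrently and concatenate the results in range order."""
    ranges = split_ranges(total, n_workers)
    with ThreadPoolExecutor(max_workers=len(ranges)) as pool:
        # pool.map yields results in the order of `ranges`, not completion order
        chunks = pool.map(lambda r: read_range(*r), ranges)
        return [row for chunk in chunks for row in chunk]
```

Threads keep the sketch self-contained; when the per-row work is CPU-bound (for example decoding rows into Python objects), separate processes, as `n_process` suggests, avoid contention on Python's GIL.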