Notes on upgrading FreeBSD 10.1 to 11.0-RELEASE-p8 on Alibaba Cloud

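Overall, the upgrade went smoothly.

The main commands used were:

Check the version:

945  22:20   freebsd-version -k -u

Set an environment variable (for 10.3 this turned out not to be needed):

946  22:20   setenv UNAME_r "10.3-RELEASE"

Fetch updates. From inside China this takes a very long time without a mirror:

947  22:20   freebsd-update fetch

To install Jupyter, I also ran the following commands: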

 

920  20:59   whereis jupyterhub

921  21:00   whereis jupyter

926  21:00   locate notejs

927  21:21   df

928  21:25   whereis nodejs

929  21:25   locate node

930  21:27   pkg install node

931  21:27   pkg-static install node

932  21:55   pkg_stick install python36

933  21:56   pkg-static install python36

934  22:02   whereis py-pyzmq

935  22:03   make PYTHON_VERSION=python3.6 install clean

936  22:03   cd /usr/ports/net/py-pyzmq

937  22:03   make PYTHON_VERSION=python3.6 install clean

938  22:07   pkg-static install py-setuptools36

939  22:07   pkg-static install devel/py-setuptools36

940  22:08   cd /usr/ports/net/py-pyzmq

943  22:09   make -V PYTHON_VERSION=python3.6 install clean

944  22:19   freebsd-version

945  22:20   freebsd-version -k -u

946  22:20   setenv UNAME_r "10.3-RELEASE"

I later learned that this environment variable does not need to be set here.

 

956  22:29   /usr/local/bin/python3.6

957  22:31   ps -aux

958  22:31   nameserver

959  22:32   nslookup

960  22:32   vi /etc/resolv.conf

 

983  22:36   freebsd-update fetch

984  22:36   freebsd-update fetch &

 

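1000  22:43   freebsd-update -s 'freebsd-updates.mirrors.163.com' fetch

Unfortunately, the 163 mirror no longer exists.

I followed this article:

Upgrading FreeBSD 10.2-STABLE to 11.0-RELEASE

https://bbs.aliyun.com/read/297189.html?spm=5176.bbsr296915.0.0.z411Uy

That article contains an error, but it does not matter much, because freebsd-update prompts you later on.

After freebsd-update fetch, you can upgrade to 10.3 first:

freebsd-update upgrade -r 10.3-RELEASE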

 

root@iZ25alqsdzzZ:~ # freebsd-update upgrade -r 10.3-RELEASE

Looking up update.FreeBSD.org mirrors… 4 mirrors found.

Fetching metadata signature for 10.1-RELEASE from update5.freebsd.org… done.

Fetching metadata index… done.

Fetching 2 metadata files… done.

Inspecting system… done.

 

The following components of FreeBSD seem to be installed:

kernel/generic world/base world/lib32

 

The following components of FreeBSD do not seem to be installed:

src/src world/doc world/games

 

Does this look reasonable (y/n)? y

 

Fetching metadata signature for 10.3-RELEASE from update5.freebsd.org… done.

Fetching metadata index… done.

Fetching 1 metadata patches. done.

Applying metadata patches… done.

Fetching 1 metadata files…

done.

Inspecting system…

 

done.

Fetching files from 10.1-RELEASE for merging… done.

Preparing to download files… done.

Fetching 11045 patches…..10….20….30….40….50….60….70….80….90….100….110….120….130….140….150….160….170….180….190….200….210….220….230….240….250….260….270..

 

I expected this to take about 3 hours, but it finished quickly:

….10010….10020….10030….10040….10050……..11030….11040.. done.

Applying patches… done.

Fetching 393 files… done.

Attempting to automatically merge changes in files… done.

 

The following file could not be merged automatically: /etc/ntp.conf

Press Enter to edit this file in vi and resolve the conflicts

manually…

 

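It says /etc/ntp.conf could not be merged automatically, so it had to be merged by hand.

 

I answered yes to a long series of prompts,

then ran the install: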

/usr/sbin/freebsd-update install

 

root@iZ25alqsdzzZ:~ #/usr/sbin/freebsd-update install

Installing updates…

Kernel updates have been installed.  Please reboot and run

"/usr/sbin/freebsd-update install" again to finish installing updates.

 

The machine had not been rebooted in over a year, so I rebooted it.

After it came back up:

root@rich:~ # freebsd-version -k -u

10.3-RELEASE-p11

10.1-RELEASE

 

OK, the upgrade from 10.1 to 10.3 succeeded. Now for the major-version upgrade from 10 to 11, using the following commands:

# : > /usr/bin/bspatch

# freebsd-update upgrade -r 11.0-RELEASE

# freebsd-update install

<reboot the system>

# freebsd-update install

<rebuild third-party software>

# freebsd-update install

 

root@rich:~ # freebsd-update upgrade -r 11.0-RELEASE

src component not installed, skipped

Looking up update.FreeBSD.org mirrors… 4 mirrors found.

Fetching metadata signature for 10.3-RELEASE from update5.freebsd.org… done.

Fetching metadata index… done.

Fetching 1 metadata patches. done.

Applying metadata patches… done.

Fetching 1 metadata files… done.

Inspecting system… done.

 

The following components of FreeBSD seem to be installed:

kernel/generic world/base world/lib32

 

The following components of FreeBSD do not seem to be installed:

world/doc world/games

 

Does this look reasonable (y/n)? y

 

Fetching metadata signature for 11.0-RELEASE from update5.freebsd.org… done.

Fetching metadata index… done.

Fetching 1 metadata patches. done.

Applying metadata patches… done.

Fetching 1 metadata files… done.

Inspecting system… done.

Fetching files from 10.3-RELEASE for merging… done.

Preparing to download files… done.

Fetching 11218 patches…..10….20….30….40….50….60….70….80….90….100….110….120….130….140….150….160….170….180….190….200….210….220….230….240….250….260….270….280….290….300….310….320….330….340….350….360….370….380….390….400….410….420….430….440….450….460….470….480….490….500……….11200….11210…. done.

Applying patches… done.

Fetching 1645 files… done.

Attempting to automatically merge changes in files… done.

 

The following file could not be merged automatically: /etc/ntp.conf

Press Enter to edit this file in vi and resolve the conflicts

 

During the upgrade, the ntp.conf conflict came up again. I simply confirmed and quit the editor, and then this appeared:

The following changes, which occurred between FreeBSD 10.3-RELEASE and

FreeBSD 11.0-RELEASE have been merged into /etc/group:

--- current version

+++ new version

@@ -1,6 +1,6 @@

-# $FreeBSD: releng/10.3/etc/group 256366 2013-10-12 06:08:18Z rpaulo $

+# $FreeBSD: releng/11.0/etc/group 294896 2016-01-27 06:28:56Z araujo $

#

wheel:*:0:root,sky

daemon:*:1:

kmem:*:2:

sys:*:3:

@@ -15,10 +15,11 @@

staff:*:20:

sshd:*:22:

smmsp:*:25:

mailnull:*:26:

guest:*:31:

+video:*:44:

bind:*:53:

unbound:*:59:

proxy:*:62:

authpf:*:63:

_pflogd:*:64:

@@ -26,10 +27,11 @@

uucp:*:66:

dialer:*:68:

network:*:69:

audit:*:77:

www:*:80:

+_ypldap:*:160:

hast:*:845:

nogroup:*:65533:

nobody:*:65534:

mysql:*:88:

sky:*:1001:

Does this look reasonable (y/n)?

 

A large number of files were updated. pkg was my real goal, because pkg on Alibaba Cloud's FreeBSD 10.1 image was broken:

/usr/sbin/periodic

/usr/sbin/pkg

/usr/sbin/pmcannotate

 

Finally it printed:

/var/yp/Makefile.dist

To install the downloaded upgrades, run "/usr/sbin/freebsd-update install".

 

As instructed, run:

/usr/sbin/freebsd-update install

 

root@rich:~ #/usr/sbin/freebsd-update install

src component not installed, skipped

Installing updates…

Kernel updates have been installed.  Please reboot and run

"/usr/sbin/freebsd-update install" again to finish installing updates.

 

After rebooting, check:

root@rich:~ # uname -a

FreeBSD rich 11.0-RELEASE-p8 FreeBSD 11.0-RELEASE-p8 #0: Wed Feb 22 06:12:04 UTC 2017     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64

root@rich:~ # freebsd-version -k -u

11.0-RELEASE-p8

10.3-RELEASE-p17

 

The kernel is already on 11.

Run /usr/sbin/freebsd-update install once more:

root@rich:~ # /usr/sbin/freebsd-update install

src component not installed, skipped

Installing updates…

Completing this upgrade requires removing old shared object files.

Please rebuild all installed 3rd party software (e.g., programs

installed from the ports tree) and then run "/usr/sbin/freebsd-update install"

again to finish installing updates.

root@rich:~ #

 

It asks me to rebuild all third-party software. That is a lot of work,

so I will leave it for later.

First, check the version:

root@rich:~ # freebsd-version -k -u

11.0-RELEASE-p8

11.0-RELEASE-p8

 

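Good, that is what it should be.

 

Overall it has gone smoothly so far: no unbootable kernel, no sshd failing to start, no website outage, or any other surprise.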

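patsy study notes: overview (repost)

patsy is a Python package for describing statistical models (especially linear models, or models with linear components) and for building design matrices. Its development was inspired by the formula mini-language of R and S, and it is compatible with that language.

For example, suppose we have a variable y and variables x, a, and b, and we want the regression of y on x, a, and b, where a and b interact. The formula can then be written as: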

patsy.dmatrices("y ~ x + a + b + a:b", data)
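To make the formula concrete, here is a minimal sketch of the design matrix that "y ~ x + a + b + a:b" describes, built by hand with NumPy rather than with patsy. It assumes x, a, and b are all numeric (for categorical variables patsy would generate indicator columns instead). The data and the helper name build_design_matrix are made up for illustration.

```python
import numpy as np

def build_design_matrix(x, a, b):
    """Columns: Intercept, x, a, b, a:b (the interaction is the product a*b)."""
    x, a, b = (np.asarray(v, dtype=float) for v in (x, a, b))
    return np.column_stack([np.ones_like(x), x, a, b, a * b])

# Hypothetical data generated from known coefficients, with no noise.
rng = np.random.default_rng(0)
x, a, b = rng.standard_normal((3, 50))
true_coef = np.array([1.0, 2.0, -1.0, 0.5, 3.0])
X = build_design_matrix(x, a, b)
y = X @ true_coef

# Ordinary least squares recovers the coefficients from the design matrix.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(coef, 6))
```

The a:b term adds exactly one extra column here, which is what lets the fit estimate how the effect of a changes with b.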

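Raspberry Pi and FreeBSD: related information

raspberrypi-running-freebsdarm

How to install a camera on FreeBSD

 

FreeBSD Raspberry Pi wiki

Installing the system is not a problem for me. The hard part comes afterwards: how to install the drivers and so on.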

# uname -a
FreeBSD rpi2 11.0-CURRENT FreeBSD 11.0-CURRENT #0 r292413: Fri Dec 18 11:16:56 UTC 2015     root@releng2.nyi.freebsd.org:/usr/obj/arm.armv6/usr/src/sys/RPI2  arm
root@rp

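Update, 2016-01-19:

While at it, I changed the DNS servers on my home router from 8.8.8.8 to
202.102.152.3
202.102.154.3

There was no other choice, because the network was frequently unreachable.

 

On the Raspberry Pi side, the camera is now installed. There were some setbacks along the way, probably caused by a poor connection.

The installation steps are as follows: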

Setting up the Raspberry Pi Camera

  • Install the Raspberry Pi Userland tools:
    • pkg install raspberrypi-userland (misc/raspberrypi-userland)

  • Enable the video core and up the RAM allocated to the GPU, by adding the following to /boot/msdos/CONFIG.TXT:
    • start_x=1

    • gpu_mem=128

  • Reboot
  • Test:
    • raspistill -t 100 -n -o snap-`date +%Y%m%d-%H%M%S`.jpg

 

 

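Dragon and Tiger list (龙虎榜) statistics

# -*- coding: utf-8 -*-
"""
Created on Tue Sep 15 21:50:19 2015

@author: Administrator
"""

import tushare as ts

# Fetch the institutional trading detail table
aa=ts.inst_detail()
print("Institutional buy and sell statistics")
print(aa[:50])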

Checking by hand, today's data has 32 entries, so:

bb=aa[:32]

bb=bb.drop_duplicates('code')

bb
Out[33]:
      code  name        date  bamount  samount
0   300432  富临精工  2015-10-28  5789.10   897.93
1   000566  海南海药  2015-10-28   101.18     0.00

 

d=bb['bamount'].sum()

e=bb['samount'].sum()

d

Out[40]: 67145.9

e
Out[41]: 109086.06999999999

Resulting totals:

Institutional buying on 10/28: 67145.9 wan (1 wan = 10,000)
Institutional selling on 10/28: 109086.07 wan
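The deduplicate-then-sum step above can be tried without a tushare account. The following is a small sketch with made-up rows shaped like the output shown earlier. The column names follow that output, but the values and the repeated row are illustrative only.

```python
import pandas as pd

# Made-up rows in the same shape as the output above; values are illustrative.
detail = pd.DataFrame({
    "code": ["300432", "000566", "300432"],
    "name": ["A", "B", "A"],
    "bamount": [5789.10, 101.18, 5789.10],
    "samount": [897.93, 0.00, 897.93],
})

# Keep the first row for each stock code, then total both columns.
unique = detail.drop_duplicates("code")
buy_total = unique["bamount"].sum()
sell_total = unique["samount"].sum()

# Rounding hides floating-point noise such as 109086.06999999999.
print(round(buy_total, 2), round(sell_total, 2))
```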

Using DataYes (通联数据) data through tushare

st = ts.Market()

df = st.MktEqud(tradeDate='20150917', field='ticker,secShortName,preClosePrice,openPrice,highestPrice,lowestPrice,closePrice,turnoverVol,turnoverRate')

# Pad tickers to 6 digits so leading zeros are kept
df['ticker'] = df['ticker'].map(lambda x: str(x).zfill(6))

df
Out[23]:
      ticker secShortName  preClosePrice  openPrice  highestPrice
0     000001         平安银行         10.900     10.850        11.140
1     000002          万科A         13.430     13.350        13.390
2     000004         国农科技         26.500     26.260        27.350

 

Web plotting: learning financial analysis with Python

I practiced with two data sources: Yahoo and tushare.

 

import numpy as np

import pandas as pd

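url='http://ichart.yahoo.com/table.csv?s=MSFT&a=0&b=1&c=2009'
data=pd.read_csv(url,parse_dates=['Date'])

data.plot(x="Date",y='Close')

This shows Microsoft's price chart from 2009.

 

The second approach is to get the data from tushare:

data1=ts.get_hist_data('sh')

The data returned here has no regular date column, though, so the index has to be copied into a date column:

data1['Date']=data1.index

data1.plot(x='Date',y='close')

This shows the Shanghai Composite chart for roughly the last 3 years.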
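The index-to-column step can be sketched without network access. The frame below is hypothetical: it assumes, as in the tushare result above, that the index holds date strings, and the prices are made up. Parsing the strings into datetimes and sorting them is an extra step added here so that a later plot would run from oldest to newest.

```python
import pandas as pd

# Hypothetical frame shaped like the tushare result: date strings as the index.
data1 = pd.DataFrame(
    {"close": [3100.5, 3120.0, 3090.2]},
    index=["2015-09-17", "2015-09-16", "2015-09-15"],
)

# Copy the index into a regular column and parse it as datetimes,
# then sort so the dates run from oldest to newest.
data1["Date"] = pd.to_datetime(data1.index)
data1 = data1.sort_values("Date")
print(data1)
```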

 

I also tried plotting with Bokeh, which seems to be well recommended:

It can output straight to a web page and can plot in real time.

from bokeh.plotting import figure, output_file, show

# prepare some data
x = [1, 2, 3, 4, 5]
y = [6, 7, 2, 4, 5]

# output to static HTML file
output_file("lines.html", title="line plot example")

# create a new plot with a title and axis labels
p = figure(title="simple line example", x_axis_label='x', y_axis_label='y')

# add a line renderer with legend and line thickness
p.line(x, y, legend="Temp.", line_width=2)

# show the results
show(p)

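Do you really need to burn money blindly chasing big data? (repost)

http://news.xinhuanet.com/info/2013-05/16/c_132386147.htm

Big data is probably the hottest technology buzzword right now. Hot means there is a bubble, and there are things worth reflecting on. On May 6, Christopher Mims of Quartz published an article titled "Most data isn't big, and pretending it is big data is just a waste of money". It is well argued and worth reading. The translation follows:

If you have not yet joined the big data camp, you had better find a way to get some. After all, competing requires big data. If your data is small, your competitors will crush you.

As yet another big project that consultants and IT companies pitch to enterprises, big data rests on assumptions that still have many problems. Fortunately, honest big data practitioners (also known as data scientists) never drop their skepticism, and they have offered a series of reasons to be tired of the big data hype. They are as follows: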

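Reason 1: even Internet giants like Facebook and Yahoo! are not always dealing with big data, and Google-style tools are not appropriate for them.

Facebook and Yahoo! run giant clusters (collections of powerful servers) to process their data. Needing a cluster is one of the hallmarks of big data. After all, data that a home PC can handle cannot be called big data. The need to split a job into small pieces and have a series of computers process each piece is the typical trait of big data problems such as Google computing a rank for every web page in the world.

It now appears that, for Facebook and Yahoo!, using clusters of that same scale for every job is unnecessary. In Facebook's case, most of the jobs that engineers submit to the cluster are in the MB to GB range and could be done entirely on a single computer, or even a laptop.

Yahoo! is in a similar situation. The median amount of data processed on Yahoo!'s cluster is only 12.5 GB. A desktop computer usually cannot handle such a job, but a well-configured server is fully capable of it.

The points above are drawn from a Microsoft Research paper titled "Nobody ever got fired for buying a cluster". The paper points out that even at the most data-hungry companies, most problems do not need to be processed on a cluster, because for a large number of problem types a cluster is a relatively inefficient, or even entirely inappropriate, solution.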

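Reason 2: big data has become a synonym for data analysis, which is confusing and counterproductive.

Data analysis dates back at least to tabulating all the grain in the royal granaries, but now you have to put "big" in front of "data", and necessary data analysis has been swept up in a large but not very useful fad. For example, one article urges readers to apply "3 steps to bring big data to your small business", when in fact a small business's data can be handled in Google Docs, let alone Excel on a laptop.

In other words, most businesses are actually dealing with what Rufus Pollock of the Open Knowledge Foundation calls small data. This matters; it is a "revolution", Pollock says. But it has little to do with big data.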

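Reason 3: supersizing your data is becoming more trouble than it is worth.

Is more data always better? Not necessarily. Suppose you are looking for correlations: how x relates to y, and what useful information that relationship gives you. In practice, the more data you have, the more trouble comes with it.

The information that can be extracted from big data diminishes as the data grows, writes Michael Wu, chief data scientist at the social media analytics company Lithium. This means that past a certain point, the marginal return from adding more data shrinks so much that collecting more is simply a waste of time.

One reason: the "bigger" the data, the more false findings turn up when you look for correlations. As data analyst Vincent Granville writes in "The curse of big data", "even a data set with only 1,000 items can easily put you in the position of handling millions of correlations." This means that "of all these correlations, some may match closely, but only by chance: if you use such correlations in a predictive model, the results will be wrong."

This error frequently crops up in genetics, one of the original application areas of big data. Scientists interested in genome sequences have carried out endless studies painstakingly searching for correlations, only to arrive at all kinds of useless results.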
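The chance-correlation effect described above can be illustrated with purely random data. This is only a sketch with arbitrarily chosen sizes and an arbitrary 0.5 threshold, not a reproduction of Granville's analysis. It generates unrelated variables and counts how many pairs still look strongly correlated. For scale, 1,000 variables would give 1000 × 999 / 2 = 499,500 pairs.

```python
import numpy as np

rng = np.random.default_rng(42)
n_obs, n_vars = 20, 200          # few observations, many unrelated variables
data = rng.standard_normal((n_obs, n_vars))

# Pairwise correlations between all columns; keep each pair once.
corr = np.corrcoef(data, rowvar=False)
upper = corr[np.triu_indices(n_vars, k=1)]

n_pairs = upper.size             # n_vars * (n_vars - 1) / 2
n_strong = int(np.sum(np.abs(upper) > 0.5))
print(n_pairs, n_strong)
```

Every column is independent noise, so any pair that passes the threshold is a false finding of exactly the kind the article warns about.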

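Reason 4: in some cases big data will enlighten you, but it may just as well leave you confused.

Once a company starts using big data, it is deep into a range of difficult disciplines: statistics, data quality, and everything else that makes up "data science". As in any science that has to publish every day, findings are often overlooked, revised, or never confirmed, and there are simply too many pitfalls along the way.

Bias in how data is collected, lack of context, gaps in what is gathered, the way data is manually processed, and overall cognitive bias can all lead even the best researchers to find spurious correlations. Kate Crawford, visiting professor at the MIT Media Lab, says: "We may get caught up in a kind of algorithmic illusion." In other words, even if you have big data, it is not something that just anyone in the IT department can handle; it may take someone with a PhD or equivalent experience. And when they are done, their answer may be that you do not need "big data" at all.

So which is better: big data or small data?

Does your business need data? Of course. But only Dilbert's pointy-haired boss would buy data size, with all its supposed importance, as if chasing a fashion. The problems inherent in using data to make business decisions (data quality, overall goals, and the importance of context and intuition) exist in science as well. Remember: Gregor Mendel discovered the secrets of heredity with only a notebook's worth of data. What matters is the quality of the data, not its size.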

pandas beginner tutorial: the basics (repost)

The Python API documentation is here:

https://www.ricequant.com/api/python/chn

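Below is a demonstration of basic usage of pandas and the libraries in its ecosystem.
The basics tutorial has 10 sections. It starts from the most basic Series and DataFrame and ends with building basic moving averages and standard deviations, which should bring you to the level of being able to use the IPython notebook on ricequant.

Note: these are some libraries that will be used later.
When typing in the reference code by hand, be sure to keep the indentation aligned. It matters enough to say three times.
Also, to keep the tutorial short, the maximum number of displayed rows is set to 31, so that one month of data can be shown in full.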
In [8]:

from pandas import Series, DataFrame
import pandas as pd
import numpy as np
import scipy as sp
import statsmodels.tsa.stattools as sts
import matplotlib.pyplot as plt
import statsmodels.api as sm
pd.options.display.max_rows = 31
1: Series
pandas has two most important data structures, Series and DataFrame; Panel is not covered for now. This section covers some basic operations on a pandas Series.
In [9]:

labels = ['a','b','c','d','e']
s2 = Series(np.random.randn(5),index =labels)
s2
Out[9]:
a   -1.333205
b   -1.049079
c    1.431448
d    0.292383
e    0.918513
dtype: float64
In [6]:

'b' in s2
Out[6]:
True
In [4]:

s2['b']
Out[4]:
-0.36129673980263433
Use to_dict to view the Series as a dictionary
In [5]:

mapping = s2.to_dict()
mapping
Out[5]:
{'a': -0.38272820928427753,
'b': -0.36129673980263433,
'c': -0.13586616742241123,
'd': -1.1435589583111128,
'e': -2.193229583218169}
A Series can also be built from a dictionary
In [6]:

Series(mapping)

Out[6]:
a   -0.382728
b   -0.361297
c   -0.135866
d   -1.143559
e   -2.193230
dtype: float64
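The cells above can be reproduced without random numbers. The following sketch uses fixed, made-up values to show the round trip between a Series and a dictionary, and that the in operator tests index labels rather than values.

```python
import pandas as pd

s = pd.Series([1.5, -0.5, 2.0], index=["a", "b", "c"])

mapping = s.to_dict()            # {'a': 1.5, 'b': -0.5, 'c': 2.0}
rebuilt = pd.Series(mapping)     # labels come from the dictionary keys

print("b" in s)                  # membership tests the index labels
print(1.5 in s)                  # a value is not a label
print(rebuilt.equals(s))
```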
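By default, one call fetches one year of data, as shown below. As for why the data is structured this way, an easter egg at the end explains it.
In [7]:

ts = get_price('600208.XSHG')['ClosingPx'][-10:]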
ts
Out[7]:
MDEntryDate
2013-12-20    3.17
2013-12-23    3.18
2013-12-24    3.17
2013-12-25    3.16
2013-12-26    3.11
2013-12-27    3.16
2013-12-30    3.17
2013-12-31    3.20
2014-01-02    3.19
2014-01-03    3.13
Name: ClosingPx, dtype: float64
Passing start_date and end_date allows more customized queries. For US stocks, remember to add the country 'us'.
In [8]:

tsspecial = get_price('AAPL.US','us', start_date='2001-04-01', end_date='2015-04-12')['ClosingPx'][-90:]
tsspecial
Out[8]:
MDEntryDate
2014-12-01    115.070
2014-12-02    114.630
2014-12-03    115.930
2014-12-04    115.490
2014-12-05    115.000
2014-12-08    112.400
2014-12-09    114.120
2014-12-10    111.950
2014-12-11    111.620
2014-12-12    109.730
2014-12-15    108.225
2014-12-16    106.745
2014-12-17    109.410
2014-12-18    112.650
2014-12-19    111.780

2015-03-20    125.900
2015-03-23    127.210
2015-03-24    126.690
2015-03-25    123.380
2015-03-26    124.240
2015-03-27    123.250
2015-03-30    126.370
2015-03-31    124.430
2015-04-01    124.250
2015-04-02    125.320
2015-04-06    127.350
2015-04-07    126.010
2015-04-08    125.600
2015-04-09    126.560
2015-04-10    127.100
Name: ClosingPx, dtype: float64
Take the first 5 items of ts, in order from the start
In [9]:

ts[:5]
Out[9]:
MDEntryDate
2013-12-20    3.17
2013-12-23    3.18
2013-12-24    3.17
2013-12-25    3.16
2013-12-26    3.11
Name: ClosingPx, dtype: float64
In [10]:

ts.index
Out[10]:
DatetimeIndex(['2013-12-20', '2013-12-23', '2013-12-24', '2013-12-25',
'2013-12-26', '2013-12-27', '2013-12-30', '2013-12-31',
'2014-01-02', '2014-01-03'],
dtype='datetime64[ns]', name='MDEntryDate', freq=None, tz=None)
Look up a label and fetch the value by that label. Here this is equivalent to fetching directly by position.
In [13]:

date = ts.index[6]
date
Out[13]:
Timestamp('2013-12-30 00:00:00')
In [14]:

ts[date]
Out[14]:
3.1699999999999999
In [15]:

ts[6]
Out[15]:
3.1699999999999999
Ricequant tip: read several stocks at once. For more Ricequant tips, see basic_demo.
In [16]:

dfcn = get_price(['000024.XSHE', '000001.XSHE', '000002.XSHE'])['ClosingPx'][-10:]
2: DataFrame
This section covers some basic operations on the DataFrame data structure.
In [17]:

dfus = get_price(['AAPL.US','IBM.US','MSFT.US'],country = 'us') ['ClosingPx'][-30:]
dfus
Out[17]:
AAPL.US IBM.US MSFT.US
MDEntryDate
2013-11-20 73.571429 185.19 37.080
2013-11-21 74.448000 184.13 37.400
2013-11-22 74.257143 181.30 37.570
2013-11-25 74.820000 178.94 37.640
2013-11-26 76.200000 177.31 37.350
2013-11-27 77.994286 178.97 37.600
2013-11-29 79.438571 179.68 38.130
2013-12-02 78.747143 177.48 38.450
2013-12-03 80.903143 176.08 38.310
2013-12-04 80.714286 175.74 38.940
2013-12-05 81.128714 176.08 38.000
2013-12-06 80.002857 177.67 38.360
2013-12-09 80.918571 177.46 38.705
2013-12-10 80.792857 177.12 38.110
2013-12-11 80.194286 175.20 37.610
2013-12-12 80.077143 173.37 37.220
2013-12-13 79.204286 172.80 36.690
2013-12-16 79.642857 177.85 36.885
2013-12-17 79.284286 175.76 36.520
2013-12-18 78.681429 178.70 36.580
2013-12-19 77.780000 180.22 36.250
2013-12-20 78.431429 180.02 36.800
2013-12-23 81.441429 182.23 36.620
2013-12-24 81.095714 183.22 37.080
2013-12-26 80.557143 185.35 37.440
2013-12-27 80.012857 185.08 37.290
2013-12-30 79.217143 186.41 37.290
2013-12-31 80.145714 187.57 37.410
2014-01-02 79.018571 185.53 37.160
2014-01-03 77.282857 186.64 36.910
Examine the ratio of IBM's share price to Microsoft's
In [18]:

dfus['Ratio'] =  dfus['IBM.US'] / dfus['MSFT.US']
dfus
Out[18]:
AAPL.US IBM.US MSFT.US Ratio
MDEntryDate
2013-11-20 73.571429 185.19 37.080 4.994337
2013-11-21 74.448000 184.13 37.400 4.923262
2013-11-22 74.257143 181.30 37.570 4.825659
2013-11-25 74.820000 178.94 37.640 4.753985
2013-11-26 76.200000 177.31 37.350 4.747256
2013-11-27 77.994286 178.97 37.600 4.759840
2013-11-29 79.438571 179.68 38.130 4.712300
2013-12-02 78.747143 177.48 38.450 4.615865
2013-12-03 80.903143 176.08 38.310 4.596189
2013-12-04 80.714286 175.74 38.940 4.513097
2013-12-05 81.128714 176.08 38.000 4.633684
2013-12-06 80.002857 177.67 38.360 4.631648
2013-12-09 80.918571 177.46 38.705 4.584937
2013-12-10 80.792857 177.12 38.110 4.647599
2013-12-11 80.194286 175.20 37.610 4.658336
2013-12-12 80.077143 173.37 37.220 4.657980
2013-12-13 79.204286 172.80 36.690 4.709730
2013-12-16 79.642857 177.85 36.885 4.821743
2013-12-17 79.284286 175.76 36.520 4.812705
2013-12-18 78.681429 178.70 36.580 4.885183
2013-12-19 77.780000 180.22 36.250 4.971586
2013-12-20 78.431429 180.02 36.800 4.891848
2013-12-23 81.441429 182.23 36.620 4.976242
2013-12-24 81.095714 183.22 37.080 4.941208
2013-12-26 80.557143 185.35 37.440 4.950588
2013-12-27 80.012857 185.08 37.290 4.963261
2013-12-30 79.217143 186.41 37.290 4.998927
2013-12-31 80.145714 187.57 37.410 5.013900
2014-01-02 79.018571 185.53 37.160 4.992734
2014-01-03 77.282857 186.64 36.910 5.056624
Delete the extra column; dfus is kept for later use
In [19]:

del dfus['Ratio']
dfus
Out[19]:
AAPL.US IBM.US MSFT.US
MDEntryDate
2013-11-20 73.571429 185.19 37.080
2013-11-21 74.448000 184.13 37.400
2013-11-22 74.257143 181.30 37.570
2013-11-25 74.820000 178.94 37.640
2013-11-26 76.200000 177.31 37.350
2013-11-27 77.994286 178.97 37.600
2013-11-29 79.438571 179.68 38.130
2013-12-02 78.747143 177.48 38.450
2013-12-03 80.903143 176.08 38.310
2013-12-04 80.714286 175.74 38.940
2013-12-05 81.128714 176.08 38.000
2013-12-06 80.002857 177.67 38.360
2013-12-09 80.918571 177.46 38.705
2013-12-10 80.792857 177.12 38.110
2013-12-11 80.194286 175.20 37.610
2013-12-12 80.077143 173.37 37.220
2013-12-13 79.204286 172.80 36.690
2013-12-16 79.642857 177.85 36.885
2013-12-17 79.284286 175.76 36.520
2013-12-18 78.681429 178.70 36.580
2013-12-19 77.780000 180.22 36.250
2013-12-20 78.431429 180.02 36.800
2013-12-23 81.441429 182.23 36.620
2013-12-24 81.095714 183.22 37.080
2013-12-26 80.557143 185.35 37.440
2013-12-27 80.012857 185.08 37.290
2013-12-30 79.217143 186.41 37.290
2013-12-31 80.145714 187.57 37.410
2014-01-02 79.018571 185.53 37.160
2014-01-03 77.282857 186.64 36.910
Basic attributes: labels (index), columns (columns), and values (values)
In [20]:

dfus.index
Out[20]:
DatetimeIndex(['2013-11-20', '2013-11-21', '2013-11-22', '2013-11-25',
'2013-11-26', '2013-11-27', '2013-11-29', '2013-12-02',
'2013-12-03', '2013-12-04', '2013-12-05', '2013-12-06',
'2013-12-09', '2013-12-10', '2013-12-11', '2013-12-12',
'2013-12-13', '2013-12-16', '2013-12-17', '2013-12-18',
'2013-12-19', '2013-12-20', '2013-12-23', '2013-12-24',
'2013-12-26', '2013-12-27', '2013-12-30', '2013-12-31',
'2014-01-02', '2014-01-03'],
dtype='datetime64[ns]', name='MDEntryDate', freq=None, tz=None)
In [21]:

dfus.columns
Out[21]:
Index(['AAPL.US', 'IBM.US', 'MSFT.US'], dtype='object')
In [22]:

dfus.values
Out[22]:
array([[  73.57142857,  185.19      ,   37.08      ],
[  74.448     ,  184.13      ,   37.4       ],
[  74.25714286,  181.3       ,   37.57      ],
[  74.82      ,  178.94      ,   37.64      ],
[  76.2       ,  177.31      ,   37.35      ],
[  77.99428571,  178.97      ,   37.6       ],
[  79.43857143,  179.68      ,   38.13      ],
[  78.74714286,  177.48      ,   38.45      ],
[  80.90314286,  176.08      ,   38.31      ],
[  80.71428571,  175.74      ,   38.94      ],
[  81.12871429,  176.08      ,   38.        ],
[  80.00285714,  177.67      ,   38.36      ],
[  80.91857143,  177.46      ,   38.705     ],
[  80.79285714,  177.12      ,   38.11      ],
[  80.19428571,  175.2       ,   37.61      ],
[  80.07714286,  173.37      ,   37.22      ],
[  79.20428571,  172.8       ,   36.69      ],
[  79.64285714,  177.85      ,   36.885     ],
[  79.28428571,  175.76      ,   36.52      ],
[  78.68142857,  178.7       ,   36.58      ],
[  77.78      ,  180.22      ,   36.25      ],
[  78.43142857,  180.02      ,   36.8       ],
[  81.44142857,  182.23      ,   36.62      ],
[  81.09571429,  183.22      ,   37.08      ],
[  80.55714286,  185.35      ,   37.44      ],
[  80.01285714,  185.08      ,   37.29      ],
[  79.21714286,  186.41      ,   37.29      ],
[  80.14571429,  187.57      ,   37.41      ],
[  79.01857143,  185.53      ,   37.16      ],
[  77.28285714,  186.64      ,   36.91      ]])
In [23]:

date = dfus.index[5]
dfus.ix[date]
Out[23]:
AAPL.US     77.994286
IBM.US     178.970000
MSFT.US     37.600000
Name: 2013-11-27 00:00:00, dtype: float64
This is equivalent to the following code; Series and DataFrame behave similarly in this respect.
In [24]:

dfus.ix[5]
Out[24]:
AAPL.US     77.994286
IBM.US     178.970000
MSFT.US     37.600000
Name: 2013-11-27 00:00:00, dtype: float64
In [25]:

dfus.ix[5,'IBM.US']
Out[25]:
178.97
The most basic filtering
In [26]:

dfus.ix[dfus['AAPL.US']>80]
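The .ix indexer used in these cells was deprecated in pandas 0.20 and removed in pandas 1.0. On a current pandas, the same lookups can be written with .loc (by label) and .iloc (by position). The sketch below uses made-up prices; only the shape mirrors dfus.

```python
import pandas as pd

# Made-up prices; only the shape mirrors dfus.
df = pd.DataFrame(
    {"AAPL.US": [73.5, 74.4, 74.2], "IBM.US": [185.19, 184.13, 181.30]},
    index=pd.to_datetime(["2013-11-20", "2013-11-21", "2013-11-22"]),
)

date = df.index[1]
by_label = df.loc[date]                   # row selected by index label
by_position = df.iloc[1]                  # same row selected by position
cell = df.loc[df.index[1], "IBM.US"]      # row found by position, column by label
print(by_label.equals(by_position), cell)
```

Keeping label lookups and position lookups in separate indexers avoids the ambiguity .ix had when an index contained integers.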
Out[26]:
AAPL.US IBM.US MSFT.US
MDEntryDate
2013-12-03 80.903143 176.08 38.310
2013-12-04 80.714286 175.74 38.940
2013-12-05 81.128714 176.08 38.000
2013-12-06 80.002857 177.67 38.360
2013-12-09 80.918571 177.46 38.705
2013-12-10 80.792857 177.12 38.110
2013-12-11 80.194286 175.20 37.610
2013-12-12 80.077143 173.37 37.220
2013-12-23 81.441429 182.23 36.620
2013-12-24 81.095714 183.22 37.080
2013-12-26 80.557143 185.35 37.440
2013-12-27 80.012857 185.08 37.290
2013-12-31 80.145714 187.57 37.410
Slightly more complex: on top of the previous filter, select only the IBM and MSFT columns
In [27]:

dfus.ix[dfus['AAPL.US']>80,['IBM.US','MSFT.US']]
Out[27]:
IBM.US MSFT.US
MDEntryDate
2013-12-03 176.08 38.310
2013-12-04 175.74 38.940
2013-12-05 176.08 38.000
2013-12-06 177.67 38.360
2013-12-09 177.46 38.705
2013-12-10 177.12 38.110
2013-12-11 175.20 37.610
2013-12-12 173.37 37.220
2013-12-23 182.23 36.620
2013-12-24 183.22 37.080
2013-12-26 185.35 37.440
2013-12-27 185.08 37.290
2013-12-31 187.57 37.410
Assign a null value
In [28]:

dfus.ix[dfus['AAPL.US']>80,['IBM.US','MSFT.US']] = np.nan
dfus
Out[28]:
AAPL.US IBM.US MSFT.US
MDEntryDate
2013-11-20 73.571429 185.19 37.080
2013-11-21 74.448000 184.13 37.400
2013-11-22 74.257143 181.30 37.570
2013-11-25 74.820000 178.94 37.640
2013-11-26 76.200000 177.31 37.350
2013-11-27 77.994286 178.97 37.600
2013-11-29 79.438571 179.68 38.130
2013-12-02 78.747143 177.48 38.450
2013-12-03 80.903143 NaN NaN
2013-12-04 80.714286 NaN NaN
2013-12-05 81.128714 NaN NaN
2013-12-06 80.002857 NaN NaN
2013-12-09 80.918571 NaN NaN
2013-12-10 80.792857 NaN NaN
2013-12-11 80.194286 NaN NaN
2013-12-12 80.077143 NaN NaN
2013-12-13 79.204286 172.80 36.690
2013-12-16 79.642857 177.85 36.885
2013-12-17 79.284286 175.76 36.520
2013-12-18 78.681429 178.70 36.580
2013-12-19 77.780000 180.22 36.250
2013-12-20 78.431429 180.02 36.800
2013-12-23 81.441429 NaN NaN
2013-12-24 81.095714 NaN NaN
2013-12-26 80.557143 NaN NaN
2013-12-27 80.012857 NaN NaN
2013-12-30 79.217143 186.41 37.290
2013-12-31 80.145714 NaN NaN
2014-01-02 79.018571 185.53 37.160
2014-01-03 77.282857 186.64 36.910
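The filter-then-assign pattern above works by building a boolean mask from one column and using it to pick the rows to overwrite. A small sketch with made-up numbers, written with .loc so that it runs on current pandas:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "AAPL.US": [79.4, 80.9, 81.1, 79.2],
    "IBM.US": [179.68, 176.08, 176.08, 172.80],
    "MSFT.US": [38.13, 38.31, 38.00, 36.69],
})

mask = df["AAPL.US"] > 80                      # boolean Series, one flag per row
df.loc[mask, ["IBM.US", "MSFT.US"]] = np.nan   # blank two columns on the flagged rows
print(df)
```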
A DataFrame can also be built from a dictionary
In [29]:

data = {}
for col in ['foo','bar','baz']:
    for row in ['a','b','c','d']:
        data.setdefault(col,{})[row] = np.random.randn()
data
Out[29]:
{'bar': {'a': -0.8332422805433652,
'b': 0.5994500856951476,
'c': 0.9537581460796728,
'd': -0.49437981524535757},
'baz': {'a': 0.2766639013497691,
'b': -0.8861177531221818,
'c': 0.16701653134374714,
'd': -1.8695537477196287},
'foo': {'a': 0.10167946157142055,
'b': 0.31454296162293,
'c': 1.8135929483933937,
'd': 0.9831684422423665}}
In [30]:

DataFrame(data)
Out[30]:
bar baz foo
a -0.833242 0.276664 0.101679
b 0.599450 -0.886118 0.314543
c 0.953758 0.167017 1.813593
d -0.494380 -1.869554 0.983168
Adjusting data
Continue working with the dfus fetched above.
In [31]:

s1  = dfus['AAPL.US'][-20:]
s2  = dfus['AAPL.US'][-25:-10]
(s1, s2)
Out[31]:
(MDEntryDate
2013-12-05    81.128714
2013-12-06    80.002857
2013-12-09    80.918571
2013-12-10    80.792857
2013-12-11    80.194286
2013-12-12    80.077143
2013-12-13    79.204286
2013-12-16    79.642857
2013-12-17    79.284286
2013-12-18    78.681429
2013-12-19    77.780000
2013-12-20    78.431429
2013-12-23    81.441429
2013-12-24    81.095714
2013-12-26    80.557143
2013-12-27    80.012857
2013-12-30    79.217143
2013-12-31    80.145714
2014-01-02    79.018571
2014-01-03    77.282857
Name: AAPL.US, dtype: float64, MDEntryDate
2013-11-27    77.994286
2013-11-29    79.438571
2013-12-02    78.747143
2013-12-03    80.903143
2013-12-04    80.714286
2013-12-05    81.128714
2013-12-06    80.002857
2013-12-09    80.918571
2013-12-10    80.792857
2013-12-11    80.194286
2013-12-12    80.077143
2013-12-13    79.204286
2013-12-16    79.642857
2013-12-17    79.284286
2013-12-18    78.681429
Name: AAPL.US, dtype: float64)
When two Series are added, any label that is missing from either side yields NaN, and the result's index is the union of both indexes
In [32]:

s1+s2
Out[32]:
MDEntryDate
2013-11-27           NaN
2013-11-29           NaN
2013-12-02           NaN
2013-12-03           NaN
2013-12-04           NaN
2013-12-05    162.257429
2013-12-06    160.005714
2013-12-09    161.837143
2013-12-10    161.585714
2013-12-11    160.388571
2013-12-12    160.154286
2013-12-13    158.408571
2013-12-16    159.285714
2013-12-17    158.568571
2013-12-18    157.362857
2013-12-19           NaN
2013-12-20           NaN
2013-12-23           NaN
2013-12-24           NaN
2013-12-26           NaN
2013-12-27           NaN
2013-12-30           NaN
2013-12-31           NaN
2014-01-02           NaN
2014-01-03           NaN
Name: AAPL.US, dtype: float64
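Write the result out to the IPython notebook platform. Note that an existing file with the same name is overwritten. Right-click the resulting foo.csv file and choose "save link as" to download it locally
In [33]:

(s1+s2).to_csv('foo.csv')
Take the intersection of s1 and s2 (inner join)
In [34]:

innerjoin= s1.align(s2,join = 'inner')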
innerjoin
Out[34]:
(MDEntryDate
2013-12-05    81.128714
2013-12-06    80.002857
2013-12-09    80.918571
2013-12-10    80.792857
2013-12-11    80.194286
2013-12-12    80.077143
2013-12-13    79.204286
2013-12-16    79.642857
2013-12-17    79.284286
2013-12-18    78.681429
Name: AAPL.US, dtype: float64, MDEntryDate
2013-12-05    81.128714
2013-12-06    80.002857
2013-12-09    80.918571
2013-12-10    80.792857
2013-12-11    80.194286
2013-12-12    80.077143
2013-12-13    79.204286
2013-12-16    79.642857
2013-12-17    79.284286
2013-12-18    78.681429
Name: AAPL.US, dtype: float64)
Take the union of s1 and s2 (outer join)
In [35]:

outerjoin = s1.align(s2,join='outer')
outerjoin
Out[35]:
(MDEntryDate
2013-11-27          NaN
2013-11-29          NaN
2013-12-02          NaN
2013-12-03          NaN
2013-12-04          NaN
2013-12-05    81.128714
2013-12-06    80.002857
2013-12-09    80.918571
2013-12-10    80.792857
2013-12-11    80.194286
2013-12-12    80.077143
2013-12-13    79.204286
2013-12-16    79.642857
2013-12-17    79.284286
2013-12-18    78.681429
2013-12-19    77.780000
2013-12-20    78.431429
2013-12-23    81.441429
2013-12-24    81.095714
2013-12-26    80.557143
2013-12-27    80.012857
2013-12-30    79.217143
2013-12-31    80.145714
2014-01-02    79.018571
2014-01-03    77.282857
Name: AAPL.US, dtype: float64, MDEntryDate
2013-11-27    77.994286
2013-11-29    79.438571
2013-12-02    78.747143
2013-12-03    80.903143
2013-12-04    80.714286
2013-12-05    81.128714
2013-12-06    80.002857
2013-12-09    80.918571
2013-12-10    80.792857
2013-12-11    80.194286
2013-12-12    80.077143
2013-12-13    79.204286
2013-12-16    79.642857
2013-12-17    79.284286
2013-12-18    78.681429
2013-12-19          NaN
2013-12-20          NaN
2013-12-23          NaN
2013-12-24          NaN
2013-12-26          NaN
2013-12-27          NaN
2013-12-30          NaN
2013-12-31          NaN
2014-01-02          NaN
2014-01-03          NaN
Name: AAPL.US, dtype: float64)
Right join
In [36]:

b,c =s1.align(s2,join ='right')
b,c
Out[36]:
(MDEntryDate
2013-11-27          NaN
2013-11-29          NaN
2013-12-02          NaN
2013-12-03          NaN
2013-12-04          NaN
2013-12-05    81.128714
2013-12-06    80.002857
2013-12-09    80.918571
2013-12-10    80.792857
2013-12-11    80.194286
2013-12-12    80.077143
2013-12-13    79.204286
2013-12-16    79.642857
2013-12-17    79.284286
2013-12-18    78.681429
Name: AAPL.US, dtype: float64, MDEntryDate
2013-11-27    77.994286
2013-11-29    79.438571
2013-12-02    78.747143
2013-12-03    80.903143
2013-12-04    80.714286
2013-12-05    81.128714
2013-12-06    80.002857
2013-12-09    80.918571
2013-12-10    80.792857
2013-12-11    80.194286
2013-12-12    80.077143
2013-12-13    79.204286
2013-12-16    79.642857
2013-12-17    79.284286
2013-12-18    78.681429
Name: AAPL.US, dtype: float64)
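The join modes of align shown above differ only in which index the two returned Series end up sharing. A compact sketch on made-up data prints that shared index for each mode:

```python
import pandas as pd

s1 = pd.Series([1.0, 2.0, 3.0], index=["b", "c", "d"])
s2 = pd.Series([10.0, 20.0, 30.0], index=["a", "b", "c"])

for how in ("inner", "outer", "left", "right"):
    left, right = s1.align(s2, join=how)
    # Both returned Series always share one index; only its contents change.
    print(how, list(left.index))
```

Labels that one side lacks are filled with NaN on that side, which is why the outer and right joins above contain NaN rows for s1.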
Take the IBM.US and MSFT.US columns of dfus with a step of 2
In [37]:

dfus2 = dfus.ix[::2,['IBM.US','MSFT.US']]
dfus2
Out[37]:
IBM.US MSFT.US
MDEntryDate
2013-11-20 185.19 37.08
2013-11-22 181.30 37.57
2013-11-26 177.31 37.35
2013-11-29 179.68 38.13
2013-12-03 NaN NaN
2013-12-05 NaN NaN
2013-12-09 NaN NaN
2013-12-11 NaN NaN
2013-12-13 172.80 36.69
2013-12-17 175.76 36.52
2013-12-19 180.22 36.25
2013-12-23 NaN NaN
2013-12-26 NaN NaN
2013-12-30 186.41 37.29
2014-01-02 185.53 37.16
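When the two are "added", values survive only where both sides have data, while the shape is the union of both. After the addition, the IBM and MSFT columns have only nine non-NaN values each, matching the nine non-NaN rows of dfus2. The remaining NaNs in these two columns come from the rows that dfus2 skips and from the earlier "assign a null value" step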
In [38]:

dfus+dfus2
Out[38]:
AAPL.US IBM.US MSFT.US
MDEntryDate
2013-11-20 NaN 370.38 74.16
2013-11-21 NaN NaN NaN
2013-11-22 NaN 362.60 75.14
2013-11-25 NaN NaN NaN
2013-11-26 NaN 354.62 74.70
2013-11-27 NaN NaN NaN
2013-11-29 NaN 359.36 76.26
2013-12-02 NaN NaN NaN
2013-12-03 NaN NaN NaN
2013-12-04 NaN NaN NaN
2013-12-05 NaN NaN NaN
2013-12-06 NaN NaN NaN
2013-12-09 NaN NaN NaN
2013-12-10 NaN NaN NaN
2013-12-11 NaN NaN NaN
2013-12-12 NaN NaN NaN
2013-12-13 NaN 345.60 73.38
2013-12-16 NaN NaN NaN
2013-12-17 NaN 351.52 73.04
2013-12-18 NaN NaN NaN
2013-12-19 NaN 360.44 72.50
2013-12-20 NaN NaN NaN
2013-12-23 NaN NaN NaN
2013-12-24 NaN NaN NaN
2013-12-26 NaN NaN NaN
2013-12-27 NaN NaN NaN
2013-12-30 NaN 372.82 74.58
2013-12-31 NaN NaN NaN
2014-01-02 NaN 371.06 74.32
2014-01-03 NaN NaN NaN
What happens when you take the intersection of two DataFrames? See below (inner join)
In [39]:

df_inner_join = dfus.align(dfus2,join = 'inner')
df_inner_join
Out[39]:
(             IBM.US  MSFT.US
MDEntryDate
2013-11-20   185.19    37.08
2013-11-22   181.30    37.57
2013-11-26   177.31    37.35
2013-11-29   179.68    38.13
2013-12-03      NaN      NaN
2013-12-05      NaN      NaN
2013-12-09      NaN      NaN
2013-12-11      NaN      NaN
2013-12-13   172.80    36.69
2013-12-17   175.76    36.52
2013-12-19   180.22    36.25
2013-12-23      NaN      NaN
2013-12-26      NaN      NaN
2013-12-30   186.41    37.29
2014-01-02   185.53    37.16,              IBM.US  MSFT.US
MDEntryDate
2013-11-20   185.19    37.08
2013-11-22   181.30    37.57
2013-11-26   177.31    37.35
2013-11-29   179.68    38.13
2013-12-03      NaN      NaN
2013-12-05      NaN      NaN
2013-12-09      NaN      NaN
2013-12-11      NaN      NaN
2013-12-13   172.80    36.69
2013-12-17   175.76    36.52
2013-12-19   180.22    36.25
2013-12-23      NaN      NaN
2013-12-26      NaN      NaN
2013-12-30   186.41    37.29
2014-01-02   185.53    37.16)
The result is actually a tuple of two DataFrames: the IBM.US, MSFT.US header followed by an MDEntryDate line appears twice in the output.
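Since `align` returns a pair, it is usually convenient to unpack it right away. A small sketch with hypothetical frames `a` and `b`:

```python
import pandas as pd

# Hypothetical toy frames for illustration.
a = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}, index=["x", "y", "z"])
b = pd.DataFrame({"B": [7, 8], "C": [9, 10]}, index=["y", "z"])

# align() returns a tuple (left_aligned, right_aligned).
# join="inner" keeps only the row labels and column labels shared by both.
a_in, b_in = a.align(b, join="inner")

print(a_in)  # rows y, z and column B, values taken from a
print(b_in)  # same labels, values taken from b
```

Both results share identical labels, so they can be compared or combined cell by cell without further alignment.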
In [40]:

df = pd.DataFrame(np.random.randn(6, 4), index=list('abcdef'),
                  columns=['one', 'two', 'three', 'four'])
df
Out[40]:
one two three four
a -0.305934 0.283281 -0.455574 -0.776549
b -0.040167 -0.029762 0.579123 -1.684730
c -1.543881 0.144769 0.237247 2.243017
d -0.181026 0.012534 0.532293 0.047799
e -2.255002 -0.091820 0.407700 -1.238651
f -0.178221 -0.803238 0.079830 0.221831
In [41]:

to_join = pd.DataFrame(np.random.randn(2, 2), index=['bar', 'foo'],
                       columns=['j1', 'j2'])
to_join
Out[41]:
j1 j2
bar -0.110411 0.494628
foo 0.189955 -2.309464
按照’key’列连接
In [42]:

df['key'] = ['foo', 'bar'] * 3
df
Out[42]:
one two three four key
a -0.305934 0.283281 -0.455574 -0.776549 foo
b -0.040167 -0.029762 0.579123 -1.684730 bar
c -1.543881 0.144769 0.237247 2.243017 foo
d -0.181026 0.012534 0.532293 0.047799 bar
e -2.255002 -0.091820 0.407700 -1.238651 foo
f -0.178221 -0.803238 0.079830 0.221831 bar
Join on 'key':
In [43]:

df.join(to_join, on='key')
Out[43]:
one two three four key j1 j2
a -0.305934 0.283281 -0.455574 -0.776549 foo 0.189955 -2.309464
b -0.040167 -0.029762 0.579123 -1.684730 bar -0.110411 0.494628
c -1.543881 0.144769 0.237247 2.243017 foo 0.189955 -2.309464
d -0.181026 0.012534 0.532293 0.047799 bar -0.110411 0.494628
e -2.255002 -0.091820 0.407700 -1.238651 foo 0.189955 -2.309464
f -0.178221 -0.803238 0.079830 0.221831 bar -0.110411 0.494628
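What `join(..., on='key')` does is look up each row's key in the other frame's index. A minimal sketch with hypothetical frames `left` and `lookup`:

```python
import pandas as pd

# Hypothetical data: a 'key' column on the left, keys as the index on the right.
left = pd.DataFrame({"value": [1, 2, 3, 4],
                     "key": ["foo", "bar", "foo", "bar"]},
                    index=list("abcd"))
lookup = pd.DataFrame({"j1": [10, 20]}, index=["bar", "foo"])

# Each left row picks up the lookup row whose index equals its 'key'.
joined = left.join(lookup, on="key")
print(joined)
```

The left frame's index and row order are preserved; the lookup rows are simply repeated wherever their key occurs.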
Transpose
In [44]:

df[:5].T
Out[44]:
a b c d e
one -0.3059336 -0.04016713 -1.543881 -0.1810263 -2.255002
two 0.2832805 -0.02976236 0.1447687 0.01253364 -0.09182049
three -0.4555738 0.5791231 0.2372471 0.532293 0.4076999
four -0.7765494 -1.68473 2.243017 0.04779907 -1.238651
key foo bar foo bar foo
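Some neat small tricks, such as the third line of the next cell, which quickly fills a column with alternating values.
Points worth noting:
1. Python 3 needs one more slash than Python 2: what used to be n/2 becomes n//2, because / now always returns a float.
2. The ricequant Python notebook requires every library you call to be imported explicitly. A rigorous habit is good for both you and the server.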
In [45]:

n=10
foo = DataFrame(index =range(n))
foo['strings'] = ['foo', 'bar'] * (n // 2)
foo['floats'] = np.random.randn(n)
foo['ints'] = np.arange(n)
foo['bools'] = foo['floats'] > 0
foo['objects'] = pd.date_range('1/1/2000', periods=n)
foo
Out[45]:
strings floats ints bools objects
0 foo -0.254502 0 False 2000-01-01
1 bar 0.221016 1 True 2000-01-02
2 foo 1.243960 2 True 2000-01-03
3 bar -0.362677 3 False 2000-01-04
4 foo -2.773175 4 False 2000-01-05
5 bar 1.941168 5 True 2000-01-06
6 foo 0.093740 6 True 2000-01-07
7 bar -1.240576 7 False 2000-01-08
8 foo 1.331110 8 True 2000-01-09
9 bar -0.424049 9 False 2000-01-10
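The `//` point is easy to check in isolation. A minimal sketch (the frame name `foo` mirrors the cell above, the rest is illustrative):

```python
import pandas as pd

n = 10
print(n / 2)   # true division: a float in Python 3
print(n // 2)  # floor division: an int

# A list can only be repeated an integer number of times,
# so the floor-division result is what the repetition needs.
foo = pd.DataFrame(index=range(n))
foo["strings"] = ["foo", "bar"] * (n // 2)
print(foo["strings"].tolist())
```

Using `n / 2` here would raise a TypeError, because a list cannot be multiplied by a float.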
Check the dtypes:
In [46]:

foo.dtypes
Out[46]:
strings            object
floats            float64
ints                int64
bools                bool
objects    datetime64[ns]
dtype: object
In [47]:

foo.T.T
Out[47]:
strings floats ints bools objects
0 foo -0.2545016 0 False 2000-01-01 00:00:00
1 bar 0.2210162 1 True 2000-01-02 00:00:00
2 foo 1.24396 2 True 2000-01-03 00:00:00
3 bar -0.3626772 3 False 2000-01-04 00:00:00
4 foo -2.773175 4 False 2000-01-05 00:00:00
5 bar 1.941168 5 True 2000-01-06 00:00:00
6 foo 0.09373966 6 True 2000-01-07 00:00:00
7 bar -1.240576 7 False 2000-01-08 00:00:00
8 foo 1.33111 8 True 2000-01-09 00:00:00
9 bar -0.4240493 9 False 2000-01-10 00:00:00
A double transpose looks like a no-op, but every column's dtype has been changed to object. Watch out for details like this when tidying data.
In [48]:

foo.T.T.dtypes
Out[48]:
strings    object
floats     object
ints       object
bools      object
objects    object
dtype: object
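The dtype loss can be reproduced and, in simple cases, repaired. This is a sketch on a hypothetical frame `mixed`; `infer_objects` is one possible way to recover numeric dtypes, not necessarily what the author would use:

```python
import pandas as pd

# Hypothetical mixed-dtype frame.
mixed = pd.DataFrame({"s": ["foo", "bar"], "f": [0.5, -1.5], "i": [1, 2]})

# Transposing forces one common dtype per column (object here),
# and transposing back does not undo that.
round_trip = mixed.T.T
print(round_trip.dtypes)

# infer_objects() re-detects better dtypes for object columns.
restored = round_trip.infer_objects()
print(restored.dtypes)
```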
Handling different time frequencies. The time series is abbreviated as ts here; ts is the AAPL.US column of the dfus frame kept from earlier.
In [49]:

ts = dfus['AAPL.US']
ts
Out[49]:
MDEntryDate
2013-11-20    73.571429
2013-11-21    74.448000
2013-11-22    74.257143
2013-11-25    74.820000
2013-11-26    76.200000
2013-11-27    77.994286
2013-11-29    79.438571
2013-12-02    78.747143
2013-12-03    80.903143
2013-12-04    80.714286
2013-12-05    81.128714
2013-12-06    80.002857
2013-12-09    80.918571
2013-12-10    80.792857
2013-12-11    80.194286
2013-12-12    80.077143
2013-12-13    79.204286
2013-12-16    79.642857
2013-12-17    79.284286
2013-12-18    78.681429
2013-12-19    77.780000
2013-12-20    78.431429
2013-12-23    81.441429
2013-12-24    81.095714
2013-12-26    80.557143
2013-12-27    80.012857
2013-12-30    79.217143
2013-12-31    80.145714
2014-01-02    79.018571
2014-01-03    77.282857
Name: AAPL.US, dtype: float64
In [50]:

ts.index
Out[50]:
DatetimeIndex(['2013-11-20', '2013-11-21', '2013-11-22', '2013-11-25',
               '2013-11-26', '2013-11-27', '2013-11-29', '2013-12-02',
               '2013-12-03', '2013-12-04', '2013-12-05', '2013-12-06',
               '2013-12-09', '2013-12-10', '2013-12-11', '2013-12-12',
               '2013-12-13', '2013-12-16', '2013-12-17', '2013-12-18',
               '2013-12-19', '2013-12-20', '2013-12-23', '2013-12-24',
               '2013-12-26', '2013-12-27', '2013-12-30', '2013-12-31',
               '2014-01-02', '2014-01-03'],
              dtype='datetime64[ns]', name='MDEntryDate', freq=None, tz=None)
Slicing also works on a DataFrame.
Take every fifth row:
In [51]:

dfus[::5]
Out[51]:
AAPL.US IBM.US MSFT.US
MDEntryDate
2013-11-20 73.571429 185.19 37.08
2013-11-27 77.994286 178.97 37.60
2013-12-05 81.128714 NaN NaN
2013-12-12 80.077143 NaN NaN
2013-12-19 77.780000 180.22 36.25
2013-12-27 80.012857 NaN NaN
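Take every fifth row, then forward-fill back onto the full index. Extending this idea lets you build your own custom candlestick (K-line) bars.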
In [52]:

dfus[::5].reindex(dfus.index, method='ffill')
Out[52]:
AAPL.US IBM.US MSFT.US
MDEntryDate
2013-11-20 73.571429 185.19 37.08
2013-11-21 73.571429 185.19 37.08
2013-11-22 73.571429 185.19 37.08
2013-11-25 73.571429 185.19 37.08
2013-11-26 73.571429 185.19 37.08
2013-11-27 77.994286 178.97 37.60
2013-11-29 77.994286 178.97 37.60
2013-12-02 77.994286 178.97 37.60
2013-12-03 77.994286 178.97 37.60
2013-12-04 77.994286 178.97 37.60
2013-12-05 81.128714 NaN NaN
2013-12-06 81.128714 NaN NaN
2013-12-09 81.128714 NaN NaN
2013-12-10 81.128714 NaN NaN
2013-12-11 81.128714 NaN NaN
2013-12-12 80.077143 NaN NaN
2013-12-13 80.077143 NaN NaN
2013-12-16 80.077143 NaN NaN
2013-12-17 80.077143 NaN NaN
2013-12-18 80.077143 NaN NaN
2013-12-19 77.780000 180.22 36.25
2013-12-20 77.780000 180.22 36.25
2013-12-23 77.780000 180.22 36.25
2013-12-24 77.780000 180.22 36.25
2013-12-26 77.780000 180.22 36.25
2013-12-27 80.012857 NaN NaN
2013-12-30 80.012857 NaN NaN
2013-12-31 80.012857 NaN NaN
2014-01-02 80.012857 NaN NaN
2014-01-03 80.012857 NaN NaN
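One way the "custom K-line" idea might be carried further is to group every five rows into a bar. The sketch below uses a synthetic series `px`; the bar construction via `groupby` is an assumption for illustration, not the author's method:

```python
import numpy as np
import pandas as pd

# Synthetic prices 0..9 on ten business days (illustrative only).
idx = pd.date_range("2013-11-20", periods=10, freq="B")
px = pd.Series(np.arange(10, dtype=float), index=idx)

# Same pattern as the cell above: keep every 5th row, then hold it forward.
sampled = px[::5]
held = sampled.reindex(px.index, method="ffill")

# Assumed extension: label each block of 5 rows and aggregate it
# into open/high/low/close style columns.
bar_id = np.arange(len(px)) // 5
bars = px.groupby(bar_id).agg(["first", "max", "min", "last"])
print(held.tolist())
print(bars)
```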
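3 Application: from daily returns to monthly returns
Fetch the data again, this time naming it df_us; the time span defaults to one year. rets is each day's price divided by the previous day's price, minus one; shift moves the series by one row.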
In [53]:

df_us = get_price(['AAPL.US', 'MSFT.US', 'IBM.US'], country='us')['ClosingPx']
rets = df_us / df_us.shift(1) - 1
rets
Out[53]:
AAPL.US MSFT.US IBM.US
MDEntryDate
2013-01-04 NaN NaN NaN
2013-01-07 -0.005882 -0.001870 -0.004382
2013-01-08 0.002691 -0.005245 -0.001398
2013-01-09 -0.015629 0.005650 -0.002852
2013-01-10 0.012396 -0.008989 0.002912
2013-01-11 -0.006132 0.013983 0.008140
2013-01-14 -0.035653 0.002236 -0.009411
2013-01-15 -0.031550 0.011900 -0.000623
2013-01-16 0.041509 -0.006248 0.000468
2013-01-17 -0.006738 0.007766 0.005504
2013-01-18 -0.005331 0.000000 0.004234
2013-01-22 0.009540 -0.003670 0.008279
2013-01-23 0.018295 0.016943 0.044064
2013-01-24 -0.123549 0.000724 -0.001465
2013-01-25 -0.023574 0.009048 0.002691
… … … …
2013-12-12 -0.001461 -0.010370 -0.010445
2013-12-13 -0.010900 -0.014240 -0.003288
2013-12-16 0.005537 0.005315 0.029225
2013-12-17 -0.004502 -0.009896 -0.011751
2013-12-18 -0.007604 0.001643 0.016727
2013-12-19 -0.011457 -0.009021 0.008506
2013-12-20 0.008375 0.015172 -0.001110
2013-12-23 0.038377 -0.004891 0.012276
2013-12-24 -0.004245 0.012561 0.005433
2013-12-26 -0.006641 0.009709 0.011625
2013-12-27 -0.006757 -0.004006 -0.001457
2013-12-30 -0.009945 0.000000 0.007186
2013-12-31 0.011722 0.003218 0.006223
2014-01-02 -0.014064 -0.006683 -0.010876
2014-01-03 -0.021966 -0.006728 0.005983
252 rows × 3 columns
Perfectionists will want to get rid of the NaN, so let's use the ix label assignment from earlier:
In [54]:

firstline = rets.index[0]
rets.ix[firstline]=[0,0,0]
rets
Out[54]:
AAPL.US MSFT.US IBM.US
MDEntryDate
2013-01-04 0.000000 0.000000 0.000000
2013-01-07 -0.005882 -0.001870 -0.004382
2013-01-08 0.002691 -0.005245 -0.001398
2013-01-09 -0.015629 0.005650 -0.002852
2013-01-10 0.012396 -0.008989 0.002912
2013-01-11 -0.006132 0.013983 0.008140
2013-01-14 -0.035653 0.002236 -0.009411
2013-01-15 -0.031550 0.011900 -0.000623
2013-01-16 0.041509 -0.006248 0.000468
2013-01-17 -0.006738 0.007766 0.005504
2013-01-18 -0.005331 0.000000 0.004234
2013-01-22 0.009540 -0.003670 0.008279
2013-01-23 0.018295 0.016943 0.044064
2013-01-24 -0.123549 0.000724 -0.001465
2013-01-25 -0.023574 0.009048 0.002691
… … … …
2013-12-12 -0.001461 -0.010370 -0.010445
2013-12-13 -0.010900 -0.014240 -0.003288
2013-12-16 0.005537 0.005315 0.029225
2013-12-17 -0.004502 -0.009896 -0.011751
2013-12-18 -0.007604 0.001643 0.016727
2013-12-19 -0.011457 -0.009021 0.008506
2013-12-20 0.008375 0.015172 -0.001110
2013-12-23 0.038377 -0.004891 0.012276
2013-12-24 -0.004245 0.012561 0.005433
2013-12-26 -0.006641 0.009709 0.011625
2013-12-27 -0.006757 -0.004006 -0.001457
2013-12-30 -0.009945 0.000000 0.007186
2013-12-31 0.011722 0.003218 0.006223
2014-01-02 -0.014064 -0.006683 -0.010876
2014-01-03 -0.021966 -0.006728 0.005983
252 rows × 3 columns
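The `.ix` indexer used above was removed in pandas 1.0, so on a current installation the same step has to go through `.iloc` (by position) or `.loc` (by label). A minimal sketch on made-up prices:

```python
import pandas as pd

# Made-up prices for two hypothetical tickers.
px = pd.DataFrame({"X": [10.0, 11.0, 12.1], "Y": [5.0, 5.0, 4.5]},
                  index=pd.date_range("2013-01-04", periods=3, freq="B"))
rets = px / px.shift(1) - 1   # first row is NaN: nothing to compare with

# Positional replacement for rets.ix[firstline] = [0, 0, 0]
rets.iloc[0] = 0.0
# Label-based alternative: rets.loc[rets.index[0]] = 0.0

print(rets)
```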
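Compounding the daily returns gives the cumulative return, which is equivalent to charting the price relative to the base date. Interested readers can plot the original prices and compare.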
In [55]:

daily_index = (1 + rets).cumprod()
daily_index['IBM.US'].plot()

Out[55]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fd320f9e908>

Compute monthly returns:
In [56]:

monthly_index = daily_index.asfreq('EOM')
monthly_index
Out[56]:
AAPL.US MSFT.US IBM.US
MDEntryDate
2013-01-31 0.864307 1.026552 1.046807
2013-02-28 0.837571 1.039641 1.035260
2013-03-29 NaN NaN NaN
2013-04-30 0.840190 1.237846 1.044074
2013-05-31 0.853387 1.305161 1.072323
2013-06-28 0.752429 1.291885 0.985154
2013-07-31 0.858691 1.190726 1.005413
2013-08-30 0.924509 1.249065 0.939585
2013-09-30 0.904649 1.244577 0.954585
2013-10-31 0.991844 1.324046 0.923811
2013-11-29 1.055161 1.425954 0.926233
2013-12-31 1.064554 1.399028 0.966906
To keep the chart complete, the NaN is filled in a sensible way, using method='ffill':
In [57]:

monthly_index = daily_index.asfreq('EOM', method='ffill')
monthly_index
Out[57]:
AAPL.US MSFT.US IBM.US
MDEntryDate
2013-01-31 0.864307 1.026552 1.046807
2013-02-28 0.837571 1.039641 1.035260
2013-03-29 0.839962 1.069746 1.099541
2013-04-30 0.840190 1.237846 1.044074
2013-05-31 0.853387 1.305161 1.072323
2013-06-28 0.752429 1.291885 0.985154
2013-07-31 0.858691 1.190726 1.005413
2013-08-30 0.924509 1.249065 0.939585
2013-09-30 0.904649 1.244577 0.954585
2013-10-31 0.991844 1.324046 0.923811
2013-11-29 1.055161 1.425954 0.926233
2013-12-31 1.064554 1.399028 0.966906
In [58]:

monthly_rets = monthly_index / monthly_index.shift(1) - 1
monthly_rets
Out[58]:
AAPL.US MSFT.US IBM.US
MDEntryDate
2013-01-31 NaN NaN NaN
2013-02-28 -0.030934 0.012750 -0.011031
2013-03-29 0.002855 0.028957 0.062092
2013-04-30 0.000271 0.157140 -0.050445
2013-05-31 0.015708 0.054381 0.027056
2013-06-28 -0.118303 -0.010172 -0.081290
2013-07-31 0.141225 -0.078304 0.020564
2013-08-30 0.076649 0.048995 -0.065474
2013-09-30 -0.021481 -0.003593 0.015965
2013-10-31 0.096386 0.063852 -0.032239
2013-11-29 0.063838 0.076967 0.002623
2013-12-31 0.008902 -0.018883 0.043911
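The whole daily-to-monthly pipeline fits in a few lines. The sketch below runs on synthetic prices and passes a `pd.offsets.BMonthEnd()` object instead of the 'EOM' string, on the assumption that the old alias is not accepted by current pandas; judging by the dates in the output above, 'EOM' meant the last business day of each month.

```python
import numpy as np
import pandas as pd

# Synthetic prices growing 0.1% per business day (illustrative only).
idx = pd.date_range("2013-01-01", "2013-03-29", freq="B")
prices = pd.Series(100.0 * 1.001 ** np.arange(len(idx)), index=idx)

rets = prices / prices.shift(1) - 1
rets.iloc[0] = 0.0                       # no previous day for the first row
daily_index = (1 + rets).cumprod()       # growth of 1 unit since day one

# Sample the index on business month ends, holding the last known value
# when a month end is missing from the data.
month_end = pd.offsets.BMonthEnd()
monthly_index = daily_index.asfreq(month_end, method="ffill")
monthly_rets = monthly_index / monthly_index.shift(1) - 1
print(monthly_rets)
```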
Copy df_us to try some new operations:
In [59]:

df_us2 = df_us.copy()
df_us2.ix[[0, 7], 'AAPL.US'] = np.nan
df_us2.ix[[2, 4], 'IBM.US'] = np.nan
df_us2
Out[59]:
AAPL.US MSFT.US IBM.US
MDEntryDate
2013-01-04 NaN 26.740 193.99
2013-01-07 74.842857 26.690 193.14
2013-01-08 75.044286 26.550 NaN
2013-01-09 73.871429 26.700 192.32
2013-01-10 74.787143 26.460 NaN
2013-01-11 74.328571 26.830 194.45
2013-01-14 71.678571 26.890 192.62
2013-01-15 NaN 27.210 192.50
2013-01-16 72.298571 27.040 192.59
2013-01-17 71.811429 27.250 193.65
2013-01-18 71.428571 27.250 194.47
2013-01-22 72.110000 27.150 196.08
2013-01-23 73.429286 27.610 204.72
2013-01-24 64.357143 27.630 204.42
2013-01-25 62.840000 27.880 204.97
… … … …
2013-12-12 80.077143 37.220 173.37
2013-12-13 79.204286 36.690 172.80
2013-12-16 79.642857 36.885 177.85
2013-12-17 79.284286 36.520 175.76
2013-12-18 78.681429 36.580 178.70
2013-12-19 77.780000 36.250 180.22
2013-12-20 78.431429 36.800 180.02
2013-12-23 81.441429 36.620 182.23
2013-12-24 81.095714 37.080 183.22
2013-12-26 80.557143 37.440 185.35
2013-12-27 80.012857 37.290 185.08
2013-12-30 79.217143 37.290 186.41
2013-12-31 80.145714 37.410 187.57
2014-01-02 79.018571 37.160 185.53
2014-01-03 77.282857 36.910 186.64
252 rows × 3 columns
First punch NaN holes, then fill them. The lonely 2013-01-04 stays NaN, because a forward fill has no earlier value to copy.
In [60]:

df_us2 = df_us2.fillna(method='ffill')
df_us2
Out[60]:
AAPL.US MSFT.US IBM.US
MDEntryDate
2013-01-04 NaN 26.740 193.99
2013-01-07 74.842857 26.690 193.14
2013-01-08 75.044286 26.550 193.14
2013-01-09 73.871429 26.700 192.32
2013-01-10 74.787143 26.460 192.32
2013-01-11 74.328571 26.830 194.45
2013-01-14 71.678571 26.890 192.62
2013-01-15 71.678571 27.210 192.50
2013-01-16 72.298571 27.040 192.59
2013-01-17 71.811429 27.250 193.65
2013-01-18 71.428571 27.250 194.47
2013-01-22 72.110000 27.150 196.08
2013-01-23 73.429286 27.610 204.72
2013-01-24 64.357143 27.630 204.42
2013-01-25 62.840000 27.880 204.97
… … … …
2013-12-12 80.077143 37.220 173.37
2013-12-13 79.204286 36.690 172.80
2013-12-16 79.642857 36.885 177.85
2013-12-17 79.284286 36.520 175.76
2013-12-18 78.681429 36.580 178.70
2013-12-19 77.780000 36.250 180.22
2013-12-20 78.431429 36.800 180.02
2013-12-23 81.441429 36.620 182.23
2013-12-24 81.095714 37.080 183.22
2013-12-26 80.557143 37.440 185.35
2013-12-27 80.012857 37.290 185.08
2013-12-30 79.217143 37.290 186.41
2013-12-31 80.145714 37.410 187.57
2014-01-02 79.018571 37.160 185.53
2014-01-03 77.282857 36.910 186.64
252 rows × 3 columns
In [61]:

df_us2['AAPL.US'].mean()
Out[61]:
67.499026465566942
In [62]:

df_us2.mean(0)
Out[62]:
AAPL.US     67.499026
MSFT.US     32.567698
IBM.US     194.070437
dtype: float64
In [63]:

weights = Series(np.random.randn(3), index=df_us2.columns)
weights /= weights.sum()
df_us2
Out[63]:
AAPL.US MSFT.US IBM.US
MDEntryDate
2013-01-04 NaN 26.740 193.99
2013-01-07 74.842857 26.690 193.14
2013-01-08 75.044286 26.550 193.14
2013-01-09 73.871429 26.700 192.32
2013-01-10 74.787143 26.460 192.32
2013-01-11 74.328571 26.830 194.45
2013-01-14 71.678571 26.890 192.62
2013-01-15 71.678571 27.210 192.50
2013-01-16 72.298571 27.040 192.59
2013-01-17 71.811429 27.250 193.65
2013-01-18 71.428571 27.250 194.47
2013-01-22 72.110000 27.150 196.08
2013-01-23 73.429286 27.610 204.72
2013-01-24 64.357143 27.630 204.42
2013-01-25 62.840000 27.880 204.97
… … … …
2013-12-12 80.077143 37.220 173.37
2013-12-13 79.204286 36.690 172.80
2013-12-16 79.642857 36.885 177.85
2013-12-17 79.284286 36.520 175.76
2013-12-18 78.681429 36.580 178.70
2013-12-19 77.780000 36.250 180.22
2013-12-20 78.431429 36.800 180.02
2013-12-23 81.441429 36.620 182.23
2013-12-24 81.095714 37.080 183.22
2013-12-26 80.557143 37.440 185.35
2013-12-27 80.012857 37.290 185.08
2013-12-30 79.217143 37.290 186.41
2013-12-31 80.145714 37.410 187.57
2014-01-02 79.018571 37.160 185.53
2014-01-03 77.282857 36.910 186.64
252 rows × 3 columns
Subtract the fourth row (2013-01-09) as a baseline:
In [64]:

df_us2 - df_us2.ix[3]
Out[64]:
AAPL.US MSFT.US IBM.US
MDEntryDate
2013-01-04 NaN 0.040 1.67
2013-01-07 0.971429 -0.010 0.82
2013-01-08 1.172857 -0.150 0.82
2013-01-09 0.000000 0.000 0.00
2013-01-10 0.915714 -0.240 0.00
2013-01-11 0.457143 0.130 2.13
2013-01-14 -2.192857 0.190 0.30
2013-01-15 -2.192857 0.510 0.18
2013-01-16 -1.572857 0.340 0.27
2013-01-17 -2.060000 0.550 1.33
2013-01-18 -2.442857 0.550 2.15
2013-01-22 -1.761429 0.450 3.76
2013-01-23 -0.442143 0.910 12.40
2013-01-24 -9.514286 0.930 12.10
2013-01-25 -11.031429 1.180 12.65
… … … …
2013-12-12 6.205714 10.520 -18.95
2013-12-13 5.332857 9.990 -19.52
2013-12-16 5.771429 10.185 -14.47
2013-12-17 5.412857 9.820 -16.56
2013-12-18 4.810000 9.880 -13.62
2013-12-19 3.908571 9.550 -12.10
2013-12-20 4.560000 10.100 -12.30
2013-12-23 7.570000 9.920 -10.09
2013-12-24 7.224286 10.380 -9.10
2013-12-26 6.685714 10.740 -6.97
2013-12-27 6.141429 10.590 -7.24
2013-12-30 5.345714 10.590 -5.91
2013-12-31 6.274286 10.710 -4.75
2014-01-02 5.147143 10.460 -6.79
2014-01-03 3.411429 10.210 -5.68
252 rows × 3 columns
Subtract each column's own mean from df_us2:
In [65]:

df_us2 - df_us2.mean(0)
Out[65]:
AAPL.US MSFT.US IBM.US
MDEntryDate
2013-01-04 NaN -5.827698 -0.080437
2013-01-07 7.343831 -5.877698 -0.930437
2013-01-08 7.545259 -6.017698 -0.930437
2013-01-09 6.372402 -5.867698 -1.750437
2013-01-10 7.288116 -6.107698 -1.750437
2013-01-11 6.829545 -5.737698 0.379563
2013-01-14 4.179545 -5.677698 -1.450437
2013-01-15 4.179545 -5.357698 -1.570437
2013-01-16 4.799545 -5.527698 -1.480437
2013-01-17 4.312402 -5.317698 -0.420437
2013-01-18 3.929545 -5.317698 0.399563
2013-01-22 4.610974 -5.417698 2.009563
2013-01-23 5.930259 -4.957698 10.649563
2013-01-24 -3.141884 -4.937698 10.349563
2013-01-25 -4.659026 -4.687698 10.899563
… … … …
2013-12-12 12.578116 4.652302 -20.700437
2013-12-13 11.705259 4.122302 -21.270437
2013-12-16 12.143831 4.317302 -16.220437
2013-12-17 11.785259 3.952302 -18.310437
2013-12-18 11.182402 4.012302 -15.370437
2013-12-19 10.280974 3.682302 -13.850437
2013-12-20 10.932402 4.232302 -14.050437
2013-12-23 13.942402 4.052302 -11.840437
2013-12-24 13.596688 4.512302 -10.850437
2013-12-26 13.058116 4.872302 -8.720437
2013-12-27 12.513831 4.722302 -8.990437
2013-12-30 11.718116 4.722302 -7.660437
2013-12-31 12.646688 4.842302 -6.500437
2014-01-02 11.519545 4.592302 -8.540437
2014-01-03 9.783831 4.342302 -7.430437
252 rows × 3 columns
Subtract, for each day, the mean across the stocks that have a price:
In [66]:

(df_us2.T - df_us2.mean(1)).T
Out[66]:
AAPL.US MSFT.US IBM.US
MDEntryDate
2013-01-04 NaN -83.625000 83.625000
2013-01-07 -23.381429 -71.534286 94.915714
2013-01-08 -23.200476 -71.694762 94.895238
2013-01-09 -23.759048 -70.930476 94.689524
2013-01-10 -23.068571 -71.395714 94.464286
2013-01-11 -24.207619 -71.706190 95.913810
2013-01-14 -25.384286 -70.172857 95.557143
2013-01-15 -25.450952 -69.919524 95.370476
2013-01-16 -25.010952 -70.269524 95.280476
2013-01-17 -25.759048 -70.320476 96.079524
2013-01-18 -26.287619 -70.466190 96.753810
2013-01-22 -26.336667 -71.296667 97.633333
2013-01-23 -28.490476 -74.309762 102.800238
2013-01-24 -34.445238 -71.172381 105.617619
2013-01-25 -35.723333 -70.683333 106.406667
… … … …
2013-12-12 -16.811905 -59.669048 76.480952
2013-12-13 -17.027143 -59.541429 76.568571
2013-12-16 -18.483095 -61.240952 79.724048
2013-12-17 -17.903810 -60.668095 78.571905
2013-12-18 -19.305714 -61.407143 80.712857
2013-12-19 -20.303333 -61.833333 82.136667
2013-12-20 -19.985714 -61.617143 81.602857
2013-12-23 -18.655714 -63.477143 82.132857
2013-12-24 -19.369524 -63.385238 82.754762
2013-12-26 -20.558571 -63.675714 84.234286
2013-12-27 -20.781429 -63.504286 84.285714
2013-12-30 -21.755238 -63.682381 85.437619
2013-12-31 -21.562857 -64.298571 85.861429
2014-01-02 -21.550952 -63.409524 84.960476
2014-01-03 -22.994762 -63.367619 86.362381
252 rows × 3 columns
Standardize across the three stocks (subtract the daily mean, divide by the daily standard deviation):
In [67]:

std_xs = (df_us2 - df_us2.mean(1)) / df_us2.std(1)
std_xs.mean(1)
/srv/env/lib64/python3.4/site-packages/pandas/core/frame.py:3200: FutureWarning: TimeSeries broadcasting along DataFrame index by default is deprecated. Please use DataFrame.<op> to explicitly broadcast arithmetic operations along the index
FutureWarning)
Out[67]:
MDEntryDate
2013-01-04   -5.551115e-17
2013-01-07    0.000000e+00
2013-01-08    0.000000e+00
2013-01-09    7.401487e-17
2013-01-10    7.401487e-17
2013-01-11    7.401487e-17
2013-01-14    7.401487e-17
2013-01-15    0.000000e+00
2013-01-16    0.000000e+00
2013-01-17    7.401487e-17
2013-01-18    7.401487e-17
2013-01-22   -7.401487e-17
2013-01-23   -7.401487e-17
2013-01-24    0.000000e+00
2013-01-25    0.000000e+00

2013-12-12    2.220446e-16
2013-12-13   -7.401487e-17
2013-12-16   -7.401487e-17
2013-12-17   -7.401487e-17
2013-12-18   -7.401487e-17
2013-12-19    7.401487e-17
2013-12-20   -7.401487e-17
2013-12-23   -7.401487e-17
2013-12-24   -2.220446e-16
2013-12-26    7.401487e-17
2013-12-27    2.220446e-16
2013-12-30    7.401487e-17
2013-12-31    7.401487e-17
2014-01-02   -7.401487e-17
2014-01-03    7.401487e-17
dtype: float64
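The code above got a cold shoulder: the FutureWarning (raised by pandas, not by Python 3 itself) says this kind of implicit broadcasting is deprecated, meaning the code still runs but the style is discouraged and may stop working in a later release. So let's switch to a lambda expression. Admittedly, this is also a sign of the API becoming more disciplined.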
In [68]:

a = df_us2.mean(1)
b = df_us2.std(1)
df_us.apply(lambda x: (x-x.mean())/x.std())
Out[68]:
AAPL.US MSFT.US IBM.US
MDEntryDate
2013-01-04 1.210289 -1.689979 -0.007282
2013-01-07 1.141261 -1.704479 -0.083142
2013-01-08 1.172658 -1.745077 -0.107239
2013-01-09 0.989845 -1.701579 -0.156326
2013-01-10 1.132577 -1.771177 -0.106347
2013-01-11 1.061100 -1.663880 0.033773
2013-01-14 0.648046 -1.646480 -0.129551
2013-01-15 0.295558 -1.553683 -0.140261
2013-01-16 0.744685 -1.602982 -0.132229
2013-01-17 0.668754 -1.542084 -0.037626
2013-01-18 0.609078 -1.542084 0.035558
2013-01-22 0.715292 -1.571083 0.179247
2013-01-23 0.920928 -1.437687 0.950350
2013-01-24 -0.493141 -1.431887 0.923576
2013-01-25 -0.729617 -1.359390 0.972662
… … … …
2013-12-12 1.957126 1.349125 -1.847577
2013-12-13 1.821074 1.195430 -1.898448
2013-12-16 1.889434 1.251978 -1.447745
2013-12-17 1.833544 1.146131 -1.634274
2013-12-18 1.739577 1.163531 -1.371884
2013-12-19 1.599072 1.067834 -1.236227
2013-12-20 1.700610 1.227329 -1.254077
2013-12-23 2.169776 1.175130 -1.056839
2013-12-24 2.115890 1.308526 -0.968483
2013-12-26 2.031943 1.412923 -0.778385
2013-12-27 1.947106 1.369424 -0.802482
2013-12-30 1.823078 1.369424 -0.683782
2013-12-31 1.967814 1.404223 -0.580254
2014-01-02 1.792127 1.331725 -0.762320
2014-01-03 1.521582 1.259228 -0.663255
252 rows × 3 columns
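Note that the lambda cell above standardizes each column over time, which is a different operation from the per-day standardization across stocks attempted in the previous cell. Following the warning's hint to broadcast explicitly, a per-day version could look like the sketch below (toy frame `px`, hypothetical column names):

```python
import numpy as np
import pandas as pd

# Toy prices: rows are days, columns are hypothetical stocks.
px = pd.DataFrame({"A": [1.0, 2.0], "B": [2.0, 4.0], "C": [3.0, 9.0]})

# Cross-sectional (per-row) z-score with explicit axis=0 broadcasting.
row_mean = px.mean(axis=1)
row_std = px.std(axis=1)
std_xs = px.sub(row_mean, axis=0).div(row_std, axis=0)

# Per-column z-score, which is what apply(lambda x: ...) computes.
col_z = px.apply(lambda col: (col - col.mean()) / col.std())

print(std_xs)
print(col_z)
```

In `std_xs` every row has mean zero; in `col_z` every column does.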
The NaN in the first row of the AAPL.US column finally shows up in the count:
In [69]:

std_xs.count()
Out[69]:
AAPL.US    251
MSFT.US    252
IBM.US     252
dtype: int64
In [70]:

df_us.apply(np.mean)
Out[70]:
AAPL.US     67.520952
MSFT.US     32.567698
IBM.US     194.071587
dtype: float64
In [71]:

df_us.apply(np.mean, axis=1)
Out[71]:
MDEntryDate
2013-01-04     98.671905
2013-01-07     98.224286
2013-01-08     98.154762
2013-01-09     97.630476
2013-01-10     98.042381
2013-01-11     98.536190
2013-01-14     97.062857
2013-01-15     96.375714
2013-01-16     97.309524
2013-01-17     97.570476
2013-01-18     97.716190
2013-01-22     98.446667
2013-01-23    101.919762
2013-01-24     98.802381
2013-01-25     98.563333

2013-12-12     96.889048
2013-12-13     96.231429
2013-12-16     98.125952
2013-12-17     97.188095
2013-12-18     97.987143
2013-12-19     98.083333
2013-12-20     98.417143
2013-12-23    100.097143
2013-12-24    100.465238
2013-12-26    101.115714
2013-12-27    100.794286
2013-12-30    100.972381
2013-12-31    101.708571
2014-01-02    100.569524
2014-01-03    100.277619
dtype: float64
Lambda: maximum minus minimum
In [72]:

df_us.apply(lambda x: x.max() - x.min())
Out[72]:
AAPL.US    25.651429
MSFT.US    12.480000
IBM.US     43.000000
dtype: float64
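4 Applying NumPy array functions
Neither Series nor DataFrame is itself a NumPy ndarray (Series stopped subclassing ndarray in pandas 0.13), but most NumPy functions still work on both. Note: the simple return is a first-order approximation of the log return.
Take the log of the returns: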
In [73]:

np.log(df_us / df_us.shift(1))
Out[73]:
AAPL.US MSFT.US IBM.US
MDEntryDate
2013-01-04 NaN NaN NaN
2013-01-07 -0.005900 -0.001872 -0.004391
2013-01-08 0.002688 -0.005259 -0.001399
2013-01-09 -0.015752 0.005634 -0.002856
2013-01-10 0.012320 -0.009029 0.002908
2013-01-11 -0.006151 0.013887 0.008107
2013-01-14 -0.036304 0.002234 -0.009456
2013-01-15 -0.032058 0.011830 -0.000623
2013-01-16 0.040671 -0.006267 0.000467
2013-01-17 -0.006761 0.007736 0.005489
2013-01-18 -0.005346 0.000000 0.004226
2013-01-22 0.009495 -0.003676 0.008245
2013-01-23 0.018130 0.016801 0.043120
2013-01-24 -0.131875 0.000724 -0.001466
2013-01-25 -0.023856 0.009007 0.002687
… … … …
2013-12-12 -0.001462 -0.010424 -0.010500
2013-12-13 -0.010960 -0.014342 -0.003293
2013-12-16 0.005522 0.005301 0.028806
2013-12-17 -0.004512 -0.009945 -0.011821
2013-12-18 -0.007633 0.001642 0.016589
2013-12-19 -0.011523 -0.009062 0.008470
2013-12-20 0.008340 0.015058 -0.001110
2013-12-23 0.037659 -0.004903 0.012202
2013-12-24 -0.004254 0.012483 0.005418
2013-12-26 -0.006663 0.009662 0.011558
2013-12-27 -0.006779 -0.004014 -0.001458
2013-12-30 -0.009995 0.000000 0.007160
2013-12-31 0.011654 0.003213 0.006204
2014-01-02 -0.014164 -0.006705 -0.010936
2014-01-03 -0.022211 -0.006750 0.005965
252 rows × 3 columns
Exponentiate:
In [74]:

np.exp(rets['AAPL.US'])
Out[74]:
MDEntryDate
2013-01-04    1.000000
2013-01-07    0.994135
2013-01-08    1.002695
2013-01-09    0.984493
2013-01-10    1.012473
2013-01-11    0.993887
2013-01-14    0.964976
2013-01-15    0.968943
2013-01-16    1.042382
2013-01-17    0.993285
2013-01-18    0.994683
2013-01-22    1.009586
2013-01-23    1.018464
2013-01-24    0.883778
2013-01-25    0.976702

2013-12-12    0.998540
2013-12-13    0.989159
2013-12-16    1.005553
2013-12-17    0.995508
2013-12-18    0.992425
2013-12-19    0.988609
2013-12-20    1.008410
2013-12-23    1.039123
2013-12-24    0.995764
2013-12-26    0.993381
2013-12-27    0.993266
2013-12-30    0.990104
2013-12-31    1.011791
2014-01-02    0.986035
2014-01-03    0.978274
Name: AAPL.US, dtype: float64
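The "first-order approximation" remark can be made concrete. With simple return r and log return l, the exact relations are l = ln(1 + r) and r = exp(l) - 1, so applying exp to a simple return, as in the cell above, only approximates the gross return 1 + r. A small sketch on made-up prices:

```python
import numpy as np
import pandas as pd

px = pd.Series([100.0, 101.0, 99.0, 120.0])  # made-up prices

simple = px / px.shift(1) - 1        # r_t = P_t / P_{t-1} - 1
log_ret = np.log(px / px.shift(1))   # l_t = ln(P_t / P_{t-1})

# Exact conversions between the two definitions.
back_to_simple = np.expm1(log_ret)   # exp(l) - 1
back_to_log = np.log1p(simple)       # ln(1 + r)

# The gap is tiny for a 1% move and clearly visible for a ~21% jump.
gap = (simple - log_ret).abs()
print(gap)

# Log returns add up over time to the log of the total price ratio.
total_log = log_ret.sum()
print(total_log, np.log(120.0 / 100.0))
```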
5 Generating date ranges and date offsets
In [75]:

offset = pd.datetools.BDay()
rng = pd.date_range('1/1/2000', '1/20/2000')
rng
Out[75]:
DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-03', '2000-01-04',
               '2000-01-05', '2000-01-06', '2000-01-07', '2000-01-08',
               '2000-01-09', '2000-01-10', '2000-01-11', '2000-01-12',
               '2000-01-13', '2000-01-14', '2000-01-15', '2000-01-16',
               '2000-01-17', '2000-01-18', '2000-01-19', '2000-01-20'],
              dtype='datetime64[ns]', freq='D', tz=None)
Date offsets are customizable: in the previous cell the offset is the default of one business day, and we can change it afterwards:
In [76]:

offset = pd.datetools.BDay(3)
offset
Out[76]:
<3 * BusinessDays>
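The `pd.datetools` namespace used in these cells was removed in later pandas releases; the same offset classes live under `pd.offsets`. A minimal sketch of how such offsets behave (the dates are chosen for illustration):

```python
import pandas as pd

step = pd.offsets.BDay(3)            # three business days
friday = pd.Timestamp("2000-01-07")  # a Friday

# Adding the offset skips the weekend.
print(friday + step)

# Offsets can also serve as the frequency of a date range.
bdays = pd.date_range("2000-01-01", "2000-01-20", freq=pd.offsets.BDay())
five_min = pd.date_range("2000-01-01 09:30", "2000-01-01 10:00",
                         freq=pd.offsets.Minute(5))
print(len(bdays), len(five_min))
```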
Generate date labels:
In [77]:

pd.date_range('1/1/2000', '1/20/2000')
Out[77]:
DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-03', '2000-01-04',
               '2000-01-05', '2000-01-06', '2000-01-07', '2000-01-08',
               '2000-01-09', '2000-01-10', '2000-01-11', '2000-01-12',
               '2000-01-13', '2000-01-14', '2000-01-15', '2000-01-16',
               '2000-01-17', '2000-01-18', '2000-01-19', '2000-01-20'],
              dtype='datetime64[ns]', freq='D', tz=None)
In [78]:

minutes5 = pd.datetools.Minute(5)
pd.date_range('1/1/2000 09:30:00', '1/1/2000 10:00:00', freq='5min')
Out[78]:
DatetimeIndex(['2000-01-01 09:30:00', '2000-01-01 09:35:00',
               '2000-01-01 09:40:00', '2000-01-01 09:45:00',
               '2000-01-01 09:50:00', '2000-01-01 09:55:00',
               '2000-01-01 10:00:00'],
              dtype='datetime64[ns]', freq='5T', tz=None)
The dates generated by date_range make a valid index (row labels):
In [79]:

rng = pd.date_range('1/1/2000', '1/20/2000')
df = pd.DataFrame(np.random.randn(len(rng), 4), index=rng,
                  columns=['A', 'B', 'C', 'D'])
df
Out[79]:
A B C D
2000-01-01 1.795201 -1.250029 1.053080 -0.560032
2000-01-02 1.450612 -1.241729 0.712836 -1.501445
2000-01-03 -0.304211 0.193008 0.871547 -1.962467
2000-01-04 0.141303 1.276529 0.716661 -1.232438
2000-01-05 0.022977 0.728106 0.262800 -1.159754
2000-01-06 -1.071608 -1.113805 0.724509 -0.957385
2000-01-07 0.580717 0.024511 2.299809 -0.929498
2000-01-08 -0.045120 2.026626 -1.175054 -0.575247
2000-01-09 -0.513258 0.044271 -0.397681 0.800668
2000-01-10 0.009016 0.193253 0.702361 0.147106
2000-01-11 -0.279840 -1.409914 0.475579 -1.180144
2000-01-12 0.134656 0.343681 0.326087 -0.359339
2000-01-13 0.734198 1.575572 -1.093865 -0.881270
2000-01-14 -0.082568 1.313544 0.038114 0.396534
2000-01-15 1.014821 0.147995 -1.684033 -0.991052
2000-01-16 -1.280302 -0.064478 -0.751844 1.113849
2000-01-17 0.533392 -1.330133 0.548605 -0.426391
2000-01-18 0.357001 0.658065 -1.166845 1.312358
2000-01-19 -0.431234 0.023455 -1.152396 -1.884213
2000-01-20 -0.398013 0.134987 0.110638 -0.038108
In [80]:

df.index
Out[80]:
DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-03', '2000-01-04',
               '2000-01-05', '2000-01-06', '2000-01-07', '2000-01-08',
               '2000-01-09', '2000-01-10', '2000-01-11', '2000-01-12',
               '2000-01-13', '2000-01-14', '2000-01-15', '2000-01-16',
               '2000-01-17', '2000-01-18', '2000-01-19', '2000-01-20'],
              dtype='datetime64[ns]', freq='D', tz=None)
Frequency conversions
In [81]:

rets.asfreq(pd.datetools.BMonthEnd()
)[:20]
Out[81]:
AAPL.US MSFT.US IBM.US
MDEntryDate
2013-01-31 -0.002933 -0.014363 -0.002211
2013-02-28 -0.007130 -0.000360 -0.007414
2013-03-29 NaN NaN NaN
2013-04-30 0.029434 0.015026 0.017022
2013-05-31 -0.004086 -0.003711 -0.006400
2013-06-28 0.006984 -0.002166 -0.023205
2013-07-31 -0.001743 -0.000314 -0.004949
2013-08-30 -0.009119 -0.004471 -0.002026
2013-09-30 -0.012429 0.000301 -0.009309
2013-10-31 -0.004180 -0.003799 -0.005218
2013-11-29 0.018518 0.014096 0.003967
2013-12-31 0.011722 0.003218 0.006223
In [82]:

rng = pd.date_range(ts.index[0], ts.index[-1], freq='M')
ts.reindex(rng)[:20]
Out[82]:
2013-11-30          NaN
2013-12-31    80.145714
Freq: M, Name: AAPL.US, dtype: float64
6 Resampling: downsampling and upsampling
In [83]:

monthly_px = df_us[-30:].asfreq('W@WED')
monthly_px
Out[83]:
AAPL.US MSFT.US IBM.US
MDEntryDate
2013-11-20 73.571429 37.08 185.19
2013-11-27 77.994286 37.60 178.97
2013-12-04 80.714286 38.94 175.74
2013-12-11 80.194286 37.61 175.20
2013-12-18 78.681429 36.58 178.70
2013-12-25 NaN NaN NaN
2014-01-01 NaN NaN NaN
In [84]:

monthly_px.asfreq(pd.datetools.bday, method='ffill')
Out[84]:
AAPL.US MSFT.US IBM.US
MDEntryDate
2013-11-20 73.571429 37.08 185.19
2013-11-21 73.571429 37.08 185.19
2013-11-22 73.571429 37.08 185.19
2013-11-25 73.571429 37.08 185.19
2013-11-26 73.571429 37.08 185.19
2013-11-27 77.994286 37.60 178.97
2013-11-28 77.994286 37.60 178.97
2013-11-29 77.994286 37.60 178.97
2013-12-02 77.994286 37.60 178.97
2013-12-03 77.994286 37.60 178.97
2013-12-04 80.714286 38.94 175.74
2013-12-05 80.714286 38.94 175.74
2013-12-06 80.714286 38.94 175.74
2013-12-09 80.714286 38.94 175.74
2013-12-10 80.714286 38.94 175.74
2013-12-11 80.194286 37.61 175.20
2013-12-12 80.194286 37.61 175.20
2013-12-13 80.194286 37.61 175.20
2013-12-16 80.194286 37.61 175.20
2013-12-17 80.194286 37.61 175.20
2013-12-18 78.681429 36.58 178.70
2013-12-19 78.681429 36.58 178.70
2013-12-20 78.681429 36.58 178.70
2013-12-23 78.681429 36.58 178.70
2013-12-24 78.681429 36.58 178.70
2013-12-25 NaN NaN NaN
2013-12-26 NaN NaN NaN
2013-12-27 NaN NaN NaN
2013-12-30 NaN NaN NaN
2013-12-31 NaN NaN NaN
2014-01-01 NaN NaN NaN
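The two cells above first downsample to one value per Wednesday, then upsample back to business days with a forward fill. A sketch of the same round trip on a synthetic series, using the 'W-WED' spelling of the weekly alias and 'B' for business days (an assumption about what a current pandas accepts in place of 'W@WED' and `pd.datetools.bday`):

```python
import pandas as pd

# Ten business days starting on Wednesday 2013-11-20, values 0..9.
idx = pd.date_range("2013-11-20", periods=10, freq="B")
px = pd.Series(range(10), index=idx, dtype=float)

# Downsample: keep only the Wednesdays.
weekly = px.asfreq("W-WED")

# Upsample: back to business days, holding each Wednesday's value forward.
daily_again = weekly.asfreq("B", method="ffill")

print(weekly)
print(daily_again)
```

The upsampled series stops at the last Wednesday, because `asfreq` only spans the range of the index it is given.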
In [85]:

rets
Out[85]:
AAPL.US MSFT.US IBM.US
MDEntryDate
2013-01-04 0.000000 0.000000 0.000000
2013-01-07 -0.005882 -0.001870 -0.004382
2013-01-08 0.002691 -0.005245 -0.001398
2013-01-09 -0.015629 0.005650 -0.002852
2013-01-10 0.012396 -0.008989 0.002912
2013-01-11 -0.006132 0.013983 0.008140
2013-01-14 -0.035653 0.002236 -0.009411
2013-01-15 -0.031550 0.011900 -0.000623
2013-01-16 0.041509 -0.006248 0.000468
2013-01-17 -0.006738 0.007766 0.005504
2013-01-18 -0.005331 0.000000 0.004234
2013-01-22 0.009540 -0.003670 0.008279
2013-01-23 0.018295 0.016943 0.044064
2013-01-24 -0.123549 0.000724 -0.001465
2013-01-25 -0.023574 0.009048 0.002691
… … … …
2013-12-12 -0.001461 -0.010370 -0.010445
2013-12-13 -0.010900 -0.014240 -0.003288
2013-12-16 0.005537 0.005315 0.029225
2013-12-17 -0.004502 -0.009896 -0.011751
2013-12-18 -0.007604 0.001643 0.016727
2013-12-19 -0.011457 -0.009021 0.008506
2013-12-20 0.008375 0.015172 -0.001110
2013-12-23 0.038377 -0.004891 0.012276
2013-12-24 -0.004245 0.012561 0.005433
2013-12-26 -0.006641 0.009709 0.011625
2013-12-27 -0.006757 -0.004006 -0.001457
2013-12-30 -0.009945 0.000000 0.007186
2013-12-31 0.011722 0.003218 0.006223
2014-01-02 -0.014064 -0.006683 -0.010876
2014-01-03 -0.021966 -0.006728 0.005983
252 rows × 3 columns
Downsample the full sample:
In [86]:

offset = pd.datetools.BMonthEnd()
average_monthly_px = rets.groupby(offset.rollforward).mean()
average_monthly_px[-20:]
Out[86]:
AAPL.US MSFT.US IBM.US
2013-01-31 -0.007068 0.001411 0.002467
2013-02-28 -0.001523 0.000711 -0.000546
2013-03-29 0.000284 0.001442 0.003046
2013-04-30 0.000229 0.006835 -0.002128
2013-05-31 0.000806 0.002468 0.001248
2013-06-28 -0.006231 -0.000416 -0.004170
2013-07-31 0.006137 -0.003329 0.000963
2013-08-30 0.003482 0.002378 -0.003053
2013-09-30 -0.000844 -0.000056 0.000839
2013-10-31 0.004074 0.002810 -0.001272
2013-11-29 0.003161 0.003810 0.000164
2013-12-31 0.000504 -0.000846 0.002101
2014-01-31 -0.018015 -0.006705 -0.002447
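The `groupby(offset.rollforward)` trick maps every date to the business month end on or after it and then averages within each bucket. A minimal sketch on synthetic data, with `pd.offsets.BMonthEnd` standing in for the `pd.datetools` spelling:

```python
import numpy as np
import pandas as pd

# Synthetic daily series: 1.0 throughout January, 3.0 throughout February.
idx = pd.date_range("2013-01-01", "2013-02-28", freq="B")
daily = pd.Series(np.where(idx.month == 1, 1.0, 3.0), index=idx)

# rollforward() sends each timestamp to its business month end,
# so it works as a grouping function over the index.
offset = pd.offsets.BMonthEnd()
monthly_mean = daily.groupby(offset.rollforward).mean()
print(monthly_mean)
```

This also explains the 2014-01-31 row in the output above: the first days of January 2014 roll forward to a month end that lies beyond the data.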
7 Hierarchical indexing
One of the most useful tools, and a powerful way to go beyond a simple two-dimensional table.
In [87]:

index = pd.MultiIndex(levels=[['foo', 'bar', 'baz', 'qux'],
                              ['one', 'two', 'three']],
                      labels=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3],
                              [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]])
hdf = pd.DataFrame(np.random.randn(10, 3), index=index,
                   columns=['A', 'B', 'C'])
hdf
Out[87]:
A B C
foo one -1.047611 0.604945 -0.068992
two -0.105756 0.437211 -2.394942
three -0.928894 2.240227 0.866613
bar one -0.634099 -0.731033 -0.848086
two 0.832025 0.834709 0.477227
baz two 1.324745 0.404173 1.120245
three -1.269327 -0.242217 -1.323213
qux one -1.661129 -1.098487 -0.431622
two 0.286669 -0.574178 -0.341468
three -0.598695 1.031828 0.437277
In [88]:

hdf.ix['foo']
Out[88]:
A B C
one -1.047611 0.604945 -0.068992
two -0.105756 0.437211 -2.394942
three -0.928894 2.240227 0.866613
In [89]:

hdf.ix['foo'] = 0
hdf
Out[89]:
A B C
foo one 0.000000 0.000000 0.000000
two 0.000000 0.000000 0.000000
three 0.000000 0.000000 0.000000
bar one -0.634099 -0.731033 -0.848086
two 0.832025 0.834709 0.477227
baz two 1.324745 0.404173 1.120245
three -1.269327 -0.242217 -1.323213
qux one -1.661129 -1.098487 -0.431622
two 0.286669 -0.574178 -0.341468
three -0.598695 1.031828 0.437277
In [90]:

hdf.ix['foo', 'three']
Out[90]:
A    0
B    0
C    0
Name: (foo, three), dtype: float64
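The cells above use the `labels=` argument and the `.ix` indexer from older pandas. In current pandas the argument is called `codes=` and `.ix` has been removed in favor of `.loc`. A minimal sketch of the same three operations under that assumption (`hdf_demo` is an illustrative name, and the random values will differ from the output shown above):

```python
import numpy as np
import pandas as pd

# Same hierarchical index as above, built with the newer `codes=` keyword.
index = pd.MultiIndex(levels=[["foo", "bar", "baz", "qux"],
                              ["one", "two", "three"]],
                      codes=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3],
                             [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]])
hdf_demo = pd.DataFrame(np.random.randn(10, 3), index=index,
                        columns=["A", "B", "C"])

foo_rows = hdf_demo.loc["foo"]              # all rows under the outer label 'foo'
hdf_demo.loc["foo", :] = 0                  # assign to the whole 'foo' group
foo_three = hdf_demo.loc[("foo", "three")]  # one row, selected by a full key tuple
print(foo_three)
```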
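8 Pivoting and reshaping
The next cell defines a function that repeats each column label three times in a row (N, the number of rows) and tiles the dates four times (K, the number of columns).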
In [91]:

import pandas.util.testing as tm; tm.N = 3
def unpivot(frame):
    N, K = frame.shape
    data = {'value' : frame.values.ravel('F'),
            'variable' : np.asarray(frame.columns).repeat(N),
            'date' : np.tile(np.asarray(frame.index), K)}
    return DataFrame(data, columns=['date', 'variable', 'value'])
df = unpivot(tm.makeTimeDataFrame())
df
Out[91]:
date variable value
0 2000-01-03 A 0.728665
1 2000-01-04 A -0.582155
2 2000-01-05 A 0.190112
3 2000-01-03 B 1.111317
4 2000-01-04 B -0.393556
5 2000-01-05 B -0.883954
6 2000-01-03 C -1.164946
7 2000-01-04 C 0.381472
8 2000-01-05 C -0.501643
9 2000-01-03 D 0.305640
10 2000-01-04 D -0.499032
11 2000-01-05 D 0.421693
In [92]:

df.pivot('date', 'variable')
Out[92]:
value
variable A B C D
date
2000-01-03 0.728665 1.111317 -1.164946 0.305640
2000-01-04 -0.582155 -0.393556 0.381472 -0.499032
2000-01-05 0.190112 -0.883954 -0.501643 0.421693
In [93]:

df['value2'] = df['value'] * 2
df
Out[93]:
date variable value value2
0 2000-01-03 A 0.728665 1.457330
1 2000-01-04 A -0.582155 -1.164310
2 2000-01-05 A 0.190112 0.380225
3 2000-01-03 B 1.111317 2.222634
4 2000-01-04 B -0.393556 -0.787113
5 2000-01-05 B -0.883954 -1.767909
6 2000-01-03 C -1.164946 -2.329893
7 2000-01-04 C 0.381472 0.762944
8 2000-01-05 C -0.501643 -1.003286
9 2000-01-03 D 0.305640 0.611279
10 2000-01-04 D -0.499032 -0.998064
11 2000-01-05 D 0.421693 0.843386
In [94]:

pivoted = df.pivot('date', 'variable')
pivoted
Out[94]:
value value2
variable A B C D A B C D
date
2000-01-03 0.728665 1.111317 -1.164946 0.305640 1.457330 2.222634 -2.329893 0.611279
2000-01-04 -0.582155 -0.393556 0.381472 -0.499032 -1.164310 -0.787113 0.762944 -0.998064
2000-01-05 0.190112 -0.883954 -0.501643 0.421693 0.380225 -1.767909 -1.003286 0.843386
In [95]:

pivoted.stack()
Out[95]:
value value2
date variable
2000-01-03 A 0.728665 1.457330
B 1.111317 2.222634
C -1.164946 -2.329893
D 0.305640 0.611279
2000-01-04 A -0.582155 -1.164310
B -0.393556 -0.787113
C 0.381472 0.762944
D -0.499032 -0.998064
2000-01-05 A 0.190112 0.380225
B -0.883954 -1.767909
C -0.501643 -1.003286
D 0.421693 0.843386
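The wide-to-long and long-to-wide round trip shown in this section can also be sketched with `melt` and keyword-argument `pivot`, which should work across pandas versions. The small `wide` frame is made-up data for illustration, and `melt` is a swapped-in alternative to the hand-written `unpivot`, not the approach used in the cells above.

```python
import pandas as pd

# A small wide table: one row per date, one column per variable (illustrative data).
wide = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]},
                    index=pd.to_datetime(["2000-01-03", "2000-01-04", "2000-01-05"]))
wide.index.name = "date"

# Wide -> long: one row per (date, variable) pair, like unpivot() does.
long = wide.reset_index().melt(id_vars="date", var_name="variable",
                               value_name="value")

# Long -> wide again, passing the roles by keyword.
back = long.pivot(index="date", columns="variable", values="value")
print(long)
print(back)
```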
9 GroupBy
In [96]:

df = DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
                       'foo', 'bar', 'foo', 'foo'],
                'B' : ['one', 'one', 'two', 'three',
                       'two', 'two', 'one', 'three'],
                'C' : np.random.randn(8),
                'D' : np.random.randn(8)})
df
Out[96]:
A B C D
0 foo one -0.025329 0.529539
1 bar one -0.555756 1.482785
2 foo two 1.422318 0.249187
3 bar three -1.560517 -1.510415
4 foo two -0.704955 0.454320
5 bar two -0.492307 0.777783
6 foo one 0.644247 2.450177
7 foo three -1.355120 -0.727310
In [97]:

for key, group in df.groupby('A'):
    print(key, group)
bar      A      B         C         D
1  bar    one -0.555756  1.482785
3  bar  three -1.560517 -1.510415
5  bar    two -0.492307  0.777783
foo      A      B         C         D
0  foo    one -0.025329  0.529539
2  foo    two  1.422318  0.249187
4  foo    two -0.704955  0.454320
6  foo    one  0.644247  2.450177
7  foo  three -1.355120 -0.727310
In [98]:

df.groupby('A')['C'].describe().T
Out[98]:
A
bar  count    3.000000
mean    -0.869527
std      0.599256
min     -1.560517
25%     -1.058137
50%     -0.555756
75%     -0.524032
max     -0.492307
foo  count    5.000000
mean    -0.003768
std      1.092237
min     -1.355120
25%     -0.704955
50%     -0.025329
75%      0.644247
max      1.422318
dtype: float64
In [99]:

df.groupby('A').mean()
Out[99]:
C D
A
bar -0.869527 0.250051
foo -0.003768 0.591183
In [100]:

for key, group in df.groupby('A'):
    print(key, group)
bar      A      B         C         D
1  bar    one -0.555756  1.482785
3  bar  three -1.560517 -1.510415
5  bar    two -0.492307  0.777783
foo      A      B         C         D
0  foo    one -0.025329  0.529539
2  foo    two  1.422318  0.249187
4  foo    two -0.704955  0.454320
6  foo    one  0.644247  2.450177
7  foo  three -1.355120 -0.727310
In [101]:

df.groupby(['A', 'B']).mean()
Out[101]:
C D
A B
bar one -0.555756 1.482785
three -1.560517 -1.510415
two -0.492307 0.777783
foo one 0.309459 1.489858
three -1.355120 -0.727310
two 0.358682 0.351753
In [102]:

df.groupby(['A', 'B'], as_index=False).mean()
Out[102]:
A B C D
0 bar one -0.555756 1.482785
1 bar three -1.560517 -1.510415
2 bar two -0.492307 0.777783
3 foo one 0.309459 1.489858
4 foo three -1.355120 -0.727310
5 foo two 0.358682 0.351753
GroupBy example: group-wise linear regression
In [103]:

def get_beta(rets):
    rets = rets.dropna()
    rets['intercept'] = 1.
    model = sm.OLS(rets['MSFT.US'], rets.ix[:, ['AAPL.US', 'intercept']]).fit()
    return model.params
get_beta(rets)
Out[103]:
AAPL.US      0.055743
intercept    0.001386
dtype: float64
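The cell above fits one regression over the whole sample. To make it group-wise, the same kind of function can be passed to `groupby(...).apply`. The sketch below is an assumption-laden illustration: it swaps `sm.OLS` for `numpy.linalg.lstsq` so that it has no statsmodels dependency, groups by calendar year (a grouping chosen only for the example), and uses synthetic returns in `rets_demo` with a true slope of 0.5.

```python
import numpy as np
import pandas as pd

def get_beta_demo(group):
    """Least-squares fit of MSFT.US on AAPL.US plus an intercept."""
    group = group.dropna()
    X = np.column_stack([group["AAPL.US"].to_numpy(), np.ones(len(group))])
    coef, *_ = np.linalg.lstsq(X, group["MSFT.US"].to_numpy(), rcond=None)
    return pd.Series(coef, index=["AAPL.US", "intercept"])

# Synthetic daily returns (illustrative only): MSFT = 0.5 * AAPL + small noise.
dates = pd.bdate_range("2012-01-02", "2013-12-31")
rng = np.random.default_rng(1)
aapl = rng.normal(0, 0.02, len(dates))
msft = 0.5 * aapl + rng.normal(0, 0.001, len(dates))
rets_demo = pd.DataFrame({"AAPL.US": aapl, "MSFT.US": msft}, index=dates)

# One regression per calendar year: rows are years, columns are coefficients.
betas_by_year = rets_demo.groupby(rets_demo.index.year).apply(get_beta_demo)
print(betas_by_year)
```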
10 Application: moving-window regression
In [104]:

df_us3 = get_price(['AAPL.US', 'IBM.US', 'MSFT.US'], start_date='2001-01-01', end_date='2015-08-12', country='us', fields='ClosingPx')
df_us3[‘AAPL.US’].plot()
Out[104]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fd32035c9b0>

In [105]:

rets2 = df_us3 / df_us3.shift(1) - 1
rets2.head()

Out[105]:
AAPL.US IBM.US MSFT.US
MDEntryDate
2001-01-02 NaN NaN NaN
2001-01-03 0.100134 0.115670 0.105118
2001-01-04 0.042150 -0.015113 0.010430
2001-01-05 -0.040445 0.008692 0.014244
2001-01-08 0.011607 -0.004681 -0.003867
In [106]:

y = rets2['AAPL.US']
x = rets2.ix[:, ['MSFT.US']]
model = pd.ols(y=y, x=x)
model
Out[106]:

-------------------------Summary of Regression Analysis-------------------------

Formula: Y ~ <MSFT.US> + <intercept>

Number of Observations:         3672
Number of Degrees of Freedom:   2

R-squared:         0.2008
Adj R-squared:     0.2006

Rmse:              0.0219

F-stat (1, 3670):   922.0308, p-value:     0.0000

Degrees of Freedom: model 1, resid 3670

-----------------------Summary of Estimated Coefficients------------------------
Variable       Coef    Std Err     t-stat    p-value    CI 2.5%   CI 97.5%
--------------------------------------------------------------------------------
MSFT.US     0.5899     0.0194      30.36     0.0000     0.5519     0.6280
intercept     0.0014     0.0004       3.76     0.0002     0.0007     0.0021
---------------------------------End of Summary---------------------------------
In [107]:

model = pd.ols(y=y, x=x, window=250)
model.beta.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 3423 entries, 2002-01-04 to 2015-08-10
Data columns (total 2 columns):
MSFT.US      3423 non-null float64
intercept    3423 non-null float64
dtypes: float64(2)
memory usage: 80.2 KB
In [108]:

model.beta['MSFT.US'].plot()
Out[108]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fd320fb9668>

In [109]:

def winsorize(data, std_level=3, window=250, min_periods=20):
    result = data.copy()
    # Use the function argument here, not the global rets2
    std = pd.rolling_std(data, window, min_periods=min_periods)
    cap_level = std_level * np.sign(data) * std
    result[np.abs(data) > std_level * std] = cap_level
    return result

winz = winsorize(rets2)
winz_model = pd.ols(y=winz['AAPL.US'], x=winz.ix[:, ['MSFT.US']],
                    window=250)
model.beta['MSFT.US'].plot(label="With outliers")
winz_model.beta['MSFT.US'].plot(label="Winsorized")
plt.legend(loc='best')

Out[109]:
<matplotlib.legend.Legend at 0x7fd3201cf2b0>
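`pd.ols` and `pd.rolling_std` are not available in recent pandas releases. As a hedged sketch of the same idea, the rolling standard deviation can be written with `.rolling(...).std()`, and for a single regressor with an intercept the rolling OLS slope equals the rolling covariance divided by the rolling variance. The function names `winsorize_demo` and `rolling_beta` and the synthetic data (true slope 0.6, one injected outlier) are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def winsorize_demo(data, std_level=3, window=250, min_periods=20):
    """Cap values that exceed std_level rolling standard deviations."""
    cap = std_level * data.rolling(window, min_periods=min_periods).std()
    # Where the rolling std is still NaN the comparison is False,
    # so those early values are left unchanged.
    return data.mask(data.abs() > cap, np.sign(data) * cap)

def rolling_beta(y, x, window=250):
    """Slope of a rolling one-regressor OLS with intercept: cov(x, y) / var(x)."""
    return x.rolling(window).cov(y) / x.rolling(window).var()

# Synthetic returns (illustrative only) with one artificial outlier.
dates = pd.bdate_range("2001-01-02", periods=600)
rng = np.random.default_rng(2)
msft = pd.Series(rng.normal(0, 0.02, 600), index=dates)
aapl = 0.6 * msft + rng.normal(0, 0.01, 600)
aapl.iloc[300] = 0.5
frame = pd.DataFrame({"AAPL.US": aapl, "MSFT.US": msft})

winz_demo = winsorize_demo(frame)
beta_raw = rolling_beta(frame["AAPL.US"], frame["MSFT.US"])
beta_winz = rolling_beta(winz_demo["AAPL.US"], winz_demo["MSFT.US"])
print(beta_raw.dropna().tail())
print(beta_winz.dropna().tail())
```

Plotting `beta_raw` and `beta_winz` on the same axes would give a comparison of the two rolling slopes, analogous to the "With outliers" and "Winsorized" lines in the cell above.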

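11 Easter egg
Careful readers will notice that, with the default settings, the data retrieved covers one year, and that for both A-shares and US stocks it runs from 20130104 to 20140103. If "20" is read as "love you" (爱你) and the zeros that follow are dropped, the range runs from "1314" to "1413", which sound like 一生一世 ("for a whole lifetime") read forward and backward: love without end. This tutorial was in fact mostly finished on Qixi (Chinese Valentine's Day). Its main author, being single, took a ton of damage on discovering this secret. On asking further, I also learned that the young engineer who deployed the US stock data had once proposed defaulting to the most recent year up to the present, but in the end our dear CEO insisted on fixing the period to this range. After my questioning, the young man who deployed the US data seemed to realize something, and on Qixi he boarded a plane and flew back to the US with his girlfriend. Some say that starting a company takes passion. Indeed, and romance can hide even in the data of Ricequant (米筐). To all you romantics: happy Qixi!
A single dog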

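KBEngine is an open-source game server engine

KBEngine is an open-source game server engine. Clients and the server interact through a simple agreed-upon protocol, and KBEngine plugins make it quick to combine with technologies such as Unity3D, OGRE, Cocos2d, and HTML5 to form a complete client.
The server's low-level framework is written in C++, while the game logic layer uses Python (with hot updates supported). Developers do not have to reimplement common low-level server technology and can focus on game development itself, building all kinds of online games quickly. KBEngine's low-level architecture is designed as a multi-process, distributed, dynamically load-balanced system: in theory, capacity can keep growing simply by adding hardware, and the capacity of a single machine depends on the complexity of the game logic itself.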

Continuing to learn tushare

An example of drawing a candlestick (K-line) chart with Yahoo data:
#!/usr/bin/env python
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter, WeekdayLocator,\
DayLocator, MONDAY
from matplotlib.finance import quotes_historical_yahoo_ohlc, candlestick_ohlc
# (Year, month, day) tuples suffice as args for quotes_historical_yahoo
date1 = (2015, 7, 1)
date2 = (2015, 8, 28)
mondays = WeekdayLocator(MONDAY)        # major ticks on the mondays
alldays = DayLocator()              # minor ticks on the days
weekFormatter = DateFormatter('%b %d')  # e.g., Jan 12
dayFormatter = DateFormatter('%d')      # e.g., 12

quotes = quotes_historical_yahoo_ohlc('600030.SS', date1, date2)
print(quotes)
if len(quotes) == 0:
    raise SystemExit

fig, ax = plt.subplots()
fig.subplots_adjust(bottom=0.2)
ax.xaxis.set_major_locator(mondays)
ax.xaxis.set_minor_locator(alldays)
ax.xaxis.set_major_formatter(weekFormatter)
#ax.xaxis.set_minor_formatter(dayFormatter)

#plot_day_summary(ax, quotes, ticksize=3)
candlestick_ohlc(ax, quotes, width=0.6)

ax.xaxis_date()
ax.autoscale_view()
plt.setp(plt.gca().get_xticklabels(), rotation=45, horizontalalignment='right')

plt.show()

Because the machine was upgraded from 32-bit to 64-bit Windows 7 (directly, using Ghost), I had to reinstall Anaconda, and of course tushare as well.

!pip install tushare
Collecting tushare
Downloading tushare-0.3.6.tar.gz
Installing collected packages: tushare
Running setup.py install for tushare
Successfully installed tushare-0.3.6
You are using pip version 7.0.3, however version 7.1.2 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.

Practicing matrix operations:

r=np.random.standard_normal((4,3))

s=np.random.standard_normal((4,3))

r+s

r+3
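A minimal self-contained sketch of these two operations, assuming the second array was meant to be named `s` (the seed is chosen arbitrarily so the run is repeatable):

```python
import numpy as np

np.random.seed(1000)
r = np.random.standard_normal((4, 3))
s = np.random.standard_normal((4, 3))

# Arrays of the same shape are added element by element.
elementwise_sum = r + s

# A scalar is broadcast: 3 is added to every element of r.
shifted = r + 3

print(elementwise_sum.shape, shifted.shape)
```

Both results keep the (4, 3) shape of the inputs; no explicit loop is needed in either case.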

Practicing matplotlib

import matplotlib as mpl

import matplotlib.pyplot as plt

%matplotlib inline

I don't understand what the third line above means.

Start plotting:

np.random.seed(1000)

y=np.random.standard_normal(20)

x=range(len(y))

plt.plot(x,y)
Out[47]: [<matplotlib.lines.Line2D at 0x84124b0>]

And that is all it takes to draw the simplest possible plot.

So how do we draw a candlestick chart? Let's try:

o=ts.get_hist_data('600030')

o
Out[57]:
            open   high  close    low      volume  price_change  p_change
date
2012-09-03  10.15  10.64  10.52  10.11   636278.81          0.30      2.94

First, use tushare to fetch the data, which includes the opening prices,

then plot it:

plt.plot(range(728),o['open'])
Out[61]: [<matplotlib.lines.Line2D at 0x878dbb0>]

o.index is the index.

Of course, the x-axis argument can be omitted:

plt.plot(o['open'])
Out[77]: [<matplotlib.lines.Line2D at 0x992cef0>]

Add a grid and fit the axes tightly to the data:

plt.plot(o['open']);plt.axis('tight');plt.grid(True)