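[Research] Quantitative Stock Selection: The Multi-Factor Model

Posted by 外汇工厂 on May 9 at 17:46 · Replies: 1

See the research below.

The progress output printed partway through is only there because the computation is heavy and I wanted to see where it was.

Feel free to ignore it...

Apparently a good strategy can earn some pocket money, so I guess I'm the only one still posting research threads...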

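"I like being a quant because I get to hold my fate in my own hands." - Ding Peng (丁鹏)

Over the past couple of days I have been studying Ding Peng's book "Quantitative Investment: Strategies and Techniques" (《量化投资:策略与技术》). It is an introductory book, and it does offer many practical methods and a good deal of guidance.

Below I share the multi-factor model from the book, along with some empirical results.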


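Multi-factor model

Quantitative stock selection broadly splits into fundamental selection and market-behavior selection. Fundamental selection covers multi-factor models, style-rotation models and industry-rotation models. Market-behavior selection covers money-flow selection, momentum/reversal models, consensus-expectation models, trend-following models and chip-distribution selection.

Today's topic is the multi-factor model.

The multi-factor stock selection model is widely used. It takes a set of factors as selection criteria: stocks that meet them are bought, stocks that do not are sold. Because some factors are always at work in any market period, the model is relatively stable.

Its advantage is that it condenses a lot of information into a single selection result. Different factor choices, and different ways of combining factors into a final judgment, produce different models. There are broadly two ways to combine factors, scoring and regression, and scoring is the more common.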

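Model construction example

  1. Use 2006-2011 as the sample period for factor testing and screening, and 2012-2015 as the out-of-sample (OS) test to check backtest performance.

  2. The stock pool contains stocks that have been listed for more than one quarter; benchmark = 000001.XSHG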

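I. Selecting candidate factors

Factors are chosen from market experience and economic logic. Choosing more factors, and more effective ones, improves the model's ability to capture information. Examples are fundamental indicators (PB, PE, EPS, growth rates), technical indicators (momentum, turnover, volatility) and other indicators (expected earnings growth, changes in analyst consensus, macroeconomic variables).

Given the data JQ can provide, I picked factors from the following three areas:

(1) Valuation: book-to-market ratio (B/M), earnings yield (EPS), dynamic P/E (PEG)

(2) Growth: ROE, ROA, gross margin on main business (GP/R), net profit margin (P/R)

(3) Capital structure: debt-to-asset ratio (L/A), fixed-asset proportion (FAP), circulating market value (CMV)

The effectiveness of these 10 factors is tested below.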

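II. Testing factor effectiveness

Candidate factors are tested with a sorting method.

For any one factor: at the start of the first month, compute the factor value of every stock in the market, sort the stock pool from smallest to largest, split it evenly into n portfolios, and hold them until month end. At the start of each following month, rebuild the portfolios the same way. The model is then built from the data of a chosen sample period.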
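The sort-and-split step can be sketched in plain Python as below. This is only an illustration, not the notebook's code: `split_into_groups` and the demo stock codes are made-up names, and the floor-division boundaries are one reasonable way to get contiguous groups whose sizes differ by at most one.

```python
def split_into_groups(factor_values, n_groups=5):
    """Sort codes by factor value (ascending) and cut them into n_groups slices."""
    ranked = sorted(factor_values, key=factor_values.get)
    size = len(ranked)
    # Boundaries k*size//n_groups give contiguous, non-overlapping slices
    bounds = [k * size // n_groups for k in range(n_groups + 1)]
    return [ranked[bounds[k]:bounds[k + 1]] for k in range(n_groups)]


# Hypothetical factor values for ten made-up stock codes
demo = {"S%02d" % i: v for i, v in enumerate(
    [0.8, -0.1, 0.3, 1.2, 0.5, 0.05, 0.9, 0.4, 0.7, 0.2])}
groups = split_into_groups(demo, n_groups=5)
for k, g in enumerate(groups, start=1):
    print("port%d" % k, g)
```

Every stock lands in exactly one portfolio, so nothing is dropped at the group boundaries.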

0. Import the required libraries

import pandas as pd
from pandas import Series, DataFrame
import numpy as np
import statsmodels.api as sm
import scipy.stats as scs
import matplotlib.pyplot as plt

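1. Fetch all factor values at the start of each month (2015-01-01 as an example)

Note: stocks with a circulating market value above 50 billion yuan are excluded here, to avoid the influence of heavyweight stocks. In the example, 13 of the original 193 stocks were removed.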
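A minimal sketch of the exclusion rule described in the note, in plain Python. The stock codes are placeholders, the values are the CMV column of the sample output in this post, and the unit (100 million yuan, so 50 billion yuan becomes 500) is an assumption.

```python
# Circulating market value per stock, assumed to be in units of 100 million yuan
cmv = {"A": 2341.38, "B": 125.70, "C": 361.36, "D": 119.00, "E": 154.01}

MAX_CMV = 500  # 50 billion yuan expressed in 100-million-yuan units

# Keep only stocks at or below the cap; everything larger is excluded
kept = {code: v for code, v in cmv.items() if v <= MAX_CMV}
removed = sorted(set(cmv) - set(kept))
print(kept)
print(removed)
```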

factors = ['B/M','EPS','PEG','ROE','ROA','GP/R','P/R','L/A','FAP','CMV']

# Fetch the factor values at the start of the month
def get_factors(fdate,factors):
    stock_set = get_index_stocks('000001.XSHG',fdate)
    q = query(
        valuation.code,
        balance.total_owner_equities/valuation.market_cap/100000000,  # B/M
        income.basic_eps,                                             # EPS
        valuation.pe_ratio,                                           # PEG (taken from pe_ratio)
        income.net_profit/balance.total_owner_equities,               # ROE
        income.net_profit/balance.total_assets,                       # ROA
        income.total_profit/income.operating_revenue,                 # GP/R
        income.net_profit/income.operating_revenue,                   # P/R
        balance.total_liability/balance.total_assets,                 # L/A
        balance.fixed_assets/balance.total_assets,                    # FAP
        valuation.circulating_market_cap                              # CMV
    ).filter(
        valuation.code.in_(stock_set),
        # As posted, this condition carries no comparison; the note above
        # describes a cap on circulating market value.
        valuation.circulating_market_cap
    )
    fdf = get_fundamentals(q, date=fdate)
    fdf.index = fdf['code']
    fdf.columns = ['code'] + factors
    # Drop the leading 'code' column, keep the 10 factor columns
    return fdf.iloc[:,-10:]

fdf = get_factors('2015-01-01',factors)
fdf.head()

code         B/M       EPS     PEG    ROE       ROA       GP/R      P/R       L/A       FAP       CMV
600000.XSHG  0.801232  0.6510   6.38  0.052452  0.003109  0.521640  0.400260  0.940733  0.002444  2341.3799
600004.XSHG  0.668192  0.1900  12.89  0.028343  0.022621  0.238295  0.170218  0.201903  0.636765   125.7000
600005.XSHG  1.045151  0.0340  66.09  0.009316  0.003532  0.023846  0.017453  0.620912  0.586177   361.3600
600006.XSHG  0.669808  0.0318  56.26  0.008670  0.003421  0.018401  0.017138  0.605433  0.155339   119.0000
600007.XSHG  0.332304  0.1400  36.47  0.028500  0.014895  0.345910  0.260073  0.477369  0.127698   154.0100

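2. Sort each factor by value ('B/M' as an example)

# Series.order() is the old pandas spelling of sort_values() (ascending by default)
score = fdf['B/M'].order()
score.head()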
code
600301.XSHG   -0.045989
600444.XSHG   -0.029723
600228.XSHG   -0.026231
600217.XSHG   -0.026090
600876.XSHG   -0.010862
Name: B/M, dtype: float64

Number of stocks in the pool

len(score)
966

3. Split the stock pool into five equal parts by score, building portfolios port1 to port5

startdate = '2015-01-01'
enddate = '2015-02-01'
nextdate = '2015-03-01'
df = {}
CMV = fdf['CMV']
# Python 2: "/" between integers is integer division, so the slice bounds are ints
port1 = list(score.index)[: len(score)/5]
port2 = list(score.index)[ len(score)/5: 2*len(score)/5]
port3 = list(score.index)[ 2*len(score)/5: -2*len(score)/5]
port4 = list(score.index)[ -2*len(score)/5: -len(score)/5]
port5 = list(score.index)[ -len(score)/5: ]
15066.599999999999

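4. Function: monthly return of a portfolio (weighted by circulating market value)

def caculate_port_monthly_return(port,startdate,enddate,nextdate,CMV):
    # First close on or after startdate, and first close on or after enddate
    close1 = get_price(port, startdate, enddate, 'daily', ['close'])
    close2 = get_price(port, enddate, nextdate, 'daily',['close'])
    # Per-stock return times its CMV, summed, then divided by the portfolio's total CMV
    weighted_m_return = ((close2['close'].ix[0,:]/close1['close'].ix[0,:]-1)*CMV).sum()/(CMV.ix[port].sum())
    # Equal-weighted alternative, left commented out:
    # weighted_m_return = (close['close'].ix[-1,:]/close['close'].ix[0,:]-1).mean()
    return weighted_m_return

caculate_port_monthly_return(port1,'2015-01-01','2015-02-01','2015-03-01',fdf['CMV'])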
0.042660461430416276
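The weighting in the function above boils down to sum(w_i * r_i) / sum(w_i), with w_i the circulating market value and r_i the stock's return over the month. A minimal plain-Python sketch follows. The name `weighted_return` and the demo prices are made up, and it leaves a stock without a usable price out of both sums, which is an assumption that differs slightly from the function above (that one keeps the stock's weight in the denominator).

```python
def weighted_return(start_close, end_close, weights):
    """Cap-weighted return: sum(w_i * r_i) / sum(w_i) over stocks with prices."""
    num = 0.0
    den = 0.0
    for code, w in weights.items():
        p0 = start_close.get(code)
        p1 = end_close.get(code)
        if p0 is None or p1 is None or p0 == 0:
            continue  # no usable price: leave the stock out entirely (assumption)
        num += w * (p1 / p0 - 1.0)
        den += w
    return num / den if den else float("nan")


# Hypothetical prices and circulating market values for three stocks
start = {"A": 10.0, "B": 20.0, "C": 5.0}
end = {"A": 11.0, "B": 19.0, "C": 5.0}
caps = {"A": 100.0, "B": 300.0, "C": 100.0}
r = weighted_return(start, end, caps)
print(r)
```

Here B falls 5% and carries three times the weight of A, which rises 10%, so the portfolio ends slightly negative even though the simple average of the three returns is positive.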

5. Function: monthly return of the benchmark

def caculate_benchmark_monthly_return(startdate,enddate,nextdate):
    close1 = get_price(['000001.XSHG'],startdate,enddate,'daily',['close'])['close']
    close2 = get_price(['000001.XSHG'],enddate, nextdate, 'daily',['close'])['close']
    # Only one column (the index), so sum() just unwraps the single value
    benchmark_return = (close2.ix[0,:]/close1.ix[0,:]-1).sum()
    return benchmark_return

caculate_benchmark_monthly_return('2015-01-01','2015-02-01','2015-03-01')
-0.06632375461831419

6. Returns of the five portfolios over the month following their construction on 2015-01-01

benchmark_return = caculate_benchmark_monthly_return(startdate,enddate,nextdate)
df['port1'] = caculate_port_monthly_return(port1,startdate,enddate,nextdate,CMV)
df['port2'] = caculate_port_monthly_return(port2,startdate,enddate,nextdate,CMV)
df['port3'] = caculate_port_monthly_return(port3,startdate,enddate,nextdate,CMV)
df['port4'] = caculate_port_monthly_return(port4,startdate,enddate,nextdate,CMV)
df['port5'] = caculate_port_monthly_return(port5,startdate,enddate,nextdate,CMV)
print Series(df)
print 'benchmark_return %s'%benchmark_return
port1    0.042660
port2   -0.047200
port3    0.012783
port4   -0.063027
port5   -0.117817
dtype: float64
benchmark_return -0.0663237546183

7. Build the factor portfolios and compute each portfolio's monthly return under monthly rebalancing

Data range: 2009-2015, 7 years in total

The result, monthly_return, is a Panel holding, for every factor, the monthly returns of the 5 portfolios and the benchmark over the 7×12 months.

factors = ['B/M','EPS','PEG','ROE','ROA','GP/R','P/R','L/A','FAP','CMV']
# In the research module, get_fundamentals defaults its date to the day before
# the research date, so we supply our own monthly date series.
year = ['2009','2010','2011','2012','2013','2014','2015']
month = ['01','02','03','04','05','06','07','08','09','10','11','12']
result = {}
for i in range(7*12):
    startdate = year[i/12] + '-' + month[i%12] + '-01'
    try:
        enddate = year[(i+1)/12] + '-' + month[(i+1)%12] + '-01'
    except IndexError:
        enddate = '2016-01-01'
    try:
        nextdate = year[(i+2)/12] + '-' + month[(i+2)%12] + '-01'
    except IndexError:
        if enddate == '2016-01-01':
            nextdate = '2016-02-01'
        else:
            nextdate = '2016-01-01'
    print 'time %s'%startdate
    fdf = get_factors(startdate,factors)
    CMV = fdf['CMV']
    # 5 portfolios plus the benchmark, 10 factors
    df = DataFrame(np.zeros(6*10).reshape(6,10),index = ['port1','port2','port3','port4','port5','benchmark'],columns = factors)
    for fac in factors:
        score = fdf[fac].order()
        # Note: unlike step 3, the "+1" offsets here skip one stock at each boundary
        port1 = list(score.index)[: len(score)/5]
        port2 = list(score.index)[ len(score)/5+1: 2*len(score)/5]
        port3 = list(score.index)[ 2*len(score)/5+1: -2*len(score)/5]
        port4 = list(score.index)[ -2*len(score)/5+1: -len(score)/5]
        port5 = list(score.index)[ -len(score)/5+1: ]
        df.ix['port1',fac] = caculate_port_monthly_return(port1,startdate,enddate,nextdate,CMV)
        df.ix['port2',fac] = caculate_port_monthly_return(port2,startdate,enddate,nextdate,CMV)
        df.ix['port3',fac] = caculate_port_monthly_return(port3,startdate,enddate,nextdate,CMV)
        df.ix['port4',fac] = caculate_port_monthly_return(port4,startdate,enddate,nextdate,CMV)
        df.ix['port5',fac] = caculate_port_monthly_return(port5,startdate,enddate,nextdate,CMV)
        df.ix['benchmark',fac] = caculate_benchmark_monthly_return(startdate,enddate,nextdate)
        print 'factor %s'%fac
    result[i+1]=df
monthly_return = pd.Panel(result)
time 2009-01-01
factor B/M
factor EPS
factor PEG
factor ROE
factor ROA
factor GP/R
factor P/R
factor L/A
factor FAP
factor CMV
[the same ten "factor" lines are printed under a "time" line for every following month, from time 2009-02-01 through time 2015-12-01]
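The date bookkeeping in the loop (start of this month, start of next month, start of the month after) can also be generated arithmetically, without the try/except branches. A small sketch in plain Python; `month_starts` is a hypothetical helper, not part of the notebook:

```python
def month_starts(first_year, n_months):
    """Return (startdate, enddate, nextdate) strings for consecutive months."""
    def label(k):
        # k counts months from January of first_year
        return "%04d-%02d-01" % (first_year + k // 12, k % 12 + 1)
    return [(label(i), label(i + 1), label(i + 2)) for i in range(n_months)]


dates = month_starts(2009, 7 * 12)
print(dates[0])
print(dates[-1])
```

The last triple runs into 2016 on its own, which is what the two IndexError branches above handle by hand.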

8. Returns of the 5 portfolios for one factor ('L/A' as an example)

monthly_return[:,:,'L/A']

Months 1-10:

           1         2         3         4         5         6         7         8          9         10
port1      0.085377  0.036437  0.171064  0.068139  0.031353  0.071192  0.151097  -0.187432  0.106754  0.053656
port2      0.103112  0.051596  0.187690  0.086921  0.023387  0.060204  0.174042  -0.207356  0.082787  0.068394
port3      0.147836  0.047156  0.180190  0.078112  0.043121  0.076799  0.247551  -0.256659  0.092075  0.058820
port4      0.128160  0.053384  0.206332  0.051714  0.060888  0.075527  0.181603  -0.234078  0.096960  0.089126
port5      0.087002  0.042383  0.176984  0.053508  0.075892  0.210837  0.127438  -0.238531  0.106269  0.077865
benchmark  0.069637  0.040645  0.150264  0.063078  0.063037  0.105417  0.151070  -0.224937  0.084953  0.056645

...

Months 75-84:

           75        76        77         78         79         80         81         82        83        84
port1      0.160294  0.175091   0.184501  -0.173977  -0.126870  -0.129739   0.020640  0.092677  0.059210  -0.024342
port2      0.108189  0.156054   0.061243  -0.171119  -0.069029  -0.126619  -0.008985  0.042185  0.040142  -0.059092
port3      0.135661  0.189277   0.101805  -0.176225  -0.131486  -0.116174   0.003847  0.059488  0.023647  -0.034763
port4      0.186579  0.251944   0.131996  -0.204895  -0.156339  -0.126159  -0.002444  0.058323  0.032130  -0.083936
port5      0.123681  0.153539  -0.017625  -0.102327  -0.066117  -0.107363  -0.025164  0.042663  0.037691  -0.036141
benchmark  0.142077  0.175884   0.077732  -0.160505  -0.106272  -0.125943  -0.007348  0.057813  0.039465  -0.046307

6 rows × 84 columns

(monthly_return[:,:,'L/A'].T+1).cumprod().tail()

    port1     port2     port3     port4     port5     benchmark
80  2.333138  2.106109  1.935200  2.070456  2.384574  1.683733
81  2.381294  2.087185  1.942645  2.065396  2.324569  1.671362
82  2.601984  2.175233  2.058208  2.185855  2.423743  1.767989
83  2.756049  2.262552  2.106879  2.256086  2.515096  1.837762
84  2.688961  2.128853  2.033638  2.066720  2.424198  1.752661
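The `(... + 1).cumprod()` expression just compounds the monthly returns into a running net value. A plain-Python sketch of the same step, where `cumulative_value` is a made-up name and the two numbers are the benchmark's net value after month 83 and its month-84 return from the tables above:

```python
def cumulative_value(monthly_returns, start=1.0):
    """Compound a sequence of monthly returns into a running net value."""
    value = start
    path = []
    for r in monthly_returns:
        value *= 1.0 + r
        path.append(value)
    return path


# Benchmark: net value after month 83, then the month-84 return
last_step = cumulative_value([-0.046307], start=1.837762)
print(round(last_step[-1], 6))
```

The result comes out at about 1.752661, matching the benchmark's month-84 entry.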

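9. Quantitative metrics for factor testing

Once the model is built, compute for the n portfolios the annualized compound return, the excess return, and, under different market conditions, the probability that the high-return portfolio beats the benchmark and that the low-return portfolio trails it.

Quantitative criteria for effectiveness:

(1) Across portfolios 1 to n, the annualized compound returns should follow a clear ordering, i.e. the factor value of a portfolio and its return should be strongly related. Let X_i be the annualized return of portfolio i; then the absolute correlation between X_i and i should satisfy Abs(Corr(X_i, i)) > MinCorr, where MinCorr is a given minimum correlation threshold.

(2) Let AR1 and ARn be the excess returns of the two extreme portfolios, 1 and n, and let MinARtop and MinARbottom be the minimum excess-return thresholds.

if AR1 > ARn  # smaller factor, larger return

then require AR1 > MinARtop > 0 and ARn < MinARbottom < 0

if AR1 < ARn  # smaller factor, smaller return

then require ARn > MinARtop > 0 and AR1 < MinARbottom < 0

These conditions ensure that, of the two portfolios with the largest and smallest factor values, one clearly beats the market and the other clearly trails it.

(3) Under any market conditions, the two extreme portfolios 1 and n should beat or trail the market with fairly high probability.

Together, these three conditions pick out the factors that showed good stock-selection ability over the past period.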
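The three checks can be sketched in plain Python as below. The function names are made up for illustration; the PEG numbers are the annualized returns reported later in this post, while the thresholds passed to `passes` are hypothetical and only there to show the call.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def factor_metrics(annual, benchmark):
    """annual: annualized returns of portfolios 1..n, ordered by factor value."""
    corr = pearson(list(range(1, len(annual) + 1)), annual)
    ar_first, ar_last = annual[0] - benchmark, annual[-1] - benchmark
    return {
        "corr": corr,
        "winner_excess": max(ar_first, ar_last),
        "loser_excess": min(ar_first, ar_last),
    }


def passes(metrics, min_corr, min_top, min_bottom):
    """Conditions (1) and (2); the thresholds are supplied by the caller."""
    return (abs(metrics["corr"]) > min_corr
            and metrics["winner_excess"] > min_top > 0
            and metrics["loser_excess"] < min_bottom < 0)


def win_probability(portfolio_monthly, benchmark_monthly):
    """Condition (3): share of months in which the portfolio beats the benchmark."""
    wins = sum(1 for p, b in zip(portfolio_monthly, benchmark_monthly) if p > b)
    return wins / len(portfolio_monthly)


# Annualized returns of the PEG portfolios (port1..port5) and of the benchmark
peg = [0.199607, 0.146473, 0.134784, 0.114541, 0.085249]
m = factor_metrics(peg, benchmark=0.098035)
print(m)
# Hypothetical thresholds, chosen only to illustrate the call
print(passes(m, min_corr=0.3, min_top=0.05, min_bottom=-0.01))
```

The verdict depends entirely on the thresholds: with a stricter bottom threshold such as -0.05, the same PEG numbers no longer pass condition (2).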

total_return = {}
annual_return = {}
excess_return = {}
win_prob = {}
loss_prob = {}
effect_test = {}
MinCorr = 0.3
Minbottom = -0.05
Mintop = 0.05
for fac in factors:
    effect_test[fac] = {}
    monthly = monthly_return[:,:,fac]
    total_return[fac] = (monthly+1).T.cumprod().iloc[-1,:]-1
    # Note: the exponent is 1/6, while the data covers 84 months (7 years);
    # an exponent of 1/7 would give lower annualized figures than those shown below.
    annual_return[fac] = (total_return[fac]+1)**(1./6)-1
    excess_return[fac] = annual_return[fac]- annual_return[fac][-1]
    # Factor effectiveness checks
    # 1. Correlation between annualized return and portfolio index (to compare with the threshold)
    effect_test[fac][1] = annual_return[fac][0:5].corr(Series([1,2,3,4,5],index = annual_return[fac][0:5].index))
    # 2. Probability that the high-return portfolio beats the benchmark
    if total_return[fac][0] < total_return[fac][-2]:
        # Smaller factor, smaller return: port1 is the loser portfolio, port5 the winner
        loss_excess = monthly.iloc[0,:]-monthly.iloc[-1,:]
        loss_prob[fac] = loss_excess[loss_excess<0].count()/float(len(loss_excess))
        win_excess = monthly.iloc[-2,:]-monthly.iloc[-1,:]
        win_prob[fac] = win_excess[win_excess>0].count()/float(len(win_excess))
        effect_test[fac][3] = [win_prob[fac],loss_prob[fac]]
        # Excess returns, in percent
        effect_test[fac][2] = [excess_return[fac][-2]*100,excess_return[fac][0]*100]
    else:
        # Smaller factor, larger return: port1 is the winner portfolio, port5 the loser
        loss_excess = monthly.iloc[-2,:]-monthly.iloc[-1,:]
        loss_prob[fac] = loss_excess[loss_excess<0].count()/float(len(loss_excess))
        win_excess = monthly.iloc[0,:]-monthly.iloc[-1,:]
        win_prob[fac] = win_excess[win_excess>0].count()/float(len(win_excess))
        effect_test[fac][3] = [win_prob[fac],loss_prob[fac]]
        # Excess returns, in percent
        effect_test[fac][2] = [excess_return[fac][0]*100,excess_return[fac][-2]*100]

# effect_test[1]: factor correlation; passes if > 0.5 or < -0.5
# effect_test[2]: [winner portfolio excess return, loser portfolio excess return]
# effect_test[3]: [probability the winner beats the benchmark, probability the loser trails it];
#                 passes at [> 0.5, > 0.4] (given the actual results, the trailing probability is ignored for now)
DataFrame(effect_test)

(shown transposed, one row per factor; the columns are the keys 1, 2 and 3 of effect_test)

factor  1           2 [winner excess %, loser excess %]  3 [win prob, loss prob]
B/M      0.8575994  [8.20404772765, 0.444474144989]      [0.583333333333, 0.47619047619]
CMV     -0.9803264  [48.222405245, 0.73877402636]        [0.714285714286, 0.47619047619]
EPS     -0.2417111  [5.37013402955, 3.7224243677]        [0.547619047619, 0.47619047619]
FAP     -0.7649626  [6.81859530015, -1.43912556501]      [0.52380952381, 0.511904761905]
GP/R     0.4933612  [5.19660373631, 3.03172671649]       [0.571428571429, 0.47619047619]
L/A     -0.3150764  [8.11916063464, 6.0994646067]        [0.630952380952, 0.47619047619]
P/R      0.7777911  [5.50314924722, 2.78928297967]       [0.559523809524, 0.47619047619]
PEG     -0.9718449  [10.1571258275, -1.27862288343]      [0.595238095238, 0.464285714286]
ROA      0.440798   [6.16875242846, 1.86261189929]       [0.595238095238, 0.47619047619]
ROE      0.33929    [5.89690556199, 2.87424422408]       [0.535714285714, 0.5]

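Test result: of the candidates below, 5 effective factors satisfy all three conditions above, namely B/M, PEG, P/R, FAP and CMV:

(1) Valuation: book-to-market ratio (B/M), earnings yield (EPS), dynamic P/E (PEG)

(2) Growth: ROE, ROA, gross margin on main business (GP/R), net profit margin (P/R)

(3) Capital structure: debt-to-asset ratio (L/A), fixed-asset proportion (FAP), circulating market value (CMV)

Of these, CMV, FAP and PEG earn more the smaller they are; B/M and P/R earn more the larger they are.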

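(1) Total and annualized returns of the effective factors

Small caps are monsters!! Sorted by CMV, the smallest-CMV portfolio has a total return of 14.6x, or 58% annualized (with the 1/6 exponent used in the code above)!

After the CMV portfolios, the runner-up in total return is FAP's port2, at 2.71x. (This is also why the return correlation of the FAP portfolios is somewhat lower.)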

effective_factors = ['B/M','PEG','P/R','FAP','CMV']
DataFrame(total_return).ix[:,effective_factors]

           B/M       PEG       P/R       FAP       CMV
port1      0.795662  1.980116  1.037343  1.515856  14.572930
port2      0.491867  1.270821  1.222759  2.716078   6.133603
port3      0.858536  1.135401  1.134942  1.530595   3.234002
port4      1.537122  0.916800  1.529077  0.805181   2.061771
port5      1.700595  0.633717  1.350320  0.619273   0.824615
benchmark  0.752661  0.752661  0.752661  0.752661   0.752661

DataFrame(annual_return).ix[:,effective_factors]

           B/M       PEG       P/R       FAP       CMV
port1      0.102480  0.199607  0.125928  0.166221  0.580259
port2      0.068944  0.146473  0.142393  0.244555  0.387453
port3      0.108822  0.134784  0.134743  0.167357  0.271916
port4      0.167858  0.114541  0.167241  0.103452  0.205023
port5      0.180076  0.085249  0.153067  0.083644  0.105423
benchmark  0.098035  0.098035  0.098035  0.098035  0.098035
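The annualized figures follow from the total returns via (1 + total) ** (1 / years) - 1. The code in this post uses an exponent of 1/6, while the data spans 84 monthly periods. The sketch below simply compares the two horizons for CMV port1; `annualize` is a made-up helper, and which horizon is the right one is left open here.

```python
def annualize(total_return, years):
    """Annualized compound return for a total return earned over `years` years."""
    return (1.0 + total_return) ** (1.0 / years) - 1.0


cmv_port1_total = 14.572930  # total return of CMV port1 from the table above
six = annualize(cmv_port1_total, 6)
seven = annualize(cmv_port1_total, 7)
print(round(six, 4), round(seven, 4))
```

With 6 years this reproduces the 0.580259 in the table; with 7 years the same total return annualizes to a noticeably lower figure.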

(2) Returns of the effective-factor portfolios versus the benchmark

def draw_return_picture(df):
    plt.figure(figsize =(10,4))
    # Cumulative net value of each portfolio and of the benchmark
    plt.plot((df.T+1).cumprod().ix[:,0], label = 'port1')
    plt.plot((df.T+1).cumprod().ix[:,1], label = 'port2')
    plt.plot((df.T+1).cumprod().ix[:,2], label = 'port3')
    plt.plot((df.T+1).cumprod().ix[:,3], label = 'port4')
    plt.plot((df.T+1).cumprod().ix[:,4], label = 'port5')
    plt.plot((df.T+1).cumprod().ix[:,5], label = 'benchmark')
    # fac is read from the enclosing loop below
    plt.xlabel('return of factor %s'%fac)
    plt.legend(loc=0)

for fac in effective_factors:
    draw_return_picture(monthly_return[:,:,fac])

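III. Removing redundant factors (outline only; skipped here because there are few factors)

Some factors, for reasons such as similar underlying logic, select portfolios that are highly correlated in their constituents and returns. Such factors should go through redundancy removal, keeping from each group of similar factors the one with the best return and the strongest discrimination. The steps:

(1) Score the n portfolios of each factor: the higher the return, the higher the score. Then assign each portfolio's score to every stock it holds that month.

if AR1 > ARn  # smaller factor, larger return

then portfolio i scores (n-i+1)

if AR1 < ARn  # smaller factor, smaller return

then portfolio i scores i

(2) Each month, compute the correlation matrix of the stocks' scores across factors. This gives the score correlation matrix for month t, Score_Corr(t, u, v), where u and v index the factors.

(3) Average the correlation matrices over the sample period: with m months in the sample, sum the matrices and multiply by 1/m.

(4) Set a score-correlation threshold MinScoreCorr and keep only the factors whose correlation with the others is low.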
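Steps (2) to (4) might look like the plain-Python sketch below. Everything in it is illustrative: the factor names and scores are invented, `max_corr` plays the role of MinScoreCorr, and the greedy pass (keep a factor only if it is not too correlated with any factor already kept, walking the list in priority order) is one possible reading of "keep the best factor among similar ones", not the book's procedure.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either sequence is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    denom = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return cov / denom if denom else 0.0


def mean_score_corr(monthly_scores, factors):
    """monthly_scores: one {factor: [score per stock]} dict per month.
    Returns the correlation matrix averaged over all months."""
    m = len(monthly_scores)
    avg = {}
    for a in factors:
        for b in factors:
            total = sum(pearson(month[a], month[b]) for month in monthly_scores)
            avg[(a, b)] = total / m
    return avg


def drop_redundant(factors, avg_corr, max_corr):
    """Walk factors in priority order; keep one only if its average correlation
    with every factor already kept is at most max_corr in absolute value."""
    kept = []
    for f in factors:
        if all(abs(avg_corr[(f, k)]) <= max_corr for k in kept):
            kept.append(f)
    return kept


# Invented scores (1..5) for five stocks over two months
months = [
    {"F1": [1, 2, 3, 4, 5], "F2": [1, 2, 3, 5, 4], "F3": [5, 3, 1, 4, 2]},
    {"F1": [2, 1, 3, 4, 5], "F2": [1, 2, 4, 3, 5], "F3": [3, 5, 1, 2, 4]},
]
names = ["F1", "F2", "F3"]  # assumed to be listed best-first
avg = mean_score_corr(months, names)
kept = drop_redundant(names, avg, max_corr=0.7)
print(kept)
```

In this toy data F2 tracks F1 closely, so it is dropped, while F3 survives.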

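IV. Building the model and selecting stocks

Using the selected effective factors, compute each stock's factor scores at the start of every month and combine them into an average score with chosen weights. If a factor has no value for a stock in a given month, take the weighted average over the remaining factors. Then rank the stocks by weighted average score and trade the top-ranked ones.

The code below sums the factor scores with equal weights and picks the highest-scoring stocks to trade.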
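The missing-value rule described above (average over whichever factors do have a value) can be sketched in plain Python. The function names, stock codes and numbers are all invented for illustration, and this is separate from the notebook code in this section, which sums ranks without handling missing values.

```python
def rank_scores(values, larger_is_better):
    """Map code -> rank score in 1..k, where the best stock gets the highest
    score. Stocks whose factor value is None are left out."""
    present = [c for c, v in values.items() if v is not None]
    ordered = sorted(present, key=lambda c: values[c], reverse=not larger_is_better)
    return {code: i + 1 for i, code in enumerate(ordered)}


def composite_score(factor_table, directions, weights=None):
    """factor_table: {factor: {code: value or None}}.
    Returns the weighted average rank score over the factors available per stock."""
    weights = weights or {f: 1.0 for f in directions}
    totals, wsum = {}, {}
    for fac, larger_is_better in directions.items():
        for code, s in rank_scores(factor_table[fac], larger_is_better).items():
            totals[code] = totals.get(code, 0.0) + weights[fac] * s
            wsum[code] = wsum.get(code, 0.0) + weights[fac]
    return {code: totals[code] / wsum[code] for code in totals}


# Invented values; stock C has no CMV this month
table = {
    "B/M": {"A": 0.8, "B": 0.3, "C": 0.2, "D": 0.9},
    "CMV": {"A": 300.0, "B": 400.0, "C": None, "D": 50.0},
}
directions = {"B/M": True, "CMV": False}  # True: larger is better
scores = composite_score(table, directions)
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores)
print(ranking)
```

Stock C is scored on B/M alone, while the other three average their two rank scores.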

(1) Model construction

def score_stock(fdate):
    # CMV, FAP, PEG: smaller value means larger return, so a higher score (rank descending);
    # B/M, P/R: larger value means larger return (rank ascending)
    effective_factors = {'B/M':True,'PEG':False,'P/R':True,'FAP':False,'CMV':False}
    fdf = get_factors(fdate)
    score = {}
    for fac,value in effective_factors.items():
        score[fac] = fdf[fac].rank(ascending = value,method = 'first')
    # Equal-weight sum of the five rank scores per stock, highest first
    print DataFrame(score).T.sum().order(ascending = False).head(5)
    score_stock = list(DataFrame(score).T.sum().order(ascending = False).index)
    return score_stock,fdf['CMV']

def get_factors(fdate):
    factors = ['B/M','PEG','P/R','FAP','CMV']
    stock_set = get_index_stocks('000001.XSHG',fdate)
    q = query(
        valuation.code,
        balance.total_owner_equities/valuation.market_cap/100000000,  # B/M
        valuation.pe_ratio,                                           # PEG (taken from pe_ratio)
        income.net_profit/income.operating_revenue,                   # P/R
        balance.fixed_assets/balance.total_assets,                    # FAP
        valuation.circulating_market_cap                              # CMV
    ).filter(
        valuation.code.in_(stock_set)
    )
    fdf = get_fundamentals(q,date = fdate)
    fdf.index = fdf['code']
    fdf.columns = ['code'] + factors
    return fdf.iloc[:,-5:]

[score_result,CMV] = score_stock('2016-01-01')
code
600382.XSHG    4274
600638.XSHG    4224
600291.XSHG    4092
600791.XSHG    4078
600284.XSHG    4031
dtype: float64
year = ['2009','2010','2011','2012','2013','2014','2015']
month = ['01','02','03','04','05','06','07','08','09','10','11','12']
factors = ['B/M','PEG','P/R','FAP','CMV']
result = {}
for i in range(7*12):
    startdate = year[i/12] + '-' + month[i%12] + '-01'
    try:
        enddate = year[(i+1)/12] + '-' + month[(i+1)%12] + '-01'
    except IndexError:
        enddate = '2016-01-01'
    try:
        nextdate = year[(i+2)/12] + '-' + month[(i+2)%12] + '-01'
    except IndexError:
        if enddate == '2016-01-01':
            nextdate = '2016-02-01'
        else:
            nextdate = '2016-01-01'
    print 'time %s'%startdate
    # After scoring on the 5 factors combined, split the ranking into portfolios
    df = DataFrame(np.zeros(7),index = ['Top20','port1','port2','port3','port4','port5','benchmark'])
    [score,CMV] = score_stock(startdate)
    port0 = score[:20]
    port1 = score[: len(score)/5]
    port2 = score[ len(score)/5+1: 2*len(score)/5]
    port3 = score[ 2*len(score)/5+1: -2*len(score)/5]
    port4 = score[ -2*len(score)/5+1: -len(score)/5]
    port5 = score[ -len(score)/5+1: ]
    print len(score)
    # Top20 uses port0 (the post originally passed port1 here)
    df.ix['Top20'] = caculate_port_monthly_return(port0,startdate,enddate,nextdate,CMV)
    df.ix['port1'] = caculate_port_monthly_return(port1,startdate,enddate,nextdate,CMV)
    df.ix['port2'] = caculate_port_monthly_return(port2,startdate,enddate,nextdate,CMV)
    df.ix['port3'] = caculate_port_monthly_return(port3,startdate,enddate,nextdate,CMV)
    df.ix['port4'] = caculate_port_monthly_return(port4,startdate,enddate,nextdate,CMV)
    df.ix['port5'] = caculate_port_monthly_return(port5,startdate,enddate,nextdate,CMV)
    df.ix['benchmark'] = caculate_benchmark_monthly_return(startdate,enddate,nextdate)
    # Store the single column as a Series so the months assemble into one 7 x 84 frame
    result[i+1]=df[0]
backtest_results = pd.DataFrame(result)

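V. Shortcomings and improvements

As more people use the model, some factors will gradually stop working, and new ones may need to be added to the factor library. The weighting of the factors also has room for improvement. The model itself needs continual re-evaluation and refinement to keep up with changes in the market.

Finally

This study was done purely out of personal interest, reproducing the book's method (the book uses data from January 2005 to December 2010).

How to assemble the validated factors into a strategy is something I'll share next time~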
