
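Learning the Multi-Factor Stock Selection Model

Posted by 蜡笔小新炒外汇 on May 10, 05:41 | Replies (1)

This post walks through the steps of a multi-factor stock selection model: choose candidate factors, sort stocks by each factor value and compute the returns of the sorted groups, screen the factors with quantitative criteria, remove redundant factors, and finally trade on the factors that survive, recomputing the returns for comparison.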

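Multi-factor stock selection model

A multi-factor stock selection model uses a set of factors as its selection criteria: stocks that meet the criteria are bought, and those that do not are sold. Because some factors lose their effectiveness over time and are displaced by others, the factor set needs to be refreshed periodically. The basic form is: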

$$\bar{K}_{it} = a_i + b_{i1}\delta_{1t} + b_{i2}\delta_{2t} + \cdots + b_{ik}\delta_{kt} + e_{it}$$

where $\delta_{kt}$ is the unexpected change in the $k$-th risk factor in period $t$, and $b_{ik}$ is the sensitivity of asset $i$ to the $k$-th risk factor.
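To make the roles of $a_i$ and $b_{ik}$ concrete, here is a minimal sketch (not from the original post) that simulates one asset's series from the equation above and recovers the coefficients by ordinary least squares. The data are synthetic, and the names `delta`, `a_true`, `b_true` and the chosen sizes are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

T, k = 120, 3                            # periods and number of risk factors (illustrative)
delta = rng.normal(size=(T, k))          # factor surprises delta_{kt}
a_true = 0.01                            # intercept a_i
b_true = np.array([0.8, -0.3, 0.5])      # sensitivities b_{ik}
noise = rng.normal(scale=0.01, size=T)   # residual e_{it}

# Simulated series for one asset i, following the equation above
K = a_true + delta @ b_true + noise

# Least-squares fit: a leading column of ones makes the first coefficient a_i
X = np.column_stack([np.ones(T), delta])
coef, *_ = np.linalg.lstsq(X, K, rcond=None)
a_hat, b_hat = coef[0], coef[1:]

print("a_i estimate:", round(float(a_hat), 4))
print("b_ik estimates:", np.round(b_hat, 3))
```

With little noise the estimates land close to the values used in the simulation; on real data the residual term is much larger and the sensitivities are correspondingly less certain.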

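Model construction and factor selection

The sample period is 2010 to 2016, used for factor screening and testing; the benchmark is again the Shanghai Composite Index (上证综指). After consulting related material (for example 光大证券, 2012-09-10: 数量化投资:体系与策略, and 长城证券, 2010-02-11: 基于价值与成长的静态选股模型), the candidate factors are drawn from the following four categories:

  1. Value factors: price-to-earnings (PE), price-to-book (PB), price-to-sales (PS), basic earnings per share (EPS), book-to-market (B/M)

  2. Growth factors: return on equity (ROE), return on total assets (ROA), gross profit margin (gross_profit_margin), year-on-year net profit growth (inc_net_profit_year_on_year), quarter-on-quarter net profit growth (inc_net_profit_annual), year-on-year operating profit growth (inc_operation_profit_year_on_year), quarter-on-quarter operating profit growth (inc_operation_profit_annual), main-business gross margin (GP/R), net profit margin (P/R)

  3. Size factors: net profit (net_profit), operating revenue (operating_revenue), total share capital (capitalization), floating share capital (circulating_cap), total market cap (market_cap), float market cap (circulating_market_cap), liabilities-to-assets ratio (L/A), fixed-asset ratio (FAP)

  4. Trading factors: turnover ratio (turnover_ratio)

The effectiveness of each factor is checked with the sorting method.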

import pandas as pd
from pandas import Series, DataFrame
import numpy as np
import statsmodels.api as sm
import scipy.stats as scs
import matplotlib.pyplot as plt

Fetch all factor values at the start of a month, for example 2017-01-01:

factors = ['PE', 'PB', 'PS', 'EPS', 'B/M',
           'ROE', 'ROA', 'gross_profit_margin', 'inc_net_profit_year_on_year', 'inc_net_profit_annual',
           'inc_operation_profit_year_on_year', 'inc_operation_profit_annual', 'GP/R', 'P/R',
           'net_profit', 'operating_revenue', 'capitalization', 'circulating_cap', 'market_cap',
           'circulating_market_cap', 'L/A', 'FAP',
           'turnover_ratio']

# Fetch factor values at the start of a month.
# get_index_stocks, query, get_fundamentals and the valuation/balance/income/indicator
# tables are provided by the JoinQuant research environment.
def get_factors(fdate, factors):
    stock_set = get_index_stocks('000001.XSHG', fdate)
    # The selected columns must follow the order of `factors`, because the labels are
    # assigned by position below. The original query listed the B/M expression first,
    # so in the outputs printed in this post the first five labels are shifted:
    # 'PE' holds B/M, 'PB' holds PE, 'PS' holds PB, 'EPS' holds PS, 'B/M' holds EPS.
    q = query(
        valuation.code,
        valuation.pe_ratio,
        valuation.pb_ratio,
        valuation.ps_ratio,
        income.basic_eps,
        balance.total_owner_equities/valuation.market_cap/100000000,
        indicator.roe,
        indicator.roa,
        indicator.gross_profit_margin,
        indicator.inc_net_profit_year_on_year,
        indicator.inc_net_profit_annual,
        indicator.inc_operation_profit_year_on_year,
        indicator.inc_operation_profit_annual,
        income.total_profit/income.operating_revenue,
        income.net_profit/income.operating_revenue,
        income.net_profit,
        income.operating_revenue,
        valuation.capitalization,
        valuation.circulating_cap,
        valuation.market_cap,
        valuation.circulating_market_cap,
        balance.total_liability/balance.total_assets,
        balance.fixed_assets/balance.total_assets,
        valuation.turnover_ratio
    ).filter(
        valuation.code.in_(stock_set),
        valuation.circulating_market_cap
    )
    fdf = get_fundamentals(q, date=fdate)
    fdf.index = fdf['code']
    fdf.columns = ['code'] + factors
    # Drop the 'code' column, keep the 23 factor columns
    return fdf.iloc[:, -23:]

fdf = get_factors('2017-01-01', factors)
fdf.head().T
code                                600000.XSHG    600004.XSHG    600005.XSHG    600006.XSHG    600007.XSHG
PE                                  1.040447e+00   6.663323e-01   8.453689e-01   5.430593e-01   3.448389e-01
PB                                  6.470000e+00   1.171000e+01  -5.600000e+00   5.742000e+01   2.448000e+01
PS                                  9.700000e-01   1.520000e+00   1.200000e+00   2.110000e+00   2.900000e+00
EPS                                 2.210000e+00   2.720000e+00   6.600000e-01   8.700000e-01   7.440000e+00
B/M                                 6.440000e-01   3.100000e-01   1.000000e-02   2.400000e-03   1.700000e-01
ROE                                 3.945300e+00   3.462200e+00   3.394000e-01   7.350000e-02   2.896200e+00
ROA                                 2.578000e-01   2.155400e+00   1.018000e-01  -1.064000e-01   1.598700e+00
gross_profit_margin                 0.000000e+00   3.974840e+01   7.558700e+00   1.333970e+01   4.975910e+01
inc_net_profit_year_on_year         5.610900e+00   1.526610e+01   1.064834e+02  -1.233078e+02   6.576800e+00
inc_net_profit_annual               8.404500e+00   2.512800e+00  -5.897230e+01  -1.375576e+02  -1.480590e+01
inc_operation_profit_year_on_year   6.548100e+00   1.097220e+01   1.107581e+02  -5.131879e+03   9.884500e+00
inc_operation_profit_annual         8.729000e+00  -1.867000e-01  -7.276500e+00  -1.478996e+02  -1.347520e+01
GP/R                                4.739137e-01   3.112564e-01   1.478402e-02  -9.276990e-03   3.949284e-01
P/R                                 3.637630e-01   2.333095e-01   7.027120e-03  -5.862390e-03   2.963650e-01
net_profit                          1.409800e+10   3.616162e+08   9.881854e+07  -2.104446e+07   1.714052e+08
operating_revenue                   3.875600e+10   1.549942e+09   1.406245e+10   3.589742e+09   5.783583e+08
capitalization                      2.161828e+06   1.150030e+05   1.009378e+06   2.000000e+05   1.007280e+05
circulating_cap                     2.051882e+06   1.150030e+05   1.009378e+06   2.000000e+05   1.007280e+05
market_cap                          3.504320e+03   1.620400e+02   3.442000e+02   1.376000e+02   1.739600e+02
circulating_market_cap              3.326100e+03   1.620400e+02   3.442000e+02   1.376000e+02   1.739600e+02
L/A                                 9.344738e-01   3.627758e-01   6.990730e-01   6.202930e-01   4.433099e-01
FAP                                 3.769940e-03   4.203185e-01   4.743858e-01   1.513520e-01   1.019044e-01
turnover_ratio                      6.000000e-02   1.700000e-01   3.400000e-01   3.800000e-01   2.100000e-01

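Sort the stocks by each factor (float market cap as an example)

# Series.order() has been removed from pandas; sort_values() is the equivalent call
score = fdf['circulating_market_cap'].sort_values()
score.head()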
code
603822.XSHG    11.18
603031.XSHG    11.26
603029.XSHG    11.46
603090.XSHG    11.92
603726.XSHG    12.10
Name: circulating_market_cap, dtype: float64

Number of stocks

len(score)
1131

Split the stock pool into five equal groups by float market cap

startdate = '2017-01-01'
enddate = '2017-02-01'
nextdate = '2017-03-01'
df = {}
circulating_market_cap = fdf['circulating_market_cap']
codes = list(score.index)
n = len(score)
# Floor division keeps the slice bounds integers on both Python 2 and Python 3
port1 = codes[: n//5]
port2 = codes[n//5 : 2*n//5]
port3 = codes[2*n//5 : -2*n//5]
port4 = codes[-2*n//5 : -n//5]
port5 = codes[-n//5 :]
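The five slices above can also be produced by a small helper. This is a sketch in plain Python; `split_into_groups` and the made-up stock codes are illustrative names, not part of the JoinQuant API.

```python
def split_into_groups(sorted_codes, n_groups=5):
    """Split an already-sorted list into n_groups contiguous slices.

    Integer arithmetic on the boundaries keeps every element in exactly one
    group, even when the list length is not a multiple of n_groups.
    """
    n = len(sorted_codes)
    bounds = [i * n // n_groups for i in range(n_groups + 1)]
    return [sorted_codes[bounds[i]:bounds[i + 1]] for i in range(n_groups)]


codes = ["S%04d" % i for i in range(1131)]  # made-up codes, same count as the pool above
groups = split_into_groups(codes)
print([len(g) for g in groups])
```

Computing all boundaries from one formula avoids the off-by-one mistakes that are easy to make when each slice is written by hand.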

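Monthly portfolio return weighted by float market cap (for example, the return from 2017-01 to 2017-02)

def calculate_port_monthly_return(port, startdate, enddate, nextdate, circulating_market_cap):
    close1 = get_price(port, startdate, enddate, 'daily', ['close'])
    close2 = get_price(port, enddate, nextdate, 'daily', ['close'])
    # Return from the first trading day of this month to the first trading day of the next
    stock_return = close2['close'].iloc[0, :] / close1['close'].iloc[0, :] - 1
    # The product aligns on stock code, so only the stocks in `port` contribute to the sum
    weighted_m_return = (stock_return * circulating_market_cap).sum() / circulating_market_cap.loc[port].sum()
    return weighted_m_return

calculate_port_monthly_return(port1, '2017-01-01', '2017-02-01', '2017-03-01', fdf['circulating_market_cap'])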
-0.06404979959969452
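The weighting is easier to check by hand on a toy example. The sketch below uses made-up prices and float market caps (every name and number is an assumption) and mirrors the formula in the function above: each stock's return is weighted by its share of the portfolio's total float market cap.

```python
import pandas as pd

# Made-up first-trading-day closes for two consecutive months
start_close = pd.Series({"A": 10.0, "B": 20.0, "C": 5.0})
end_close = pd.Series({"A": 11.0, "B": 19.0, "C": 5.0})
# Float market caps for the whole pool; "D" is not in the portfolio
float_cap = pd.Series({"A": 100.0, "B": 300.0, "C": 50.0, "D": 999.0})

port = ["A", "B", "C"]
stock_return = end_close[port] / start_close[port] - 1   # A: +10%, B: -5%, C: 0%
weights = float_cap[port] / float_cap[port].sum()        # weights sum to 1 within the portfolio
weighted_return = (stock_return * weights).sum()

print(round(weighted_return, 6))
```

Here the large-cap stock B dominates, so the portfolio return is slightly negative even though A gained 10%.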

Benchmark monthly return

def calculate_benchmark_monthly_return(startdate, enddate, nextdate):
    close1 = get_price(['000001.XSHG'], startdate, enddate, 'daily', ['close'])['close']
    close2 = get_price(['000001.XSHG'], enddate, nextdate, 'daily', ['close'])['close']
    # A single index, so the sum just unwraps the one-element Series
    benchmark_return = (close2.iloc[0, :] / close1.iloc[0, :] - 1).sum()
    return benchmark_return

calculate_benchmark_monthly_return('2017-01-01', '2017-02-01', '2017-03-01')
0.001355008710679284

Returns of the five portfolios over one month at the start of 2017

The results show that, before any factor combination is built, the first four groups all lost money and trailed the benchmark, while port5 (the largest float market caps) beat it.

benchmark_return = calculate_benchmark_monthly_return('2017-01-01', '2017-02-01', '2017-03-01')
ports = [('port1', port1), ('port2', port2), ('port3', port3), ('port4', port4), ('port5', port5)]
for name, port in ports:
    df[name] = calculate_port_monthly_return(port, '2017-01-01', '2017-02-01', '2017-03-01', fdf['circulating_market_cap'])
print(Series(df))
print('benchmark_return %s' % benchmark_return)
port1   -0.064050
port2   -0.036469
port3   -0.027358
port4   -0.017350
port5    0.022849
dtype: float64
benchmark_return 0.00135500871068

Build the factor portfolios and compute their monthly returns

Period: 2010 to 2016. Compute the monthly returns of groups 1 to 5 and of the benchmark, which gives 84×6 panel data for each factor. (This takes a while to run; to watch for failures along the way, uncomment the two print statements.)

factors = ['PE', 'PB', 'PS', 'EPS', 'B/M',
           'ROE', 'ROA', 'gross_profit_margin', 'inc_net_profit_year_on_year', 'inc_net_profit_annual',
           'inc_operation_profit_year_on_year', 'inc_operation_profit_annual', 'GP/R', 'P/R',
           'net_profit', 'operating_revenue', 'capitalization', 'circulating_cap', 'market_cap',
           'circulating_market_cap', 'L/A', 'FAP',
           'turnover_ratio']

# In the research module, fundamentals default to the day before the research date,
# so we supply our own monthly date sequence.
year = ['2010', '2011', '2012', '2013', '2014', '2015', '2016']
month = ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12']
result = {}
for i in range(7*12):
    startdate = year[i//12] + '-' + month[i%12] + '-01'
    try:
        enddate = year[(i+1)//12] + '-' + month[(i+1)%12] + '-01'
    except IndexError:
        enddate = '2017-01-01'
    try:
        nextdate = year[(i+2)//12] + '-' + month[(i+2)%12] + '-01'
    except IndexError:
        if enddate == '2017-01-01':
            nextdate = '2017-02-01'
        else:
            nextdate = '2017-01-01'
    # print('time %s' % startdate)
    fdf = get_factors(startdate, factors)
    CMV = fdf['circulating_market_cap']
    # 5 portfolios plus the benchmark, 23 factors
    df = DataFrame(np.zeros(6*23).reshape(6, 23),
                   index=['port1', 'port2', 'port3', 'port4', 'port5', 'benchmark'],
                   columns=factors)
    for fac in factors:
        score = fdf[fac].sort_values()
        codes = list(score.index)
        n = len(codes)
        # Contiguous slices. The original started port2..port5 one position late,
        # which dropped the stock sitting at each boundary.
        ports = [('port1', codes[: n//5]),
                 ('port2', codes[n//5 : 2*n//5]),
                 ('port3', codes[2*n//5 : -2*n//5]),
                 ('port4', codes[-2*n//5 : -n//5]),
                 ('port5', codes[-n//5 :])]
        for name, port in ports:
            # Weight by this month's float market cap (CMV). The original passed the
            # global circulating_market_cap left over from the 2017-01-01 example.
            df.loc[name, fac] = calculate_port_monthly_return(port, startdate, enddate, nextdate, CMV)
        df.loc['benchmark', fac] = calculate_benchmark_monthly_return(startdate, enddate, nextdate)
        # print('factor %s' % fac)
    result[i+1] = df
# pd.Panel requires pandas older than 0.25
monthly_return = pd.Panel(result)
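The try/except date logic above can be written more compactly. A sketch, assuming only that month i needs its own start date, the next month's start and the one after that; `month_starts` is a hypothetical helper, not part of the original code.

```python
def month_starts(first_year, count):
    """Return `count` consecutive month-start dates as 'YYYY-MM-01' strings."""
    return ["%d-%02d-01" % (first_year + i // 12, i % 12 + 1) for i in range(count)]


# 84 sample months plus two more, so every month has (startdate, enddate, nextdate)
starts = month_starts(2010, 86)
windows = [(starts[i], starts[i + 1], starts[i + 2]) for i in range(84)]

print(windows[0])
print(windows[-1])
```

Generating two extra months up front removes the need to special-case the end of 2016.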

Monthly returns of the five portfolios for one factor (PE as an example)

monthly_return[:,:,'PE']

Months 1 to 10:

                   1          2          3          4          5          6          7          8          9         10
port1      -0.044479   0.061439   0.046949  -0.068926  -0.051328  -0.096085   0.158161   0.077459   0.105060   0.037388
port2      -0.045737   0.072605   0.021550  -0.066656  -0.081321  -0.101697   0.160273   0.053500   0.114545   0.098452
port3      -0.088607   0.065945   0.036429  -0.086631  -0.105264  -0.055511   0.153168   0.011691   0.082338   0.130798
port4      -0.093649   0.057428   0.025315  -0.124566  -0.099594  -0.057868   0.126684  -0.010330   0.034311   0.120384
port5      -0.068332   0.049672   0.024405  -0.098798  -0.074960  -0.058459   0.147077  -0.048474   0.012295   0.099168
benchmark  -0.093225   0.049801   0.019293  -0.099173  -0.094169  -0.075728   0.125843  -0.018572   0.044174   0.115117

...

Months 75 to 84:

                  75         76         77         78         79         80         81         82         83         84
port1       0.135218   0.026815  -0.048027   0.035096  -0.016123   0.042647   0.027473   0.034709   0.006601  -0.008979
port2       0.121612   0.014305  -0.024843   0.047611   0.020464   0.029745  -0.000431   0.034643   0.019016  -0.026605
port3       0.131131   0.016001  -0.022868   0.024713   0.026133   0.045573   0.016249   0.017975   0.035158  -0.041963
port4       0.134579  -0.019682  -0.027488   0.000420   0.018903   0.045241   0.002479   0.023541   0.062909  -0.039054
port5       0.085794  -0.006273  -0.017570   0.002841   0.033594   0.040802  -0.005770   0.027243   0.072784  -0.030624
benchmark   0.101113  -0.005611  -0.026443   0.006510   0.007130   0.037218  -0.004950   0.024373   0.048319  -0.041972

6 rows × 84 columns

Cumulative returns

(monthly_return[:,:,'PE'].T+1).cumprod().tail()

    port1      port2      port3      port4      port5      benchmark
80  2.282291   1.622296   1.663842   1.513888   1.698286   0.944369
81  2.344993   1.621597   1.690878   1.517640   1.688486   0.939694
82  2.426384   1.677774   1.721271   1.553367   1.734485   0.962598
83  2.442401   1.709679   1.781788   1.651088   1.860728   1.009110
84  2.420472   1.664194   1.707019   1.586606   1.803745   0.966755

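Quantitative criteria for factor testing

Once the model is built, compute for the n portfolios the annualized compound return, the excess return, and, under different market conditions, the probability that the high-return portfolio beats the benchmark and that the low-return portfolio trails it.

Quantitative standards for effectiveness:

(1) Across portfolios 1 to n, the annualized compound returns should follow a clear ordering, that is, factor rank and return should be strongly correlated. Let Xi be the annualized return of portfolio i; the absolute correlation between Xi and i should satisfy Abs(Corr(Xi,i)) > MinCorr, where MinCorr is a chosen minimum correlation threshold.

(2) Let AR1 and ARn be the excess returns of the two extreme portfolios 1 and n, and let MinARtop and MinARbottom be the minimum excess-return thresholds.

if AR1 > ARn  # smaller factor, higher return

then AR1 > MinARtop > 0 and ARn < MinARbottom < 0 should hold

if AR1 < ARn  # smaller factor, lower return

then ARn > MinARtop > 0 and AR1 < MinARbottom < 0 should hold

These conditions ensure that, of the two portfolios with the largest and smallest factor values, one clearly beats the market and the other clearly trails it.

(3) Under any market conditions, the two extreme portfolios 1 and n should beat or trail the market with high probability.

Together, these three conditions pick out the factors that showed good stock-selection ability over the past period.

Because many factors were selected to begin with, the three criteria are applied more strictly here; the exact cut-offs are given in the code.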

total_return = {}
annual_return = {}
excess_return = {}
win_prob = {}
loss_prob = {}
effect_test = {}
# Thresholds named in the criteria above. They are declared here, but the loop does not
# apply them; the cut-offs actually used are listed in the comments at the end.
MinCorr = 0.3
Minbottom = -0.05
Mintop = 0.05
for fac in factors:
    effect_test[fac] = {}
    monthly = monthly_return[:, :, fac]
    total_return[fac] = (monthly+1).T.cumprod().iloc[-1, :] - 1
    # 84 monthly returns span 7 years. The original used the exponent 1./6, and the
    # annualized and excess-return figures printed in this post reflect that exponent.
    annual_return[fac] = (total_return[fac]+1)**(12.0/monthly.shape[1]) - 1
    excess_return[fac] = annual_return[fac] - annual_return[fac].iloc[-1]
    # Test 1: correlation between annualized return and portfolio rank
    ranks = Series([1, 2, 3, 4, 5], index=annual_return[fac].iloc[0:5].index)
    effect_test[fac][1] = annual_return[fac].iloc[0:5].corr(ranks)
    # Tests 2 and 3: decide which extreme portfolio is the winner
    if total_return[fac].iloc[0] < total_return[fac].iloc[-2]:
        # small factor, low return: port1 is the loser, port5 the winner
        win_row, loss_row = -2, 0
    else:
        # small factor, high return: port1 is the winner, port5 the loser
        win_row, loss_row = 0, -2
    loss_excess = monthly.iloc[loss_row, :] - monthly.iloc[-1, :]
    loss_prob[fac] = loss_excess[loss_excess < 0].count() / float(len(loss_excess))
    win_excess = monthly.iloc[win_row, :] - monthly.iloc[-1, :]
    win_prob[fac] = win_excess[win_excess > 0].count() / float(len(win_excess))
    effect_test[fac][3] = [win_prob[fac], loss_prob[fac]]
    # Excess returns in percent: [winner, loser]
    effect_test[fac][2] = [excess_return[fac].iloc[win_row]*100, excess_return[fac].iloc[loss_row]*100]

# Many factors were selected, so the test cut-offs are set somewhat strictly:
# effect_test[1] is the rank correlation; > 0.7 or < -0.7 passes
# effect_test[2] is [winner excess return, loser excess return]
# effect_test[3] is [winner win probability, loser loss probability]; [> 0.6, > 0.4] passes
# (given the actual results, the loss probability is not enforced for now)
DataFrame(effect_test).T

                                    1             2                               3
B/M                                -0.759392      [14.8889455912, 10.5424829262]  [0.642857142857, 0.297619047619]
EPS                                 0.4568019     [14.4489063373, 11.2093245502]  [0.702380952381, 0.333333333333]
FAP                                -0.5571287     [10.5741525762, 7.34302768254]  [0.619047619048, 0.428571428571]
GP/R                               -0.6232487     [14.8192928557, 9.94927506981]  [0.666666666667, 0.392857142857]
L/A                                -0.7241624     [20.4592189128, 9.55221567554]  [0.72619047619, 0.416666666667]
P/R                                -0.8633413     [14.9970369431, 10.1663694923]  [0.642857142857, 0.369047619048]
PB                                  0.531669      [14.0238015397, 12.4638070002]  [0.630952380952, 0.380952380952]
PE                                 -0.6031528     [16.4352119763, 10.8924841904]  [0.595238095238, 0.392857142857]
PS                                  0.2740354     [11.7462917987, 11.3558352142]  [0.619047619048, 0.404761904762]
ROA                                -0.05878979    [14.6409328506, 14.0019469495]  [0.619047619048, 0.27380952381]
ROE                                -0.6970631     [15.1888383995, 12.0139372189]  [0.654761904762, 0.357142857143]
capitalization                     -0.9624119     [33.3457013235, 5.98100579533]  [0.642857142857, 0.440476190476]
circulating_cap                    -0.9624839     [34.1366707656, 5.98916236055]  [0.619047619048, 0.428571428571]
circulating_market_cap             -0.9492392     [54.1147457177, 6.33609169734]  [0.75, 0.380952380952]
gross_profit_margin                 0.7725194     [20.6033749043, 7.69229793354]  [0.738095238095, 0.404761904762]
inc_net_profit_annual               0.09843581    [15.430326656, 14.5382452203]   [0.702380952381, 0.333333333333]
inc_net_profit_year_on_year         0.7840669     [14.7634556331, 8.55897646992]  [0.702380952381, 0.357142857143]
inc_operation_profit_annual         0.2002443     [15.3475247924, 14.2983923996]  [0.630952380952, 0.297619047619]
inc_operation_profit_year_on_year   0.8898592     [15.7572781469, 7.17341679029]  [0.702380952381, 0.369047619048]
market_cap                         -0.9530109     [51.2689915553, 6.58323561057]  [0.75, 0.404761904762]
net_profit                         -0.5801736     [15.1485653736, 7.84381665989]  [0.630952380952, 0.416666666667]
operating_revenue                  -0.9820981     [33.1084788177, 6.85833849383]  [0.666666666667, 0.416666666667]
turnover_ratio                      0.133174      [12.9535555362, 7.06170225457]  [0.5, 0.380952380952]
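The three checks can be tried in isolation on synthetic data. The sketch below is not the original test code: the monthly returns are simulated (a shared market component plus a drift that shrinks from port1 to port5), and every parameter value is an assumption chosen to mimic a "smaller factor, higher return" factor.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
months = 84
groups = ["port1", "port2", "port3", "port4", "port5"]

# Synthetic monthly returns: market component + group drift + small idiosyncratic noise
market = rng.normal(0.0, 0.05, size=months)
drift = {"port1": 0.010, "port2": 0.006, "port3": 0.003, "port4": 0.0, "port5": -0.004}
monthly = pd.DataFrame(
    {g: market + drift[g] + rng.normal(0.0, 0.01, size=months) for g in groups}
)
monthly["benchmark"] = market

# Annualized compound return and excess return over the benchmark
years = months / 12.0
annual = (1 + monthly).prod() ** (1 / years) - 1
excess = annual[groups] - annual["benchmark"]

# (1) correlation between portfolio rank and annualized return
rank = pd.Series(range(1, 6), index=groups, dtype=float)
corr = annual[groups].corr(rank)

# (2) which extreme portfolio is the winner, judged by excess return
top, bottom = ("port1", "port5") if excess["port1"] > excess["port5"] else ("port5", "port1")

# (3) share of months the winner beats, and the loser trails, the benchmark
win_prob = (monthly[top] > monthly["benchmark"]).mean()
loss_prob = (monthly[bottom] < monthly["benchmark"]).mean()

print("corr:", round(corr, 3))
print("winner:", top, "excess:", round(excess[top], 4), "win prob:", round(win_prob, 3))
print("loser:", bottom, "excess:", round(excess[bottom], 4), "loss prob:", round(loss_prob, 3))
```

Each quantity can then be compared with whatever thresholds are chosen for MinCorr, MinARtop and MinARbottom.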

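The factors that pass the checks above (with the cut-offs from the code comments: absolute correlation above 0.7 and win probability above 0.6) are:

(1) Value factors: book-to-market (B/M)

(2) Growth factors: net profit margin (P/R), gross profit margin (gross_profit_margin), year-on-year net profit growth (inc_net_profit_year_on_year), year-on-year operating profit growth (inc_operation_profit_year_on_year)

(3) Size factors: operating revenue (operating_revenue), total share capital (capitalization), floating share capital (circulating_cap), total market cap (market_cap), float market cap (circulating_market_cap), liabilities-to-assets ratio (L/A)

Total and annualized returns of the effective factors

When stocks are sorted by circulating_market_cap, port1 (the smallest circulating_market_cap) earned a total return of about 12.1 times the initial capital, 53% annualized, while the benchmark return was negative. The capitalization, circulating_cap, operating_revenue and market_cap factors also each have a portfolio with a total return above 3 times and an annualized return above 20%.

This suggests that starting from a larger, more comprehensive factor set and then screening it is likely to yield factors better suited to the current period, and therefore higher returns.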

effective_factors = ['B/M', 'L/A', 'P/R', 'capitalization', 'circulating_cap', 'circulating_market_cap', 'gross_profit_margin',
                     'inc_net_profit_year_on_year', 'inc_operation_profit_year_on_year', 'market_cap', 'operating_revenue']
DataFrame(total_return).loc[:, effective_factors].T

                                        port1       port2       port3       port4       port5   benchmark
B/M                                  1.233024    1.782490    1.087270    0.688729    0.769684   -0.033245
L/A                                  1.970684    0.794399    0.701615    0.894749    0.676204   -0.033245
P/R                                  1.245722    0.935080    0.972909    0.965694    0.733681   -0.033245
capitalization                       4.481131    2.530609    1.864688    1.675642    0.372510   -0.033245
circulating_cap                      4.679972    2.822128    1.380637    1.540750    0.373147   -0.033245
circulating_market_cap              12.108315    3.736807    1.687652    1.117200    0.400483   -0.033245
gross_profit_margin                  0.511736    0.920069    1.009500    0.722009    1.992178   -0.033245
inc_net_profit_year_on_year          0.586615    0.652814    0.593057    1.584911    1.218358   -0.033245
inc_operation_profit_year_on_year    0.468332    0.527669    0.656317    1.556838    1.336737   -0.033245
market_cap                          10.716605    3.110861    2.305170    0.982624    0.420231   -0.033245
operating_revenue                    4.422640    2.335046    1.518557    1.153283    0.442487   -0.033245

Annualized returns of the effective factors

DataFrame(annual_return).loc[:, effective_factors].T

                                        port1       port2       port3       port4       port5   benchmark
B/M                                  0.143270    0.185966    0.130481    0.091256    0.099806   -0.005619
L/A                                  0.198973    0.102351    0.092640    0.112394    0.089903   -0.005619
P/R                                  0.144351    0.116306    0.119914    0.119230    0.096044   -0.005619
capitalization                       0.327838    0.233980    0.191735    0.178252    0.054191   -0.005619
circulating_cap                      0.335748    0.250405    0.155533    0.168137    0.054272   -0.005619
circulating_market_cap               0.535528    0.295928    0.179131    0.133166    0.057742   -0.005619
gross_profit_margin                  0.071304    0.114858    0.123349    0.094811    0.200415   -0.005619
inc_net_profit_year_on_year          0.079971    0.087353    0.080700    0.171496    0.142015   -0.005619
inc_operation_profit_year_on_year    0.066115    0.073177    0.087737    0.169366    0.151954   -0.005619
market_cap                           0.507071    0.265675    0.220485    0.120831    0.060213   -0.005619
operating_revenue                    0.325466    0.222316    0.166430    0.136362    0.062964   -0.005619
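The link between the total-return and annualized tables is a single compounding formula. The sketch below (the helper name `annualized` is an assumption) shows that the annualized figure printed for circulating_market_cap port1 corresponds to compounding over 72 months, that is an exponent of 1/6, whereas the sample itself contains 84 monthly returns.

```python
def annualized(total_return, months):
    """Compound annual rate implied by a total return earned over `months` months."""
    return (1.0 + total_return) ** (12.0 / months) - 1.0


total = 12.108315  # port1 total return for circulating_market_cap, from the table above

rate_72 = annualized(total, 72)  # exponent 1/6: reproduces the printed annualized figure
rate_84 = annualized(total, 84)  # exponent 1/7: the full 84-month sample

print(round(rate_72, 6))
print(round(rate_84, 6))
```

Spreading the same total return over seven years rather than six gives a noticeably lower annual rate, so the choice of horizon matters when comparing against thresholds.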

Below are the time series plots of the six return series for each factor:

def draw_return_picture(df, fac):
    # Cumulative value of 1 unit invested; rows become months, columns portfolios
    cum = (df.T + 1).cumprod()
    plt.figure(figsize=(10, 4))
    for name in cum.columns:
        plt.plot(cum[name], label=name)
    plt.xlabel('return of factor %s' % fac)
    plt.legend(loc=0)

for fac in effective_factors:
    draw_return_picture(monthly_return[:, :, fac], fac)

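Removing redundant factors

Some factors are similar in their underlying logic, so the portfolios they select are highly correlated in both constituents and returns. Such factors should be pruned, keeping within each group of similar factors the one with the best return and the clearest separation. I was not able to complete this step myself, so only the method is listed here:

(1) Score the n portfolios of each factor: the higher the return, the higher the score. Once the scores are computed, assign each portfolio's score to every stock it holds that month.

if AR1 > ARn  # smaller factor, higher return

then portfolio i gets the score (n-i+1)

if AR1 < ARn  # smaller factor, lower return

then portfolio i gets the score i

(2) For each month, compute the correlation matrix of the stocks' scores across factors. This gives the score correlation matrix Score_Corr(t,u,v) for month t, where u and v index the factors.

(3) Average the correlation matrices over the sample period: with m months in the sample, sum the matrices and multiply by 1/m.

(4) Set a score-correlation threshold MinScoreCorr and keep only the factors whose correlation with the other factors is low.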
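Since the post leaves this step unimplemented, here is one possible sketch of steps (1) to (4) on synthetic data. It is an illustration under stated assumptions, not the author's method: the factor values are simulated, the threshold `min_score_corr` is a made-up value, and the greedy "keep in priority order" rule is just one way to apply step (4).

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n_stocks, n_months, n_groups = 500, 12, 5
min_score_corr = 0.7   # hypothetical MinScoreCorr

# True means "smaller factor, higher return", so group i scores n - i + 1
directions = {"market_cap": True, "circulating_market_cap": True, "gross_profit_margin": False}


def group_scores(values, smaller_is_better):
    """Step (1): score each stock by the quantile group its factor value falls in."""
    ranks = values.rank(method="first")
    group = pd.qcut(ranks, n_groups, labels=False) + 1   # 1 = smallest factor values
    return (n_groups - group + 1) if smaller_is_better else group


corr_sum = None
for _ in range(n_months):
    size = rng.normal(size=n_stocks)
    raw = pd.DataFrame({
        "market_cap": size,
        "circulating_market_cap": size + rng.normal(scale=0.1, size=n_stocks),  # near-duplicate
        "gross_profit_margin": rng.normal(size=n_stocks),                       # unrelated
    })
    scores = pd.DataFrame({f: group_scores(raw[f], d) for f, d in directions.items()})
    month_corr = scores.corr()                    # step (2): monthly score correlation matrix
    corr_sum = month_corr if corr_sum is None else corr_sum + month_corr

mean_corr = corr_sum / n_months                   # step (3): average over the m months

# Step (4): walk the factors in priority order (best return first, by assumption) and
# keep a factor only if it is not too correlated with any factor already kept.
priority = ["circulating_market_cap", "market_cap", "gross_profit_margin"]
kept = []
for fac in priority:
    if all(abs(mean_corr.loc[fac, k]) < min_score_corr for k in kept):
        kept.append(fac)

print(mean_corr.round(2))
print(kept)
```

On the real data one would expect the size factors (market_cap, circulating_market_cap, capitalization, circulating_cap) to be strongly correlated with each other, which is exactly the redundancy this step is meant to remove.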

Building the model and selecting stocks

With the effective factors chosen, compute each stock's factor scores at the start of every month and combine them into an average score using a chosen set of weights. If a factor has no value for a stock that month, take the weighted average over the remaining factors. Rank the stocks by their weighted average score and trade the top-ranked ones.

The code below sums the factor scores with equal weights and picks the highest-scoring stocks to trade.

# Redefines get_factors for the 11 retained factors (it now takes only the date)
def get_factors(fdate):
    factors = ['B/M', 'L/A', 'P/R', 'capitalization', 'circulating_cap', 'circulating_market_cap', 'gross_profit_margin',
               'inc_net_profit_year_on_year', 'inc_operation_profit_year_on_year', 'market_cap', 'operating_revenue']
    stock_set = get_index_stocks('000001.XSHG', fdate)
    # The selected columns follow the order of `factors`
    q = query(
        valuation.code,
        balance.total_owner_equities/valuation.market_cap/100000000,
        balance.total_liability/balance.total_assets,
        income.net_profit/income.operating_revenue,
        valuation.capitalization,
        valuation.circulating_cap,
        valuation.circulating_market_cap,
        indicator.gross_profit_margin,
        indicator.inc_net_profit_year_on_year,
        indicator.inc_operation_profit_year_on_year,
        valuation.market_cap,
        income.operating_revenue
    ).filter(
        valuation.code.in_(stock_set)
    )
    fdf = get_fundamentals(q, date=fdate)
    fdf.index = fdf['code']
    fdf.columns = ['code'] + factors
    return fdf.iloc[:, -11:]

def score_stock(fdate):
    # For B/M, L/A, P/R, capitalization, circulating_cap, circulating_market_cap, market_cap
    # and operating_revenue, a smaller value means a higher return and so a higher score:
    # rank in descending order (ascending=False).
    # For gross_profit_margin, inc_net_profit_year_on_year and inc_operation_profit_year_on_year,
    # a larger value means a higher return: rank in ascending order (ascending=True).
    effective_factors = {'inc_net_profit_year_on_year': True, 'gross_profit_margin': True,
                         'inc_operation_profit_year_on_year': True,
                         'B/M': False, 'L/A': False, 'P/R': False, 'capitalization': False,
                         'circulating_cap': False, 'circulating_market_cap': False,
                         'market_cap': False, 'operating_revenue': False}
    fdf = get_factors(fdate)
    score = {}
    for fac, value in effective_factors.items():
        score[fac] = fdf[fac].rank(ascending=value, method='first')
    # Equal-weight sum of the factor ranks per stock, highest total first
    total = DataFrame(score).T.sum().sort_values(ascending=False)
    print(total.head(5))
    return list(total.index), fdf['circulating_market_cap']

score_result, circulating_market_cap = score_stock('2017-01-01')
code
603859.XSHG    10603
603189.XSHG    10570
600817.XSHG    10501
600385.XSHG    10422
603518.XSHG    10375
dtype: float64
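The text describes a weighted average that falls back to the remaining factors when one is missing, while the code above uses a plain equal-weight sum. A minimal sketch of the weighted variant, with made-up ranks and hypothetical weights:

```python
import numpy as np
import pandas as pd

# Made-up factor ranks for three stocks; stock B has no value for factor f2
ranks = pd.DataFrame(
    {"f1": [3.0, 1.0, 2.0], "f2": [1.0, np.nan, 3.0], "f3": [2.0, 3.0, 1.0]},
    index=["A", "B", "C"],
)
weights = pd.Series({"f1": 0.5, "f2": 0.3, "f3": 0.2})  # hypothetical factor weights

available = ranks.notna()
weighted_sum = (ranks.fillna(0.0) * weights).sum(axis=1)
# Renormalize by the weights of the factors that are actually present for each stock
weight_total = (available * weights).sum(axis=1)
avg_score = weighted_sum / weight_total

print(avg_score.sort_values(ascending=False))
```

Dividing by the sum of the available weights keeps stocks with a missing factor on the same scale as the others, which a plain sum does not.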

Monthly returns of the six portfolios and the benchmark over seven years

Compute the monthly returns of port1 to port5, of the Top20 portfolio and of the benchmark over 7×12 = 84 months, and store everything in a panel.

year = ['2010', '2011', '2012', '2013', '2014', '2015', '2016']
month = ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12']
factors = ['B/M', 'L/A', 'P/R', 'capitalization', 'circulating_cap', 'circulating_market_cap', 'gross_profit_margin',
           'inc_net_profit_year_on_year', 'inc_operation_profit_year_on_year', 'market_cap', 'operating_revenue']
result = {}
for i in range(7*12):
    startdate = year[i//12] + '-' + month[i%12] + '-01'
    try:
        enddate = year[(i+1)//12] + '-' + month[(i+1)%12] + '-01'
    except IndexError:
        enddate = '2017-01-01'
    try:
        nextdate = year[(i+2)//12] + '-' + month[(i+2)%12] + '-01'
    except IndexError:
        if enddate == '2017-01-01':
            nextdate = '2017-02-01'
        else:
            nextdate = '2017-01-01'
    print('time %s' % startdate)
    # After scoring on all 11 factors, split the ranked stocks into portfolios
    df = DataFrame(np.zeros(7), index=['Top20', 'port1', 'port2', 'port3', 'port4', 'port5', 'benchmark'])
    score, circulating_market_cap = score_stock(startdate)
    n = len(score)
    # Contiguous slices. The original started port2..port5 one position late,
    # which dropped the stock sitting at each boundary.
    ports = [('Top20', score[:20]),
             ('port1', score[: n//5]),
             ('port2', score[n//5 : 2*n//5]),
             ('port3', score[2*n//5 : -2*n//5]),
             ('port4', score[-2*n//5 : -n//5]),
             ('port5', score[-n//5 :])]
    print(len(score))
    for name, port in ports:
        df.loc[name] = calculate_port_monthly_return(port, startdate, enddate, nextdate, circulating_market_cap)
    df.loc['benchmark'] = calculate_benchmark_monthly_return(startdate, enddate, nextdate)
    result[i+1] = df
time 2010-01-01
code
600634.XSHG    7970
600137.XSHG    7574
600513.XSHG    7483
600633.XSHG    7455
600766.XSHG    7390
dtype: float64
850
time 2010-02-01
code
600634.XSHG    7926
600137.XSHG    7567
600513.XSHG    7487
600633.XSHG    7447
600766.XSHG    7391
dtype: float64
849
time 2010-03-01
code
600634.XSHG    7926
600562.XSHG    7726
600137.XSHG    7562
600513.XSHG    7436
600633.XSHG    7420
dtype: float64
846
time 2010-04-01
code
600634.XSHG    7890
600562.XSHG    7641
600513.XSHG    7352
600633.XSHG    7334
600687.XSHG    7295
dtype: float64
848
time 2010-05-01
code
600613.XSHG    8192
600634.XSHG    8051
600506.XSHG    7872
600353.XSHG    7838
600629.XSHG    7619
dtype: float64
862
time 2010-06-01
code
600613.XSHG    8136
600634.XSHG    8026
600506.XSHG    7849
600353.XSHG    7818
600629.XSHG    7579
dtype: float64
859
time 2010-07-01
code
600613.XSHG    8172
600634.XSHG    8059
600506.XSHG    7870
600353.XSHG    7819
600647.XSHG    7589
dtype: float64
859
time 2010-08-01
code
600613.XSHG    8188
600634.XSHG    8069
600629.XSHG    7597
600647.XSHG    7523
600520.XSHG    7352
dtype: float64
859
time 2010-09-01
code
600671.XSHG    8126
600634.XSHG    8102
600365.XSHG    8087
600711.XSHG    8016
600733.XSHG    7846
dtype: float64
862
time 2010-10-01
code
600671.XSHG    8117
600634.XSHG    8074
600711.XSHG    8007
600365.XSHG    7990
600733.XSHG    7870
dtype: float64
862
time 2010-11-01
code
600671.XSHG    8292
600506.XSHG    8057
600365.XSHG    7985
600699.XSHG    7861
600634.XSHG    7843
dtype: float64
867
time 2010-12-01
code
600671.XSHG    8295
600506.XSHG    8087
600365.XSHG    7997
600634.XSHG    7885
600699.XSHG    7878
dtype: float64
867
time 2011-01-01
code
600671.XSHG    8285
600506.XSHG    8095
600365.XSHG    8075
600634.XSHG    7894
600647.XSHG    7877
dtype: float64
867
time 2011-02-01
code
600671.XSHG    8310
600365.XSHG    8094
600506.XSHG    8085
600634.XSHG    7904
600647.XSHG    7888
dtype: float64
867
time 2011-03-01
code
600671.XSHG    8302
600506.XSHG    8065
600365.XSHG    7985
600634.XSHG    7882
600647.XSHG    7877
dtype: float64
866
time 2011-04-01
code
600671.XSHG    8314
600365.XSHG    7973
600634.XSHG    7934
600617.XSHG    7881
600077.XSHG    7867
dtype: float64
874
time 2011-05-01
code
600671.XSHG    8545
600340.XSHG    8261
600365.XSHG    8235
600562.XSHG    8130
600613.XSHG    8117
dtype: float64
884
time 2011-06-01
code
600671.XSHG    8534
600365.XSHG    8251
600149.XSHG    8150
600613.XSHG    8134
600562.XSHG    8132
dtype: float64
884
time 2011-07-01
code
600671.XSHG    8546
600365.XSHG    8270
600149.XSHG    8170
600613.XSHG    8141
600562.XSHG    8126
dtype: float64
884
time 2011-08-01
code
600671.XSHG    8562
600149.XSHG    8156
600613.XSHG    8146
600562.XSHG    8104
600520.XSHG    7964
dtype: float64
885
time 2011-09-01
code
600634.XSHG    8447
600562.XSHG    8235
600671.XSHG    8096
600476.XSHG    8021
600706.XSHG    8007
dtype: float64
901
time 2011-10-01
code
600634.XSHG    8453
600562.XSHG    8150
600671.XSHG    8108
600476.XSHG    8072
600077.XSHG    7997
dtype: float64
902
time 2011-11-01
code
600671.XSHG    8726
600705.XSHG    8083
600421.XSHG    8065
600476.XSHG    8060
600576.XSHG    8037
dtype: float64
913
time 2011-12-01
code
600671.XSHG    8740
600576.XSHG    8111
600705.XSHG    8099
600476.XSHG    8073
600571.XSHG    8005
dtype: float64
913
time 2012-01-01
code
600671.XSHG    8721
600576.XSHG    8119
600705.XSHG    8109
600476.XSHG    8074
600421.XSHG    8019
dtype: float64
913
time 2012-02-01
code
600671.XSHG    8729
600136.XSHG    8225
600576.XSHG    8134
600705.XSHG    8121
600476.XSHG    8098
dtype: float64
913
time 2012-03-01
code
600671.XSHG    8737
600136.XSHG    8210
600576.XSHG    8119
600476.XSHG    8074
600571.XSHG    8029
dtype: float64
912
time 2012-04-01
code
600671.XSHG    8792
600365.XSHG    8294
600576.XSHG    8264
600136.XSHG    8241
600733.XSHG    8191
dtype: float64
915
time 2012-05-01
code
600671.XSHG    8842
600593.XSHG    8582
600562.XSHG    8509
600513.XSHG    8473
600576.XSHG    8427
dtype: float64
921
time 2012-06-01
code
600634.XSHG    8749
600593.XSHG    8657
600513.XSHG    8534
600562.XSHG    8520
600455.XSHG    8266
dtype: float64
922
time 2012-07-01
code
600634.XSHG    8744
600593.XSHG    8672
600562.XSHG    8532
600513.XSHG    8438
600571.XSHG    8279
dtype: float64
922
time 2012-08-01
code
600634.XSHG    8746
600593.XSHG    8671
600562.XSHG    8534
600513.XSHG    8447
600571.XSHG    8289
dtype: float64
922
time 2012-09-01
code
600136.XSHG    9301
600485.XSHG    8920
600733.XSHG    8878
600749.XSHG    8771
600520.XSHG    8519
dtype: float64
934
time 2012-10-01
code
600136.XSHG    9297
600485.XSHG    8921
600733.XSHG    8868
600749.XSHG    8778
600758.XSHG    8517
dtype: float64
934
time 2012-11-01
code
600634.XSHG    9529
600733.XSHG    8839
600365.XSHG    8697
600647.XSHG    8504
600758.XSHG    8503
dtype: float64
941
time 2012-12-01
code
600634.XSHG    9527
600733.XSHG    8887
600365.XSHG    8716
600647.XSHG    8551
600758.XSHG    8509
dtype: float64
941
time 2013-01-01
code
600634.XSHG    9527
600733.XSHG    8877
600365.XSHG    8712
600647.XSHG    8556
600758.XSHG    8513
dtype: float64
941
time 2013-02-01
code
600634.XSHG    9515
600733.XSHG    8851
600647.XSHG    8571
600758.XSHG    8523
600980.XSHG    8492
dtype: float64
941
time 2013-03-01
code
600634.XSHG    9521
600733.XSHG    8865
600647.XSHG    8584
600758.XSHG    8538
600599.XSHG    8532
dtype: float64
943
time 2013-04-01
code
600634.XSHG    9441
600613.XSHG    8662
600985.XSHG    8643
600599.XSHG    8529
600647.XSHG    8481
dtype: float64
943
time 2013-05-01
code
600634.XSHG    9496
600136.XSHG    8957
600980.XSHG    8775
600985.XSHG    8645
600599.XSHG    8590
dtype: float64
943
time 2013-06-01
code
600485.XSHG    9068
600136.XSHG    8937
600980.XSHG    8771
600576.XSHG    8389
600706.XSHG    8377
dtype: float64
942
time 2013-07-01
code
600485.XSHG    9078
600136.XSHG    8947
600980.XSHG    8757
600706.XSHG    8376
600576.XSHG    8362
dtype: float64
942
time 2013-08-01
code
600485.XSHG    9083
600980.XSHG    8750
600576.XSHG    8387
600706.XSHG    8358
600379.XSHG    8346
dtype: float64
942
time 2013-09-01
code
600365.XSHG    9042
600485.XSHG    8983
600980.XSHG    8876
600615.XSHG    8693
600593.XSHG    8588
dtype: float64
942
time 2013-10-01
code
600365.XSHG    9027
600485.XSHG    8966
600980.XSHG    8869
600615.XSHG    8698
600234.XSHG    8609
dtype: float64
942
time 2013-11-01
code
600817.XSHG    8738
600733.XSHG    8730
600485.XSHG    8497
600758.XSHG    8463
600099.XSHG    8444
dtype: float64
942
time 2013-12-01
code
600733.XSHG    8769
600817.XSHG    8761
600758.XSHG    8464
600520.XSHG    8447
600099.XSHG    8440
dtype: float64
942
time 2014-01-01
code
600817.XSHG    8763
600733.XSHG    8712
600485.XSHG    8461
600758.XSHG    8458
600520.XSHG    8445
dtype: float64
942
time 2014-02-01
code
600817.XSHG    8753
600733.XSHG    8746
600758.XSHG    8462
600146.XSHG    8453
600520.XSHG    8447
dtype: float64
942
time 2014-03-01
code
600817.XSHG    8767
600733.XSHG    8727
600485.XSHG    8499
600520.XSHG    8466
600758.XSHG    8464
dtype: float64
942
time 2014-04-01
code
600817.XSHG    8807
600146.XSHG    8465
600506.XSHG    8453
600781.XSHG    8453
600485.XSHG    8396
dtype: float64
945
time 2014-05-01
code
600539.XSHG    9184
600980.XSHG    9071
600593.XSHG    8886
600355.XSHG    8805
600485.XSHG    8775
dtype: float64
949
time 2014-06-01
code
600539.XSHG    9182
600980.XSHG    9080
600753.XSHG    8914
600593.XSHG    8895
600355.XSHG    8806
dtype: float64
949
time 2014-07-01
code
600539.XSHG    9157
600980.XSHG    9047
600753.XSHG    8940
600593.XSHG    8894
600355.XSHG    8769
dtype: float64
948
time 2014-08-01
code
600539.XSHG    9193
600980.XSHG    9025
600593.XSHG    8887
600576.XSHG    8884
600753.XSHG    8879
dtype: float64
948
time 2014-09-01
code
600365.XSHG    9017
600099.XSHG    8804
600355.XSHG    8790
600847.XSHG    8781
600539.XSHG    8716
dtype: float64
952
time 2014-10-01
code
600365.XSHG    9028
600355.XSHG    8846
600099.XSHG    8815
600847.XSHG    8812
600476.XSHG    8734
dtype: float64
952
time 2014-11-01
code
600817.XSHG    9149
600599.XSHG    9104
600696.XSHG    9013
600419.XSHG    8923
600136.XSHG    8904
dtype: float64
969
time 2014-12-01
code
600817.XSHG    9184
600696.XSHG    9027
600599.XSHG    8980
600419.XSHG    8928
600136.XSHG    8896
dtype: float64
970
time 2015-01-01
code
600817.XSHG    9183
600696.XSHG    9112
600599.XSHG    9069
600136.XSHG    8922
600419.XSHG    8913
dtype: float64
970
time 2015-02-01
code
600817.XSHG    9194
600696.XSHG    9094
600599.XSHG    9031
600419.XSHG    8920
600136.XSHG    8916
dtype: float64
970
time 2015-03-01
code
600817.XSHG    9199
600696.XSHG    9096
600599.XSHG    9039
600419.XSHG    8924
600539.XSHG    8809
dtype: float64
970
time 2015-04-01
code
600817.XSHG    9240
600696.XSHG    9173
600099.XSHG    8985
603601.XSHG    8980
600539.XSHG    8887
dtype: float64
983
time 2015-05-01
code
603869.XSHG    9624
603088.XSHG    9495
600455.XSHG    9385
603898.XSHG    9370
603988.XSHG    9369
dtype: float64
1021
time 2015-06-01
code
603869.XSHG    9620
603088.XSHG    9587
603988.XSHG    9458
600455.XSHG    9455
600365.XSHG    9433
dtype: float64
1031
time 2015-07-01
code
603869.XSHG    9801
603088.XSHG    9675
603988.XSHG    9560
600455.XSHG    9537
603636.XSHG    9509
dtype: float64
1040
time 2015-08-01
code
603869.XSHG    9744
603988.XSHG    9558
600365.XSHG    9400
603010.XSHG    9361
600136.XSHG    9348
dtype: float64
1042
time 2015-09-01
code
600506.XSHG    9878
603099.XSHG    9588
600520.XSHG    9543
600593.XSHG    9484
600136.XSHG    9441
dtype: float64
1061
time 2015-10-01
code
600506.XSHG    9877
603099.XSHG    9605
600520.XSHG    9583
600593.XSHG    9519
600365.XSHG    9432
dtype: float64
1061
time 2015-11-01
code
603918.XSHG    9683
600980.XSHG    9565
600599.XSHG    9465
603601.XSHG    9434
600371.XSHG    9419
dtype: float64
1061
time 2015-12-01
code
600980.XSHG    9569
603918.XSHG    9519
600753.XSHG    9517
603010.XSHG    9414
600599.XSHG    9368
dtype: float64
1061
time 2016-01-01
code
603918.XSHG    9684
600980.XSHG    9593
600753.XSHG    9547
600599.XSHG    9482
603601.XSHG    9434
dtype: float64
1067
time 2016-02-01
code
603918.XSHG    9767
600599.XSHG    9657
600980.XSHG    9581
603085.XSHG    9466
600419.XSHG    9450
dtype: float64
1072
time 2016-03-01
code
603918.XSHG    9785
600599.XSHG    9727
600980.XSHG    9618
603085.XSHG    9474
600371.XSHG    9464
dtype: float64
1074
time 2016-04-01
code
600599.XSHG    9951
600419.XSHG    9836
600080.XSHG    9744
603918.XSHG    9709
600211.XSHG    9479
dtype: float64
1079
time 2016-05-01
code
603601.XSHG    10027
603918.XSHG     9955
600137.XSHG     9880
600733.XSHG     9749
603023.XSHG     9722
dtype: float64
1081
time 2016-06-01
code
600137.XSHG    10007
600733.XSHG     9933
603601.XSHG     9811
600506.XSHG     9806
603023.XSHG     9771
dtype: float64
1089
time 2016-07-01
code
600137.XSHG    10080
600733.XSHG    10004
600506.XSHG     9912
603601.XSHG     9761
603066.XSHG     9747
dtype: float64
1097
time 2016-08-01
code
600137.XSHG    10094
603322.XSHG    10014
603601.XSHG     9937
600506.XSHG     9910
600733.XSHG     9848
dtype: float64
1101
time 2016-09-01
code
600455.XSHG    10204
600980.XSHG     9976
603088.XSHG     9933
603027.XSHG     9928
603838.XSHG     9895
dtype: float64
1115
time 2016-10-01
code
600455.XSHG    10228
600980.XSHG    10099
603027.XSHG    10025
603088.XSHG    10021
603779.XSHG    10014
dtype: float64
1124
time 2016-11-01
code
603859.XSHG    10654
600817.XSHG    10491
603779.XSHG    10452
603189.XSHG    10449
600385.XSHG    10437
dtype: float64
1131
time 2016-12-01
code
603859.XSHG    10648
600817.XSHG    10493
603189.XSHG    10459
603779.XSHG    10449
600385.XSHG    10441
dtype: float64
1131
df = pd.Panel(result)

Plot the monthly excess returns of the six portfolios

# Only pyplot is imported above, so use plt.rcParams and plt.grid()
plt.rcParams['axes.unicode_minus'] = False
index = ['Top20', 'port1', 'port2', 'port3', 'port4', 'port5']

def draw_backtest_picture(ind):
    plt.figure(figsize=(10, 4))
    plt.plot(df.loc[:, ind, 0] - df.loc[:, 'benchmark', 0], label='excess return: %s' % ind)
    plt.xlabel('backtest excess return of factor %s' % ind)
    plt.legend(loc=0)
    plt.grid()

for ind in index:
    draw_backtest_picture(ind)

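Afterword

This post builds on 《【研究】量化选股-因子检验和多因子模型的构建》, adding some factors and moving the sample period later. Original article: (link not preserved). Readers who have already read it can skip the code and go straight to the results.

For the initial factor set, this post adds 13 factors to the original author's, and after almost the same screening the returns improved somewhat. The selected factors are still limited to the value, growth and size categories (the trading factor was screened out, QAQ). As JoinQuant's API documentation has matured, more parameters can now be called directly. Interested readers could try screening factors from the following areas:

  1. Sentiment factors: company ratings and rating changes, expected future earnings and so on, which can be pulled from JoinQuant under "国泰安数据 -> 分析师评级预测".

  2. Momentum factors: price changes over different time windows; the JoinQuant field is change_pct.

  3. Other technical-analysis factors, such as BRAR (a sentiment indicator, which could also count as a sentiment factor), RSI (relative strength index), MTM (momentum line, which could also count as a momentum factor), MACD (moving average convergence divergence) and VMACD (volume MACD). These are available among JoinQuant's technical analysis indicators: (link not preserved). The Alpha101 factors are also worth trying.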
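As a starting point for the momentum and technical factors, the sketch below computes an n-period momentum and a simple moving-average variant of RSI from a made-up close series using pandas. It does not call JoinQuant's indicator library, and the window length is an arbitrary assumption; standard RSI uses a smoothed average instead of the plain rolling mean shown here.

```python
import pandas as pd

close = pd.Series([10.0, 10.5, 10.2, 10.8, 11.0, 10.9, 11.4, 11.6])  # made-up closes
n = 3  # look-back window, chosen only for illustration

# Momentum: close_t / close_{t-n} - 1
momentum = close.pct_change(periods=n)

# Simple RSI: average gain over the window relative to average gain plus average loss
diff = close.diff()
gain = diff.clip(lower=0).rolling(n).mean()
loss = (-diff.clip(upper=0)).rolling(n).mean()
rsi = 100 * gain / (gain + loss)

print(momentum.round(4).tolist())
print(rsi.round(2).tolist())
```

Either series can then be ranked and split into five groups in the same way as the fundamental factors in this post.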
