

[Quant Classroom] Predicting High Bonus-Share Plans (高送转) with Logistic Regression and Support Vector Machines

Posted by 我太难了 on May 10 at 07:10 | Replies (1)

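Preface

Hello everyone. Year-end is here again. From now through March and April of next year, A-share listed companies publish their earnings forecasts, distribution forecasts, and formal annual reports, and companies that intend to carry out a high bonus-share plan (高送转: a large stock dividend and/or conversion of capital reserve into new shares) announce their proposals in the same window. Trading on the annual-report high bonus-share theme is a staple of the A-share market at the start of every year and a favorite of theme-driven investors, and as the market has expanded, the number of companies announcing such proposals has grown year by year. To get a better handle on this theme, this article builds a prediction model for high bonus-share stocks using logistic regression and SVM, and uses it to predict which stocks will announce high bonus-share plans with their 2017 annual reports.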


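Introduction

Many factors influence whether a listed company carries out a high bonus-share plan, including the market environment, the company's financial condition, its share price, and regulatory policy. Many researchers have found that high accumulated reserves, a high share price, and a small share capital are preconditions for such a plan, and that sub-new (recently listed) stocks and stocks whose share capital expansion keeps pace with earnings growth are more willing to carry one out. Here we group the factors that influence high bonus-share plans into the following categories:

[Image: 1512978771(1).png, table of factor categories]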


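Some stocks also announce high bonus-share plans with their interim reports each year, but overall there are far fewer of them than with annual reports. We therefore study annual-report data only.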

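Dataset

1. Dependent variable:

High bonus-share flag (a 0-1 variable). We define a high bonus-share stock as one whose bonus-share ratio plus capital-reserve conversion ratio reaches 1, that is, bonus shares plus converted shares add up to 10 or more new shares for every 10 held (the code tests sz_ratio >= 1). The variable is 1 for stocks that carry out such a plan and 0 otherwise.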
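The labeling rule above can be sketched in a few lines of plain Python. The function name, the ticker keys, and the sample ratios are hypothetical and serve only to illustrate the threshold; ratios are expressed per share held, so 1.0 means 10 new shares for every 10 held.

```python
def is_high_bonus_share(bonus_ratio, conversion_ratio, threshold=1.0):
    """Return 1 when bonus plus converted shares per share held reach the threshold."""
    # Missing ratios are treated as zero, mirroring fillna(0) in the notebook code.
    total = (bonus_ratio or 0.0) + (conversion_ratio or 0.0)
    return 1 if total >= threshold else 0


# Hypothetical plans: (bonus shares per share, converted shares per share)
plans = {
    "AAA": (0.5, 0.5),    # 5 bonus + 5 converted per 10 held
    "BBB": (0.0, 2.0),    # 20 converted per 10 held
    "CCC": (0.3, 0.5),    # 8 new shares per 10 held, below the threshold
    "DDD": (None, None),  # no plan announced
}
labels = {code: is_high_bonus_share(*ratios) for code, ratios in plans.items()}
print(labels)
```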

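2. Independent variables:

We initially select capital reserve plus retained profit per share, net assets per share, total share capital, earnings per share, year-on-year revenue growth, the average closing price over the preceding 20 trading days, and the number of days listed. The first five are taken from each year's third-quarter report; the average price and the days listed are computed as of November 1 of each year.
A brief note on why we do not use a private placement (定增) indicator. Stocks that have done a private placement are almost never sub-new stocks, so when the sub-new indicator pulls the prediction toward 1, a placement value of 0 pulls it back the other way. The effect is large here because the samples in this dataset are highly imbalanced between plan and no plan, placement and no placement, and sub-new and not sub-new. We also tested this and found that adding the placement indicator did make the predictions worse. The placement indicator can still be used as a secondary filter once likely candidates have been selected.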
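As a small illustration of the November 1 reference date, the days-listed variable can be computed with the standard library alone. This is a sketch under assumptions: the function name and the sample listing dates are hypothetical, and the notebook itself obtains listing dates from the JoinQuant platform.

```python
from datetime import date


def days_listed(listing_date, year, month=11, day=1):
    """Calendar days between the listing date and the yearly reference date."""
    return (date(year, month, day) - listing_date).days


# Hypothetical listing dates measured against 2017-11-01
print(days_listed(date(2016, 11, 1), 2017))   # listed exactly one year earlier
print(days_listed(date(2017, 10, 25), 2017))  # listed one week earlier
```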

3. Feature selection:

(1) Tree-based feature selection
[Image: 1512981244(1).png, tree-based feature importances]


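As shown, the variables differ little in importance. In terms of correlation, however, capital reserve plus retained profit per share is very highly correlated with net assets per share. This is easy to understand, because net assets = shareholders' equity = share capital + capital reserve + surplus reserve + retained profit. We therefore drop net assets per share and then evaluate the remaining features with recursive feature elimination.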
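The redundancy check described above amounts to computing pairwise Pearson correlations and flagging pairs above a cutoff. The following is a minimal pure-Python sketch; the 0.95 cutoff, the helper names, and the toy values are assumptions made for illustration and are not taken from the article's data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def redundant_pairs(features, threshold=0.95):
    """List feature pairs whose absolute correlation reaches the threshold."""
    names = list(features)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(features[first], features[second])
            if abs(r) >= threshold:
                pairs.append((first, second, r))
    return pairs


# Toy values only: the first two columns move almost in lockstep
features = {
    "per_zbgj_wflr": [1.0, 2.0, 3.0, 4.0, 5.0],
    "per_TotalOwnerEquity": [2.1, 3.0, 4.2, 5.0, 6.1],
    "capitalization": [5.0, 1.0, 4.0, 2.0, 3.0],
}
pairs = redundant_pairs(features)
print(pairs)
```

One feature of each flagged pair can then be dropped, as the article does with net assets per share.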


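(2) RFE-based feature selection
[Image: 1512981588(1).png, cross-validation accuracy versus number of features selected]


RFE with cross-validation measures how overall classification accuracy changes as the number of features grows, which identifies the best number of features. Cross-validated accuracy keeps rising until the number of independent variables reaches 6, so we keep all 6 remaining independent variables as the predictors for high bonus-share plans.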

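Prediction with logistic regression

Training set:
If the current year is t, use the data from years t-1 and t-2 as the training set.
Test set:
Use the data from year t as the test set.
Example: train the model on the 2013 and 2014 stock indicators and high bonus-share outcomes, apply the fitted parameters to the 2015 test set, and obtain the 2015 high bonus-share probabilities.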
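The rolling train/test scheme above can be written as a small generator. This is only a sketch; the function name and the two-year default window are assumptions that mirror the description, and the years correspond to the yearly datasets built later in the notebook.

```python
def walk_forward_splits(years, train_window=2):
    """Yield (training years, test year) pairs where training precedes testing."""
    ordered = sorted(years)
    for i in range(train_window, len(ordered)):
        yield ordered[i - train_window:i], ordered[i]


splits = list(walk_forward_splits(range(2011, 2017)))
for train_years, test_year in splits:
    print(train_years, "->", test_year)
```

Because each test year only sees earlier years, the evaluation does not leak future information into training.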

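Evaluation method:
Fit the model on the training set, apply the fitted parameters to the test set, sort the test-set stocks by predicted probability, take the 10, 25, 50, 100, and 200 stocks with the highest probability, and compute the share of each group that actually carried out a high bonus-share plan.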
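The evaluation described above is a precision-at-N measure. A minimal sketch follows; the function name and the toy probabilities and labels are hypothetical.

```python
def precision_at_n(probabilities, labels, n):
    """Share of true positives among the n highest-probability items."""
    ranked = sorted(zip(probabilities, labels), key=lambda pair: pair[0], reverse=True)
    top = ranked[:n]
    return sum(label for _, label in top) / len(top)


# Toy predictions: probability of a plan and the realized 0/1 outcome
probabilities = [0.90, 0.20, 0.75, 0.60, 0.40, 0.85]
labels = [1, 0, 0, 1, 0, 1]
print(precision_at_n(probabilities, labels, 2))
print(precision_at_n(probabilities, labels, 4))
```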
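[Image: 1512990088(1).png, logistic regression accuracy tables by year]
In the row labels of the tables, scale and minmax denote two different standardization methods: scale is mean-variance standardization and minmax is 0-1 standardization (the data are rescaled into the range 0 to 1); no means no standardization.
accuracy_scale is the prediction accuracy under mean-variance standardization and accuracy_minmax is the prediction accuracy under 0-1 standardization. The column labels give the number of highest-probability stocks selected: top 10, top 25, top 50, and so on. Taking the fourth table as an example, the first row, first column means that of the 10 stocks with the highest predicted probability, 8 (= 0.8*10) actually carried out a high bonus-share plan.
For logistic regression, accuracy is robust to the choice between the two standardization methods (the two rows are identical in every year), and standardizing outperforms not standardizing in most cases. By year, accuracy is poor in 2013 and improves afterwards; in 2016 in particular, accuracy for both the top 25 and the top 50 reaches 84%.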
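To make the two standardization methods concrete, here is a pure-Python sketch in which the parameters are fitted on the training column and then applied to the test column. The helper names and the toy prices are assumptions; the notebook uses scikit-learn's StandardScaler and MinMaxScaler for the same purpose.

```python
from statistics import mean, pstdev


def fit_standard(train_col):
    """Mean and population standard deviation of the training column."""
    return mean(train_col), pstdev(train_col)


def apply_standard(col, params):
    center, spread = params
    return [(x - center) / spread for x in col]


def fit_minmax(train_col):
    """Minimum and maximum of the training column."""
    return min(train_col), max(train_col)


def apply_minmax(col, params):
    low, high = params
    return [(x - low) / (high - low) for x in col]


train_price = [10.0, 20.0, 30.0]
test_price = [20.0, 40.0]

z_test = apply_standard(test_price, fit_standard(train_price))
mm_test = apply_minmax(test_price, fit_minmax(train_price))
print(z_test)
print(mm_test)
```

Note that a test value above the training maximum maps outside the 0 to 1 range under minmax, since the bounds come from the training data.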

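Prediction with SVM

[Image: 1512994780.png, SVM accuracy tables by year]

Predictions from the SVM model fluctuate more and are sensitive to the standardization method: scale standardization works better in 2013 and 2014, whereas minmax standardization works better in 2015 and 2016. Compared with logistic regression, SVM does better in 2013 and 2014; in 2015 it is weaker under scale standardization and stronger under minmax standardization; in 2016 it is worse under every standardization method. It is also clear that accuracy in 2013 is far below that of the later years, which is consistent with the logistic regression results.
SVM is sometimes better and sometimes worse, so choosing one of the two models is difficult. It makes more sense to combine the two models into a joint prediction.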

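Prediction with logistic regression and SVM combined

Method (rank scoring):
1. Predict the high bonus-share probability with logistic regression and with SVM separately, sort each by probability from high to low, and assign a score by rank: 1 point for first place, 2 points for second place, and so on.
2. Add the two models' scores and sort by total score from low to high.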
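The two steps above can be sketched in plain Python as follows. The function names, tickers, and probabilities are hypothetical. Ties are broken by sort order in this sketch, whereas the notebook's pandas rank call averages tied ranks by default.

```python
def rank_descending(scores):
    """Map each key to its 1-based rank, highest score first."""
    order = sorted(scores, key=scores.get, reverse=True)
    return {code: position for position, code in enumerate(order, start=1)}


def combine_by_rank(prob_a, prob_b):
    """Sum the two models' ranks and return (code, total) sorted by total ascending."""
    rank_a, rank_b = rank_descending(prob_a), rank_descending(prob_b)
    totals = {code: rank_a[code] + rank_b[code] for code in prob_a if code in prob_b}
    return sorted(totals.items(), key=lambda item: item[1])


# Hypothetical probabilities from the two models
logit_prob = {"A": 0.90, "B": 0.80, "C": 0.40, "D": 0.60}
svm_prob = {"A": 0.55, "B": 0.70, "C": 0.65, "D": 0.30}
combined = combine_by_rank(logit_prob, svm_prob)
print(combined)
```

Summing ranks rather than probabilities sidesteps the fact that the two models' probability outputs are on different scales.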

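[Image: 1512985786(1).png, combined-model accuracy tables by year]

After combining the two models, prediction accuracy is more stable and, compared with the SVM model alone, less affected by the standardization method. For the 2017 prediction we use the combined model with scale standardization.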

2017 prediction results

[Image: 1512996514(1).png, 2017 prediction results]

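Summary

With the models above, we:
1. Compared the strengths and weaknesses of the two models for predicting high bonus-share plans.
2. Combined the two models into a more stable one.
3. Predicted fairly effectively which stocks announce high bonus-share plans with the following annual report.
This article offers one approach to predicting high bonus-share stocks and should be a useful reference for investing in this theme. We will continue to study trading strategies built on the high-probability stocks identified by the prediction and will post updates.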

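Note: all of the historical bonus-share data above comes from an external database and can be downloaded via the link if needed.

This article is published by JoinQuant Quant Classroom, and the copyright belongs to JoinQuant. For commercial reproduction, please contact us for authorization; for non-commercial reproduction, please credit the source.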
import datetime

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import preprocessing
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC
##### Load the annual-report bonus-share data
div_data = pd.read_csv(r'Div_data.csv', index_col=0)
## Keep annual-report records only
div_data['type'] = div_data['endDate'].map(lambda x: x[-5:])
div_data['year'] = div_data['endDate'].map(lambda x: x[0:4])
div_data_year = div_data[div_data['type'] == '12-31']
div_data_year = div_data_year[['secID', 'year', 'publishDate', 'recordDate',
                               'perShareDivRatio', 'perShareTransRatio']].copy()
div_data_year.columns = ['stock', 'year', 'pub_date', 'execu_date', 'sg_ratio', 'zg_ratio']
div_data_year.fillna(0, inplace=True)
# sz_ratio = bonus-share ratio + conversion ratio; gsz = 1 marks a high bonus-share plan
div_data_year['sz_ratio'] = div_data_year['sg_ratio'] + div_data_year['zg_ratio']
div_data_year['gsz'] = 0
div_data_year.loc[div_data_year['sz_ratio'] >= 1, 'gsz'] = 1
#### Load the stocks that already carried out a bonus-share plan in Q1-Q3
q123_already_szdata = pd.read_csv(r'q1q2q3_already_sz_stock.csv', index_col=0)
### Build the explanatory factors:
### capital reserve per share, retained profit per share, net assets per share,
### earnings per share, year-on-year revenue growth, share capital, share price,
### sub-new flag, and days listed

### Convert a balance-sheet item into a per-share value
def get_perstock_indicator(need_indicator, old_name, new_name, sdate):
    target = get_fundamentals(query(valuation.code, valuation.capitalization, need_indicator),
                              statDate=sdate)
    target[new_name] = target[old_name] / target['capitalization'] / 10000
    return target[['code', new_name]]


### Earnings per share, share capital, and revenue growth
def get_other_indicator(sdate):
    target = get_fundamentals(query(valuation.code, valuation.capitalization,
                                    indicator.inc_revenue_year_on_year,
                                    indicator.eps),
                              statDate=sdate)
    target.rename(columns={'inc_revenue_year_on_year': 'revenue_growth'}, inplace=True)
    target['capitalization'] = target['capitalization'] * 10000
    return target[['code', 'capitalization', 'eps', 'revenue_growth']]


### Average closing price over one month
def get_bmonth_aprice(code_list, startdate, enddate):
    mid_data = get_price(code_list, start_date=startdate, end_date=enddate,
                         frequency='daily', fields='close', skip_paused=False, fq='pre')
    mean_price = pd.DataFrame(mid_data['close'].mean(axis=0), columns=['mean_price'])
    mean_price['code'] = mean_price.index
    mean_price.reset_index(drop=True, inplace=True)
    return mean_price[['code', 'mean_price']]


### Flag sub-new stocks (listed within the past year)
def judge_cxstock(date):
    mid_data = get_all_securities(types=['stock'])
    mid_data['start_date'] = mid_data['start_date'].map(lambda x: x.strftime("%Y-%m-%d"))
    shift_date = str(int(date[0:4]) - 1) + date[4:]
    mid_data['1year_shift_date'] = shift_date
    mid_data['cx_stock'] = 0
    mid_data.loc[mid_data['1year_shift_date'] <= mid_data['start_date'], 'cx_stock'] = 1
    mid_data['code'] = mid_data.index
    mid_data.reset_index(drop=True, inplace=True)
    return mid_data[['code', 'cx_stock']]


### Flag stocks that issued new shares (compared with one year earlier)
def judge_dz(sdate1, sdate2):
    target1 = get_fundamentals(query(valuation.code, valuation.capitalization,
                                     balance.capital_reserve_fund), statDate=sdate1)
    target1['CRF_1'] = target1['capital_reserve_fund'] / target1['capitalization'] / 10000
    target2 = get_fundamentals(query(valuation.code, valuation.capitalization,
                                     balance.capital_reserve_fund), statDate=sdate2)
    target2['CRF_2'] = target2['capital_reserve_fund'] / target2['capitalization'] / 10000
    target = target1[['code', 'CRF_1']].merge(target2[['code', 'CRF_2']], on=['code'], how='outer')
    target['CRF_change'] = target['CRF_1'] - target['CRF_2']
    target['dz'] = 0
    target.loc[target['CRF_change'] > 1, 'dz'] = 1
    target.fillna(0, inplace=True)
    return target[['code', 'dz']]


### Number of calendar days since listing
def get_dayslisted(year, month, day):
    mid_data = get_all_securities(types=['stock'])
    date = datetime.date(year, month, day)
    mid_data['days_listed'] = mid_data['start_date'].map(lambda x: (date - x).days)
    mid_data['code'] = mid_data.index
    mid_data.reset_index(drop=True, inplace=True)
    return mid_data[['code', 'days_listed']]


def get_yearly_totaldata(statDate, statDate_before, mp_startdate, mp_enddate, year, month, day):
    """
    Input: the financial reporting period, the start date and the end date of the 20-day average price
    Output: the bonus-share data merged with the financial indicator data
    """
    per_zbgj = get_perstock_indicator(balance.capital_reserve_fund, 'capital_reserve_fund',
                                      'per_CapitalReserveFund', statDate)
    per_wflr = get_perstock_indicator(balance.retained_profit, 'retained_profit',
                                      'per_RetainProfit', statDate)
    per_jzc = get_perstock_indicator(balance.equities_parent_company_owners,
                                     'equities_parent_company_owners',
                                     'per_TotalOwnerEquity', statDate)
    other_indicator = get_other_indicator(statDate)
    code_list = other_indicator['code'].tolist()
    mean_price = get_bmonth_aprice(code_list, mp_startdate, mp_enddate)
    cx_signal = judge_cxstock(mp_enddate)
    dz_signal = judge_dz(statDate, statDate_before)
    days_listed = get_dayslisted(year, month, day)
    chart_list = [per_zbgj, per_wflr, per_jzc, other_indicator, mean_price,
                  cx_signal, dz_signal, days_listed]
    for chart in chart_list:
        chart.set_index('code', inplace=True)
    independ_vari = pd.concat(chart_list, axis=1)
    independ_vari['year'] = str(int(statDate[0:4]))
    independ_vari['stock'] = independ_vari.index
    independ_vari.reset_index(drop=True, inplace=True)
    total_data = pd.merge(div_data_year, independ_vari, on=['stock', 'year'], how='inner')
    total_data['per_zbgj_wflr'] = total_data['per_CapitalReserveFund'] + total_data['per_RetainProfit']
    return total_data
gsz_2016 = get_yearly_totaldata('2016q3', '2015q3', '2016-10-01', '2016-11-01', 2016, 11, 1)
gsz_2015 = get_yearly_totaldata('2015q3', '2014q3', '2015-10-01', '2015-11-01', 2015, 11, 1)
gsz_2014 = get_yearly_totaldata('2014q3', '2013q3', '2014-10-01', '2014-11-01', 2014, 11, 1)
gsz_2013 = get_yearly_totaldata('2013q3', '2012q3', '2013-10-01', '2013-11-01', 2013, 11, 1)
gsz_2012 = get_yearly_totaldata('2012q3', '2011q3', '2012-10-01', '2012-11-01', 2012, 11, 1)
gsz_2011 = get_yearly_totaldata('2011q3', '2010q3', '2011-10-01', '2011-11-01', 2011, 11, 1)
### Cap revenue growth at 300 so extreme values do not dominate: in practice,
### 2000% growth and 300% growth differ little in their effect on whether a plan is announced
for data in [gsz_2011, gsz_2012, gsz_2013, gsz_2014, gsz_2015, gsz_2016]:
    data.loc[data['revenue_growth'] > 300, 'revenue_growth'] = 300
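The cap on revenue growth applied at the end of the cell above is a one-sided winsorization. A minimal pure-Python sketch of the same idea, with a hypothetical function name and toy values:

```python
def cap_growth(values, cap=300.0):
    """Clip values above the cap; missing values (None) pass through unchanged."""
    return [value if value is None else min(value, cap) for value in values]


# Toy year-on-year growth figures in percent
growth = [2000.0, 120.5, None, 300.0, -40.0]
capped = cap_growth(growth)
print(capped)
```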

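Feature selection

We select among these variables: capital reserve plus retained profit per share, net assets per share, total share capital, earnings per share, year-on-year revenue growth, the average closing price over the preceding 20 trading days, and the number of days listed.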

### Tree-based importance
from sklearn.ensemble import ExtraTreesClassifier

traindata = pd.concat([gsz_2011, gsz_2012, gsz_2013, gsz_2014, gsz_2015, gsz_2016], axis=0)
traindata.dropna(inplace=True)
x_traindata = traindata[['per_zbgj_wflr',
                         'per_TotalOwnerEquity', 'capitalization', 'eps', 'revenue_growth',
                         'mean_price', 'days_listed']]
y_traindata = traindata[['gsz']]
X_trainScale = preprocessing.scale(x_traindata)
model = ExtraTreesClassifier()
model.fit(X_trainScale, y_traindata)
print(pd.DataFrame(model.feature_importances_.tolist(),
                   index=['per_zbgj_wflr',
                          'per_TotalOwnerEquity', 'capitalization', 'eps', 'revenue_growth',
                          'mean_price', 'days_listed'],
                   columns=['importance']))
                      importance
per_zbgj_wflr           0.152848
per_TotalOwnerEquity    0.147104
capitalization          0.132174
eps                     0.132444
revenue_growth          0.133320
mean_price              0.133576
days_listed             0.168533
/opt/conda/lib/python3.4/site-packages/sklearn/preprocessing/data.py:160: UserWarning: Numerical issues were encountered when centering the data and might not be solved. Dataset may contain too large values. You may need to prescale your features.
  warnings.warn("Numerical issues were encountered "
/opt/conda/lib/python3.4/site-packages/ipykernel/__main__.py:13: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel().

Correlation between variables

x_traindata.corr()

                      per_zbgj_wflr  per_TotalOwnerEquity  capitalization       eps  revenue_growth  mean_price  days_listed
per_zbgj_wflr              1.000000              0.988297       -0.032620  0.462543        0.011974    0.319627    -0.184551
per_TotalOwnerEquity       0.988297              1.000000       -0.009400  0.478124       -0.000433    0.323599    -0.132749
capitalization            -0.032620             -0.009400        1.000000  0.028189       -0.018424   -0.056660     0.005137
eps                        0.462543              0.478124        0.028189  1.000000        0.075522    0.282604    -0.018372
revenue_growth             0.011974             -0.000433       -0.018424  0.075522        1.000000    0.038924    -0.051725
mean_price                 0.319627              0.323599       -0.056660  0.282604        0.038924    1.000000    -0.055522
days_listed               -0.184551             -0.132749        0.005137 -0.018372       -0.051725   -0.055522     1.000000

Net assets per share is very highly correlated with capital reserve plus retained profit per share (0.988), so we drop net assets per share.

### RFE (recursive feature elimination)
traindata = pd.concat([gsz_2011, gsz_2012, gsz_2013, gsz_2014, gsz_2015, gsz_2016], axis=0)
traindata.dropna(inplace=True)
x_traindata = traindata[['per_zbgj_wflr',
                         'capitalization', 'eps', 'revenue_growth',
                         'mean_price', 'days_listed']]
y_traindata = traindata[['gsz']]
X_trainScale = preprocessing.scale(x_traindata)
svc = SVC(C=1.0, class_weight='balanced', kernel='linear', probability=True)
rfecv = RFECV(estimator=svc, step=1, cv=StratifiedKFold(2), scoring='accuracy')
rfecv.fit(X_trainScale, y_traindata)
plt.figure()
plt.xlabel("Number of features selected")
plt.ylabel("Cross validation score (nb of correct classifications)")
# grid_scores_ was removed in scikit-learn 1.2; newer versions expose
# rfecv.cv_results_['mean_test_score'] instead
plt.plot(range(1, len(rfecv.grid_scores_) + 1), rfecv.grid_scores_)
plt.show()

Logistic regression prediction

def get_prediction(x_traindata, y_traindata, x_testdata, standard='scale'):
    if standard == 'scale':
        # Mean-variance standardization, fitted on the training set
        scaler = preprocessing.StandardScaler().fit(x_traindata)
        X_trainScale = scaler.transform(x_traindata)
        X_testScale = scaler.transform(x_testdata)
    elif standard == 'minmax':
        # Min-max standardization
        min_max_scaler = preprocessing.MinMaxScaler()
        X_trainScale = min_max_scaler.fit_transform(x_traindata)
        X_testScale = min_max_scaler.transform(x_testdata)
    elif standard == 'no':
        # No standardization
        X_trainScale = x_traindata
        X_testScale = x_testdata
    ### High bonus-share stocks are far fewer than the rest, so use class_weight='balanced'
    model = LogisticRegression(class_weight='balanced', C=1e9)
    model.fit(X_trainScale, y_traindata)
    predict_y = model.predict_proba(X_testScale)
    return predict_y


def assess_classification_result(traindata, testdata, variable_list, q123_sz_data,
                                 date1, date2, function=get_prediction):
    traindata.dropna(inplace=True)
    testdata.dropna(inplace=True)
    x_traindata = traindata.loc[:, variable_list]
    y_traindata = traindata.loc[:, 'gsz']
    x_testdata = testdata.loc[:, variable_list]
    y_testdata = testdata.loc[:, 'gsz']
    total = testdata.loc[:, ['stock', 'gsz']].copy()
    for method in ['scale', 'minmax', 'no']:
        predict_y = function(x_traindata, y_traindata, x_testdata, standard=method)
        total['predict_prob_' + method] = predict_y[:, 1]
    ### Exclude stocks that already carried out a plan earlier this year
    q123_stock = q123_sz_data['stock'].tolist()
    total_filter = total.loc[total['stock'].isin(q123_stock) == False]
    ### Exclude ST stocks
    stock_list = total_filter['stock'].tolist()
    st_data = pd.DataFrame(get_extras('is_st', stock_list, start_date=date1,
                                      end_date=date2, df=True).iloc[-1, :])
    st_data.columns = ['st_signal']
    # Filter on the stock codes, which are held in the index of st_data
    st_list = st_data[st_data['st_signal'] == True].index.tolist()
    total_filter = total_filter[total_filter['stock'].isin(st_list) == False]
    ### Accuracy for each number of stocks selected and each standardization method
    result_dict = {}
    for stock_num in [10, 25, 50, 100, 200]:
        accuracy_list = []
        for column in total_filter.columns[2:]:
            total_filter = total_filter.sort_values(column, ascending=False)
            dd = total_filter[:stock_num]
            accuracy = len(dd[dd['gsz'] == 1]) / len(dd)
            accuracy_list.append(accuracy)
        result_dict[stock_num] = accuracy_list
    result = pd.DataFrame(result_dict, index=['accuracy_scale', 'accuracy_minmax', 'accuracy_no'])
    return result, total_filter
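The class_weight='balanced' option used above weights each class inversely to its frequency; scikit-learn documents the weight as n_samples / (n_classes * class count). A pure-Python sketch of that formula follows, with a hypothetical function name and a made-up 10-to-90 class split:

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight per class: n_samples / (n_classes * count of that class)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Made-up sample: 10 high bonus-share stocks among 100
weights = balanced_class_weights([1] * 10 + [0] * 90)
print(weights)
```

With this weighting, a misclassified minority-class sample costs about nine times as much as a majority-class one in this toy split.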
### 2013 prediction results
traindata = pd.concat([gsz_2011, gsz_2012], axis=0)
testdata = gsz_2013.copy()
variable_list = ['per_zbgj_wflr', 'capitalization', 'eps', 'revenue_growth',
                 'mean_price', 'days_listed']
q123_sz_data = q123_already_szdata[(q123_already_szdata['year'] == 2013) & (q123_already_szdata['gs'] > 0)]
result_2013, total_2013 = assess_classification_result(traindata, testdata, variable_list, q123_sz_data,
                                                       '2013-10-25', '2013-11-01')
print(result_2013)
                 10    25    50    100    200
accuracy_scale   0.3  0.20  0.24  0.22  0.260
accuracy_minmax  0.3  0.20  0.24  0.22  0.260
accuracy_no      0.1  0.16  0.32  0.32  0.275
### 2014 prediction results
traindata = pd.concat([gsz_2012, gsz_2013], axis=0)
testdata = gsz_2014.copy()
variable_list = ['per_zbgj_wflr', 'capitalization', 'eps', 'revenue_growth',
                 'mean_price', 'days_listed']
q123_sz_data = q123_already_szdata[(q123_already_szdata['year'] == 2014) & (q123_already_szdata['gs'] > 0)]
result_2014, total_2014 = assess_classification_result(traindata, testdata, variable_list, q123_sz_data,
                                                       '2014-10-25', '2014-11-01')
print(result_2014)
                 10    25    50    100    200
accuracy_scale   0.6  0.60  0.58  0.55  0.465
accuracy_minmax  0.6  0.60  0.58  0.55  0.465
accuracy_no      0.7  0.48  0.46  0.47  0.375
### 2015 prediction results
traindata = pd.concat([gsz_2013, gsz_2014], axis=0)
testdata = gsz_2015.copy()
variable_list = ['per_zbgj_wflr', 'capitalization', 'eps', 'revenue_growth',
                 'mean_price', 'days_listed']
q123_sz_data = q123_already_szdata[(q123_already_szdata['year'] == 2015) & (q123_already_szdata['gs'] > 0)]
result_2015, total_2015 = assess_classification_result(traindata, testdata, variable_list, q123_sz_data,
                                                       '2015-10-25', '2015-11-01')
print(result_2015)
                 10    25    50    100    200
accuracy_scale   0.9  0.68  0.64  0.65  0.595
accuracy_minmax  0.9  0.68  0.64  0.65  0.595
accuracy_no      0.9  0.64  0.66  0.53  0.445
### 2016 prediction results
traindata = pd.concat([gsz_2014, gsz_2015], axis=0)
testdata = gsz_2016.copy()
variable_list = ['per_zbgj_wflr', 'capitalization', 'eps', 'revenue_growth',
                 'mean_price', 'days_listed']
q123_sz_data = q123_already_szdata[(q123_already_szdata['year'] == 2016) & (q123_already_szdata['gs'] > 0)]
# The ST window ends on 2016-11-01, like the other years, so no later ST status is used
result_2016, total_2016 = assess_classification_result(traindata, testdata, variable_list, q123_sz_data,
                                                       '2016-10-25', '2016-11-01')
print(result_2016)
                 10    25    50    100    200
accuracy_scale   0.8  0.84  0.84  0.62  0.460
accuracy_minmax  0.8  0.84  0.84  0.62  0.460
accuracy_no      0.4  0.20  0.20  0.28  0.245

SVM classification

from sklearn.svm import SVC


def get_prediction_SVM(x_traindata, y_traindata, x_testdata, standard='scale'):
    if standard == 'scale':
        # Mean-variance standardization
        standard_scaler = preprocessing.StandardScaler()
        X_trainScale = standard_scaler.fit_transform(x_traindata)
        X_testScale = standard_scaler.transform(x_testdata)
    elif standard == 'minmax':
        # Min-max standardization
        min_max_scaler = preprocessing.MinMaxScaler()
        X_trainScale = min_max_scaler.fit_transform(x_traindata)
        X_testScale = min_max_scaler.transform(x_testdata)
    elif standard == 'no':
        # No standardization
        X_trainScale = x_traindata
        X_testScale = x_testdata
    ### High bonus-share stocks are far fewer than the rest, so use class_weight='balanced'
    clf = SVC(C=1.0, class_weight='balanced', gamma='auto', kernel='rbf', probability=True)
    clf.fit(X_trainScale, y_traindata)
    predict_y = clf.predict_proba(X_testScale)
    return predict_y
### 2013 prediction results
traindata = pd.concat([gsz_2011, gsz_2012], axis=0)
testdata = gsz_2013.copy()
variable_list = ['per_zbgj_wflr', 'capitalization', 'eps', 'revenue_growth',
                 'mean_price', 'days_listed']
q123_sz_data = q123_already_szdata[(q123_already_szdata['year'] == 2013) & (q123_already_szdata['gs'] > 0)]
result_2013, total_2013 = assess_classification_result(traindata, testdata, variable_list, q123_sz_data,
                                                       '2013-10-25', '2013-11-01',
                                                       function=get_prediction_SVM)
print(result_2013)
                 10    25    50    100    200
accuracy_scale   0.6  0.36  0.30  0.30  0.265
accuracy_minmax  0.3  0.20  0.14  0.22  0.230
accuracy_no      0.0  0.00  0.12  0.07  0.055
### 2014 prediction results
traindata = pd.concat([gsz_2012, gsz_2013], axis=0)
testdata = gsz_2014.copy()
variable_list = ['per_zbgj_wflr', 'capitalization', 'eps', 'revenue_growth',
                 'mean_price', 'days_listed']
q123_sz_data = q123_already_szdata[(q123_already_szdata['year'] == 2014) & (q123_already_szdata['gs'] > 0)]
result_2014, total_2014 = assess_classification_result(traindata, testdata, variable_list, q123_sz_data,
                                                       '2014-10-25', '2014-11-01',
                                                       function=get_prediction_SVM)
print(result_2014)
                 10    25    50    100    200
accuracy_scale   0.7  0.64  0.66  0.51  0.475
accuracy_minmax  0.8  0.56  0.56  0.52  0.470
accuracy_no      0.2  0.12  0.08  0.11  0.065
### 2015 prediction results
traindata = pd.concat([gsz_2013, gsz_2014], axis=0)
testdata = gsz_2015.copy()
variable_list = ['per_zbgj_wflr', 'capitalization', 'eps', 'revenue_growth',
                 'mean_price', 'days_listed']
q123_sz_data = q123_already_szdata[(q123_already_szdata['year'] == 2015) & (q123_already_szdata['gs'] > 0)]
result_2015, total_2015 = assess_classification_result(traindata, testdata, variable_list, q123_sz_data,
                                                       '2015-10-25', '2015-11-01',
                                                       function=get_prediction_SVM)
print(result_2015)
                 10    25    50    100    200
accuracy_scale   0.5  0.68  0.62  0.56  0.545
accuracy_minmax  0.8  0.80  0.66  0.59  0.540
accuracy_no      0.2  0.12  0.14  0.12  0.130
### 2016 prediction results
traindata = pd.concat([gsz_2014, gsz_2015], axis=0)
testdata = gsz_2016.copy()
variable_list = ['per_zbgj_wflr', 'capitalization', 'eps', 'revenue_growth',
                 'mean_price', 'days_listed']
q123_sz_data = q123_already_szdata[(q123_already_szdata['year'] == 2016) & (q123_already_szdata['gs'] > 0)]
# The ST window ends on 2016-11-01, like the other years, so no later ST status is used
result_2016, total_2016 = assess_classification_result(traindata, testdata, variable_list, q123_sz_data,
                                                       '2016-10-25', '2016-11-01',
                                                       function=get_prediction_SVM)
print(result_2016)
                 10    25    50    100    200
accuracy_scale   0.7  0.60  0.66  0.54  0.410
accuracy_minmax  0.7  0.76  0.74  0.55  0.395
accuracy_no      0.1  0.08  0.08  0.05  0.040

Combined selection with logistic regression and SVM

def assess_unite_logit_SVM(traindata, testdata, variable_list, q123_sz_data, method_use, date1, date2):
    ### Logit part
    traindata.dropna(inplace=True)
    testdata.dropna(inplace=True)
    x_traindata = traindata[variable_list]
    y_traindata = traindata[['gsz']]
    x_testdata = testdata[variable_list]
    y_testdata = testdata[['gsz']]
    total_logit = testdata[['stock', 'gsz']].copy()
    for method in ['scale', 'minmax', 'no']:
        predict_y = get_prediction(x_traindata, y_traindata, x_testdata, standard=method)
        total_logit['predict_prob_' + method] = predict_y[:, 1]
    ### SVM part (labels recoded from 0/1 to -1/1)
    traindata.loc[traindata['gsz'] == 0, 'gsz'] = -1
    testdata.loc[testdata['gsz'] == 0, 'gsz'] = -1
    x_traindata = traindata[variable_list]
    y_traindata = traindata[['gsz']]
    x_testdata = testdata[variable_list]
    y_testdata = testdata[['gsz']]
    total_SVM = testdata[['stock', 'gsz']].copy()
    for method in ['scale', 'minmax', 'no']:
        predict_y = get_prediction_SVM(x_traindata, y_traindata, x_testdata, standard=method)
        total_SVM['predict_prob_' + method] = predict_y[:, 1]
    ### Merge: the _x columns come from logit and the _y columns from SVM
    columns = ['stock', 'gsz', 'predict_prob_scale', 'predict_prob_minmax', 'predict_prob_no']
    total = total_logit[columns].merge(total_SVM[['stock', 'predict_prob_scale', 'predict_prob_minmax',
                                                  'predict_prob_no']], on=['stock'])
    for method in ['scale', 'minmax', 'no']:
        total['score_logit'] = total['predict_prob_' + method + '_x'].rank(ascending=False)
        total['score_SVM'] = total['predict_prob_' + method + '_y'].rank(ascending=False)
        total['score_' + method] = total['score_logit'] + total['score_SVM']
    ### Exclude stocks that already carried out a plan earlier this year
    q123_stock = q123_sz_data['stock'].tolist()
    total_filter = total.loc[total['stock'].isin(q123_stock) == False]
    ### Exclude ST stocks
    stock_list = total_filter['stock'].tolist()
    st_data = pd.DataFrame(get_extras('is_st', stock_list, start_date=date1,
                                      end_date=date2, df=True).iloc[-1, :])
    st_data.columns = ['st_signal']
    # Filter on the stock codes, which are held in the index of st_data
    st_list = st_data[st_data['st_signal'] == True].index.tolist()
    total_filter = total_filter[total_filter['stock'].isin(st_list) == False]
    result_dict = {}
    for stock_num in [10, 25, 50, 100, 200]:
        accuracy_list = []
        for column in ['score_scale', 'score_minmax', 'score_no']:
            total_filter = total_filter.sort_values(column, ascending=True)
            dd = total_filter[:stock_num]
            accuracy = len(dd[dd['gsz'] == 1]) / len(dd)
            accuracy_list.append(accuracy)
        result_dict[stock_num] = accuracy_list
    result = pd.DataFrame(result_dict, index=['score_scale', 'score_minmax', 'score_no'])
    return result
### 2013 prediction results
traindata = pd.concat([gsz_2011, gsz_2012], axis=0)
testdata = gsz_2013.copy()
variable_list = ['per_zbgj_wflr', 'capitalization', 'eps', 'revenue_growth',
                 'mean_price', 'days_listed']
q123_sz_data = q123_already_szdata[(q123_already_szdata['year'] == 2013) & (q123_already_szdata['gs'] > 0)]
result_2013_unite = assess_unite_logit_SVM(traindata, testdata, variable_list, q123_sz_data, 'minmax',
                                           '2013-10-25', '2013-11-01')
print(result_2013_unite)
              10    25    50    100    200
score_scale   0.4  0.36  0.32  0.28  0.275
score_minmax  0.4  0.24  0.18  0.23  0.250
score_no      0.1  0.08  0.24  0.32  0.270
### 2014 prediction results
traindata = pd.concat([gsz_2012, gsz_2013], axis=0)
testdata = gsz_2014.copy()
variable_list = ['per_zbgj_wflr', 'capitalization', 'eps', 'revenue_growth',
                 'mean_price', 'days_listed']
q123_sz_data = q123_already_szdata[(q123_already_szdata['year'] == 2014) & (q123_already_szdata['gs'] > 0)]
result_2014_unite = assess_unite_logit_SVM(traindata, testdata, variable_list, q123_sz_data, 'minmax',
                                           '2014-10-25', '2014-11-01')
print(result_2014_unite)
              10    25    50    100    200
score_scale   0.8  0.72  0.58  0.53  0.485
score_minmax  0.7  0.72  0.58  0.55  0.500
score_no      0.5  0.52  0.46  0.46  0.380
### 2015 prediction results
traindata = pd.concat([gsz_2013, gsz_2014], axis=0)
testdata = gsz_2015.copy()
variable_list = ['per_zbgj_wflr', 'capitalization', 'eps', 'revenue_growth',
                 'mean_price', 'days_listed']
q123_sz_data = q123_already_szdata[(q123_already_szdata['year'] == 2015) & (q123_already_szdata['gs'] > 0)]
result_2015_unite = assess_unite_logit_SVM(traindata, testdata, variable_list, q123_sz_data, 'minmax',
                                           '2015-10-25', '2015-11-01')
print(result_2015_unite)
              10    25    50    100    200
score_scale   0.8  0.84  0.72  0.64  0.560
score_minmax  0.7  0.80  0.68  0.62  0.585
score_no      0.9  0.60  0.64  0.53  0.450
### 2016 prediction results
traindata = pd.concat([gsz_2014, gsz_2015], axis=0)
testdata = gsz_2016.copy()
variable_list = ['per_zbgj_wflr', 'capitalization', 'eps', 'revenue_growth',
                 'mean_price', 'days_listed']
q123_sz_data = q123_already_szdata[(q123_already_szdata['year'] == 2016) & (q123_already_szdata['gs'] > 0)]
result_2016_unite = assess_unite_logit_SVM(traindata, testdata, variable_list, q123_sz_data, 'minmax',
                                           '2016-10-25', '2016-11-01')
print(result_2016_unite)
              10    25   50    100   200
score_scale   0.9  0.88  0.8  0.63  0.45
score_minmax  0.9  0.96  0.8  0.63  0.40
score_no      0.4  0.20  0.2  0.27  0.24

2017 prediction

### Build the 2017 data
statDate = '2017q3'
mp_startdate = '2017-10-01'
mp_enddate = '2017-11-01'
year = 2017
month = 11
day = 1
per_zbgj = get_perstock_indicator(balance.capital_reserve_fund, 'capital_reserve_fund',
                                  'per_CapitalReserveFund', statDate)
per_wflr = get_perstock_indicator(balance.retained_profit, 'retained_profit',
                                  'per_RetainedProfit', statDate)
per_jzc = get_perstock_indicator(balance.total_owner_equities, 'total_owner_equities',
                                 'per_TotalOwnerEquity', statDate)
other_indicator = get_other_indicator(statDate)
code_list = other_indicator['code'].tolist()
mean_price = get_bmonth_aprice(code_list, mp_startdate, mp_enddate)
cx_signal = judge_cxstock(mp_enddate)
days_listed = get_dayslisted(year, month, day)
chart_list = [per_zbgj, per_wflr, per_jzc, other_indicator, mean_price, cx_signal, days_listed]
for chart in chart_list:
    chart.set_index('code', inplace=True)
independ_vari = pd.concat(chart_list, axis=1)
independ_vari['year'] = str(int(statDate[0:4]))
independ_vari['stock'] = independ_vari.index
independ_vari.reset_index(drop=True, inplace=True)
independ_vari['per_zbgj_wflr'] = independ_vari['per_CapitalReserveFund'] + independ_vari['per_RetainedProfit']
gsz_2017 = independ_vari
gsz_2017.loc[gsz_2017['revenue_growth'] > 300, 'revenue_growth'] = 300
traindata = pd.concat([gsz_2015, gsz_2016], axis=0)
testdata = gsz_2017
q123_sz_data = q123_already_szdata[(q123_already_szdata['year'] == 2017) & (q123_already_szdata['gs'] > 0)]
### Logit part
traindata.dropna(inplace=True)
testdata.dropna(inplace=True)
x_traindata = traindata[variable_list]
y_traindata = traindata[['gsz']]
x_testdata = testdata[variable_list]
total_logit = testdata[['stock']].copy()
method = 'scale'
predict_y = get_prediction(x_traindata, y_traindata, x_testdata, standard=method)
total_logit['predict_prob_' + method] = predict_y[:, 1]
### SVM part (labels recoded from 0/1 to -1/1)
traindata.loc[traindata['gsz'] == 0, 'gsz'] = -1
x_traindata = traindata[variable_list]
y_traindata = traindata[['gsz']]
x_testdata = testdata[variable_list]
total_SVM = testdata[['stock']].copy()
method = 'scale'
predict_y = get_prediction_SVM(x_traindata, y_traindata, x_testdata, standard=method)
total_SVM['predict_prob_' + method] = predict_y[:, 1]
### Merge: the _x column comes from logit and the _y column from SVM
columns = ['stock', 'predict_prob_' + method]
total = total_logit[columns].merge(total_SVM[['stock', 'predict_prob_' + method]], on=['stock'])
total['score_logit'] = total['predict_prob_' + method + '_x'].rank(ascending=False)
total['score_SVM'] = total['predict_prob_' + method + '_y'].rank(ascending=False)
total['score'] = total['score_logit'] + total['score_SVM']
### Exclude stocks that already carried out a plan earlier this year
q123_stock = q123_sz_data['stock'].tolist()
total_filter = total.loc[total['stock'].isin(q123_stock) == False]
### Exclude ST stocks
stock_list = total_filter['stock'].tolist()
st_data = pd.DataFrame(get_extras('is_st', stock_list, start_date='2017-10-25',
                                  end_date='2017-11-01', df=True).iloc[-1, :])
st_data.columns = ['st_signal']
# Filter on the stock codes, which are held in the index of st_data
st_list = st_data[st_data['st_signal'] == True].index.tolist()
total_filter = total_filter[total_filter['stock'].isin(st_list) == False]
/opt/conda/lib/python3.4/site-packages/sklearn/preprocessing/data.py:160: UserWarning: Numerical issues were encountered when centering the data and might not be solved. Dataset may contain too large values. You may need to prescale your features.
  warnings.warn("Numerical issues were encountered "
/opt/conda/lib/python3.4/site-packages/sklearn/utils/validation.py:526: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using r*el().
  y = column_or_1d(y, warn=True)
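
The second warning is raised because `traindata[['gsz']]` is a one-column DataFrame of shape (n, 1), while scikit-learn expects a 1-D label array. A minimal sketch of the reshaping the message suggests, assuming a made-up toy frame (only the column name `gsz` comes from the code above; the values are invented for illustration):

```python
import pandas as pd

# Toy stand-in for traindata; the values are invented for illustration only.
toy = pd.DataFrame({'gsz': [0, 1, 0, 1]})

# Double brackets select a one-column DataFrame, i.e. shape (n, 1).
y_2d = toy[['gsz']]

# Flattening to shape (n,) is what the DataConversionWarning asks for.
y_1d = y_2d.values.ravel()

print(y_2d.shape, y_1d.shape)
```

The fitted model is the same either way, since scikit-learn performs this conversion itself; passing the 1-D array only silences the warning.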

Top 50 predicted stocks

# DataFrame.sort was removed in pandas 0.20; sort_values is its replacement
total_filter = total_filter.sort_values('score', ascending=True).reset_index(drop=True)
total_filter[:50]

    stock        predict_prob_scale_x  predict_prob_scale_y  score_logit  score_SVM  score
0   002627.XSHE  0.985344              0.587367              3            13.0       16.0
1   002872.XSHE  0.931494              0.598285              16           9.0        25.0
2   300184.XSHE  0.916963              0.590618              21           12.0       33.0
3   002682.XSHE  0.918300              0.573175              20           14.0       34.0
4   603385.XSHG  0.909826              0.557251              26           18.0       44.0
5   002772.XSHE  0.851236              0.624213              58           5.0        63.0
6   603878.XSHG  0.888196              0.533011              38           27.0       65.0
7   603599.XSHG  0.866226              0.542158              50           24.0       74.0
8   300537.XSHE  0.846131              0.572538              61           15.0       76.0
9   002164.XSHE  0.842408              0.562850              67           16.0       83.0
10  300510.XSHE  0.866181              0.522639              51           41.0       92.0
11  002564.XSHE  0.847225              0.523973              60           37.0       97.0
12  603839.XSHG  0.859098              0.516115              54           44.0       98.0
13  603928.XSHG  0.829549              0.554954              80           19.0       99.0
14  603926.XSHG  0.835150              0.534565              75           25.0       100.0
15  603116.XSHG  0.881629              0.482789              41           62.0       103.0
16  603167.XSHG  0.844965              0.512905              64           45.0       109.0
17  603086.XSHG  0.897919              0.458643              31           82.0       113.0
18  601900.XSHG  0.802173              0.618932              110          7.0        117.0
19  603768.XSHG  0.821876              0.526580              86           35.0       121.0
20  603035.XSHG  0.817521              0.529033              89           33.0       122.0
21  300349.XSHE  0.912351              0.440052              24           99.0       123.0
22  300569.XSHE  0.830428              0.512234              79           46.0       125.0
23  603980.XSHG  0.799044              0.559231              112          17.0       129.0
24  002788.XSHE  0.843138              0.478629              66           66.0       132.0
25  603586.XSHG  0.823848              0.510037              84           48.0       132.0
26  603979.XSHG  0.790512              0.591045              122          11.0       133.0
27  603033.XSHG  0.808966              0.529255              102          32.0       134.0
28  300485.XSHE  0.829367              0.492128              81           58.0       139.0
29  603556.XSHG  0.834741              0.481310              76           63.0       139.0
30  603558.XSHG  0.815226              0.511705              92           47.0       139.0
31  002541.XSHE  0.958416              0.405282              11           134.0      145.0
32  300391.XSHE  0.856482              0.450642              56           90.0       146.0
33  300407.XSHE  0.876355              0.438175              45           102.0      147.0
34  603535.XSHG  0.786474              0.547880              128          22.0       150.0
35  603367.XSHG  0.784653              0.548487              130          21.0       151.0
36  300338.XSHE  0.833377              0.460864              77           78.0       155.0
37  002377.XSHE  0.980316              0.390150              5            151.0      156.0
38  002574.XSHE  0.784113              0.531081              131          29.0       160.0
39  300178.XSHE  0.797867              0.506400              113          51.0       164.0
40  002791.XSHE  0.839194              0.442252              71           95.0       166.0
41  002746.XSHE  0.864586              0.421093              52           115.0      167.0
42  603226.XSHG  0.778350              0.526446              135          36.0       171.0
43  002740.XSHE  0.810678              0.473238              101          71.0       172.0
44  603117.XSHG  0.792407              0.500000              120          55.5       175.5
45  603630.XSHG  0.795253              0.484904              117          61.0       178.0
46  603017.XSHG  0.806854              0.460476              104          80.0       184.0
47  300587.XSHE  0.778195              0.508689              136          49.0       185.0
48  002734.XSHE  0.870310              0.398310              47           141.0      188.0
49  603225.XSHG  0.857836              0.402985              55           136.0      191.0
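
The `score_SVM` value of 55.5 in row 44 is not a typo: pandas `rank` defaults to `method='average'`, so stocks with identical probabilities share the mean of the ranks they would otherwise occupy (presumably 603117.XSHG tied with one other stock for ranks 55 and 56). The sketch below re-implements that ranking rule in plain Python and applies the rank-sum scoring from the merge step. The function name `average_rank_desc` and the toy probabilities are illustrative assumptions, not part of the original notebook.

```python
def average_rank_desc(values):
    """Rank values from largest (rank 1) to smallest; ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # Positions pos..end are 0-based; their 1-based ranks average to this.
        mean_rank = (pos + 1 + end + 1) / 2.0
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


# Hypothetical probabilities for four stocks; not taken from the post.
logit_prob = [0.90, 0.80, 0.70, 0.60]
svm_prob = [0.50, 0.60, 0.50, 0.40]

score_logit = average_rank_desc(logit_prob)  # [1.0, 2.0, 3.0, 4.0]
score_svm = average_rank_desc(svm_prob)      # [2.5, 1.0, 2.5, 4.0]
score = [a + b for a, b in zip(score_logit, score_svm)]
print(score)  # [3.5, 3.0, 5.5, 8.0]
```

Summing ranks rather than raw probabilities keeps the two models on a common scale, which matters here because the logit probabilities (around 0.8 to 0.98) are systematically higher than the SVM ones (around 0.4 to 0.6).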

Top 100 predicted stocks

total_filter[:100]

    stock        predict_prob_scale_x  predict_prob_scale_y  score_logit  score_SVM  score
0   002627.XSHE  0.985344              0.587367              3            13         16
1   002872.XSHE  0.931494              0.598285              16           9          25
2   300184.XSHE  0.916963              0.590618              21           12         33
3   002682.XSHE  0.918300              0.573175              20           14         34
4   603385.XSHG  0.909826              0.557251              26           18         44
5   002772.XSHE  0.851236              0.624213              58           5          63
6   603878.XSHG  0.888196              0.533011              38           27         65
7   603599.XSHG  0.866226              0.542158              50           24         74
8   300537.XSHE  0.846131              0.572538              61           15         76
9   002164.XSHE  0.842408              0.562850              67           16         83
10  300510.XSHE  0.866181              0.522639              51           41         92
11  002564.XSHE  0.847225              0.523973              60           37         97
12  603839.XSHG  0.859098              0.516115              54           44         98
13  603928.XSHG  0.829549              0.554954              80           19         99
14  603926.XSHG  0.835150              0.534565              75           25         100
15  603116.XSHG  0.881629              0.482789              41           62         103
16  603167.XSHG  0.844965              0.512905              64           45         109
17  603086.XSHG  0.897919              0.458643              31           82         113
18  601900.XSHG  0.802173              0.618932              110          7          117
19  603768.XSHG  0.821876              0.526580              86           35         121
20  603035.XSHG  0.817521              0.529033              89           33         122
21  300349.XSHE  0.912351              0.440052              24           99         123
22  300569.XSHE  0.830428              0.512234              79           46         125
23  603980.XSHG  0.799044              0.559231              112          17         129
24  002788.XSHE  0.843138              0.478629              66           66         132
25  603586.XSHG  0.823848              0.510037              84           48         132
26  603979.XSHG  0.790512              0.591045              122          11         133
27  603033.XSHG  0.808966              0.529255              102          32         134
28  300485.XSHE  0.829367              0.492128              81           58         139
29  603556.XSHG  0.834741              0.481310              76           63         139
..  ...          ...                   ...                   ...          ...        ...
70  300442.XSHE  0.811955              0.405477              99           132        231
71  300432.XSHE  0.778761              0.440250              134          98         232
72  002513.XSHE  0.844120              0.371885              65           168        233
73  603003.XSHG  0.884385              0.354598              39           196        235
74  002775.XSHE  0.770174              0.443118              144          94         238
75  603615.XSHG  0.748619              0.480700              175          65         240
76  002521.XSHE  0.794655              0.412059              118          127        245
77  002743.XSHE  0.759337              0.457272              162          83         245
78  300421.XSHE  0.826244              0.375454              83           163        246
79  300200.XSHE  0.800044              0.403777              111          135        246
80  603808.XSHG  0.771102              0.436278              143          103        246
81  603139.XSHG  0.759555              0.451829              161          86         247
82  300157.XSHE  0.810745              0.391008              100          149        249
83  603038.XSHG  0.845250              0.355801              63           192        255
84  603689.XSHG  0.736977              0.481289              197          64         261
85  601233.XSHG  0.877884              0.341827              44           218        262
86  603801.XSHG  0.719112              0.530216              231          31         262
87  603018.XSHG  0.807858              0.376245              103          162        265
88  300233.XSHE  0.889060              0.331036              37           230        267
89  002623.XSHE  0.914365              0.321120              22           245        267
90  002611.XSHE  0.761795              0.429999              158          110        268
91  601155.XSHG  0.703473              0.550118              250          20         270
92  601717.XSHG  0.696049              0.608578              262          8          270
93  000581.XSHE  0.838579              0.347750              72           202        274
94  603889.XSHG  0.762561              0.416682              156          121        277
95  300004.XSHE  0.775522              0.398924              138          140        278
96  603758.XSHG  0.720498              0.508334              228          50         278
97  603887.XSHG  0.736400              0.459910              198          81         279
98  603181.XSHG  0.745125              0.440732              187          97         284
99  603906.XSHG  0.731366              0.455659              207          84         291

100 rows × 6 columns
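
Once the actual 2017 annual-report plans are announced, a ranked list like the one above could be scored with the same top-N hit rate used in the back-tests earlier in the post. The sketch below is only an illustration: the function name `top_n_hit_rate`, the placeholder stock codes and the outcomes are all hypothetical, and no real 2017 results are implied.

```python
def top_n_hit_rate(ranked_stocks, actual_positive, n):
    """Share of the first n ranked stocks that appear in actual_positive."""
    top = ranked_stocks[:n]
    if not top:
        return 0.0
    hits = sum(1 for s in top if s in actual_positive)
    return hits / float(len(top))


# Hypothetical ranking (best first) and outcomes; placeholder codes only.
ranked = ['A', 'B', 'C', 'D', 'E']
announced = {'A', 'C', 'E'}

for n in (2, 4, 5):
    print(n, top_n_hit_rate(ranked, announced, n))
```

With the real data, `ranked` would be `total_filter['stock'].tolist()` and `announced` the set of stocks whose transfer ratio met the high-transfer definition.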
