TwoSigma: Quant features testing

By 沥川

Background

Trigger: the discussion thread "Any ideas for reaching the 0.7 tier?"

In that thread, @VadimNareyko (4th in this competition) specifically mentioned:

...
4. Use sample data to test the code before real training.
5. Train/Validation sets are really important - try to think how you validate your model.
6. Try more features and their combinations.
...

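Because history_df has to be built over the test set, one online test run currently takes 5h+, whereas one local test run takes only about 10 minutes.

So it pays to test as much as possible locally and only make a 'submission' once the score has been tuned.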
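
One way to keep local runs short is to work on a small, recent slice of the training data first. The snippet below is only a sketch of that idea: the function name `sample_last_days` and the column names `time`, `assetCode` and `close` are assumptions for illustration, not taken from the actual pipeline.

```python
import pandas as pd

def sample_last_days(df, n_days, time_col="time"):
    """Return only the rows that fall on the last n_days distinct timestamps."""
    # Distinct timestamps, oldest first (column name is an assumption).
    days = df[time_col].drop_duplicates().sort_values()
    keep = days.iloc[-n_days:]
    return df[df[time_col].isin(keep)].reset_index(drop=True)

# Toy frame standing in for the real market data.
toy = pd.DataFrame({
    "time": pd.to_datetime(["2016-01-04", "2016-01-04", "2016-01-05", "2016-01-06"]),
    "assetCode": ["A", "B", "A", "A"],
    "close": [10.0, 20.0, 10.5, 10.2],
})
small = sample_last_days(toy, n_days=2)
print(small)
```

Slicing by date rather than sampling rows at random keeps each asset's recent history contiguous, which matters for the rolling features tested here.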

Plan

Test as many quant features as possible, including their combinations.

Approach

  • Moving average
  • Exponential Moving Average
  • MACD
  • Bollinger Band
  • RSI
  • Volume moving average

References

Moving average

In statistics, a moving average (rolling average or running average) is a calculation to analyze data points by creating series of averages of different subsets of the full data set. It is also called a moving mean (MM)[1] or rolling mean and is a type of finite impulse response filter.

Reference: Moving_average (Wikipedia)

  • Using 3, 7, 14 lag + 4 features
    • valid loss1: 0.624249 / valid loss2: 0.641404
    • valid score: 1.50522 / test score: 0.70758
  • Using 3, 7 lag + 4 features
    • valid loss1: 0.664212 / valid loss2: 0.658955
    • valid score: 1.10837 / test score: ?
  • Using 3, 7, 14, 20 lag + 4 features ==> +++
    • valid loss1: 0.607309 / valid loss2: 0.615001
    • valid score: 1.71867 / test score: 0.71918
  • Using 3, 7, 14, 20 lag + 3 features - open
    • valid loss1: 0.61068 / valid loss2: 0.619538
    • valid score: 1.65060 / test score: ?
  • close_and_volume
    • valid loss1: 0.60487 / valid loss2: 0.617808
    • valid score: 1.70805 / test score: ?
  • After fixing the -1, using 3, 7, 14, 20 lag + 4 features
    • valid loss1: 0.60594 / valid loss2: 0.618462
    • valid score: 1.69276 / test score: ?
  • add std
    • valid loss1: 0.61125 / valid loss2: 0.622347
    • valid score: 1.65302 / test score: ?
  • 3,5,7,14,20 lag
    • valid loss1: 0.60631 / valid loss2: 0.619793
    • valid score: 1.66843 / test score: ?
  • 3,5,7,14,20 lag - round 400 450
    • valid loss1: 0.586921 / valid loss2: 0.614442
    • valid score: 1.77644 / test score: 0.71397
  • Using 3, 7, 14, 20 lag - round 500 500
    • valid loss1: 0.586921 / valid loss2: 0.614442
    • valid score: 1.77644 / test score: ?
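
The lag sets above (3, 7, 14, 20) applied to several columns can be generated in one loop. The following is a minimal sketch and not the code behind the numbers in the list: the post does not name the four features, so `cols` is a placeholder, `assetCode` is an assumed grouping column, and `min_periods=1` is a choice made here so short histories do not produce NaN. Rows are assumed to be sorted by time within each asset.

```python
import pandas as pd

def add_rolling_mean_features(df, cols, windows, group_col="assetCode"):
    """Append per-asset rolling means, one new column per (col, window) pair."""
    out = df.copy()
    for col in cols:
        for w in windows:
            # w=w binds the current window size inside the lambda.
            out[f"{col}_ma_{w}"] = out.groupby(group_col)[col].transform(
                lambda s, w=w: s.rolling(w, min_periods=1).mean()
            )
    return out

# Toy data: two assets, rows already in time order within each asset.
toy = pd.DataFrame({
    "assetCode": ["A", "A", "A", "A", "B", "B"],
    "close": [1.0, 2.0, 3.0, 4.0, 10.0, 20.0],
})
feats = add_rolling_mean_features(toy, cols=["close"], windows=[2, 3])
print(feats)
```

Grouping by asset keeps one asset's prices from leaking into another asset's average at the boundary between them.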

EMA (Exponential Moving Average)

Reference: Exponential_moving_average

  • ewma span = 4
    • valid loss1: 0.599245 / valid loss2: 0.611142
    • valid score: 1.72453 / test score: ?
  • ewma span = 7
    • valid loss1: 0.624249 / valid loss2: 0.641404
    • valid score: 1.50522 / test score: 0.70758
  • ewma span = 14
    • valid loss1: 0.624249 / valid loss2: 0.641404
    • valid score: 1.50522 / test score: 0.70758
  • ewma span = 20
    • valid loss1: 0.624249 / valid loss2: 0.641404
    • valid score: 1.50522 / test score: 0.70758
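
For the span values tried above, a per-asset EMA can be sketched with pandas `ewm`. This is an illustration under assumptions: `assetCode` and `close` are placeholder column names, and `adjust=False` (the plain recursive form with alpha = 2 / (span + 1)) is chosen here for clarity; the post does not say which setting was used.

```python
import pandas as pd

def add_ema_features(df, col, spans, group_col="assetCode"):
    """Append a per-asset exponential moving average for each span."""
    out = df.copy()
    for span in spans:
        # adjust=False gives y[t] = (1 - alpha) * y[t-1] + alpha * x[t].
        out[f"{col}_ema_{span}"] = out.groupby(group_col)[col].transform(
            lambda s, span=span: s.ewm(span=span, adjust=False).mean()
        )
    return out

toy = pd.DataFrame({
    "assetCode": ["A", "A", "A", "B"],
    "close": [1.0, 2.0, 3.0, 10.0],
})
ema = add_ema_features(toy, col="close", spans=[4])
print(ema)
```

With span = 4 the smoothing factor is 0.4, so asset A's values come out as 1.0, 1.4 and 2.04.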

MACD

MACD: (12-day EMA - 26-day EMA)

Moving average convergence divergence (MACD) is a trend-following momentum indicator that shows the relationship between two moving averages of prices. The MACD is calculated by subtracting the 26-day exponential moving average (EMA) from the 12-day EMA.

Reference: Moving Average Convergence Divergence - MACD

  • MACD
    • valid loss1: 0.589887 / valid loss2: 0.600827
    • valid score: 1.73427 / test score: 0.66297
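
The 12-day minus 26-day definition above translates almost directly into pandas. This sketch works on a single price series; the function name `macd` and the use of `adjust=False` are assumptions made here, and in a multi-asset frame it would need to be applied per asset.

```python
import pandas as pd

def macd(close, fast=12, slow=26):
    """MACD line: fast EMA minus slow EMA of one price series."""
    ema_fast = close.ewm(span=fast, adjust=False).mean()
    ema_slow = close.ewm(span=slow, adjust=False).mean()
    return ema_fast - ema_slow

flat = pd.Series([5.0] * 40)
rising = pd.Series([float(i) for i in range(1, 51)])

macd_flat = macd(flat)      # no trend, so the line stays at zero
macd_rising = macd(rising)  # fast EMA tracks the rise more closely
print(macd_rising.tail())
```

A flat series gives a MACD of zero throughout, while a steadily rising one gives a positive value because the 12-day EMA sits above the 26-day EMA.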

Bollinger Band

Bollinger Bands are a type of statistical chart characterizing the prices and volatility over time of a financial instrument or commodity, using a formulaic method propounded by John Bollinger in the 1980s. Financial traders employ these charts as a methodical tool to inform trading decisions, control automated trading systems, or as a component of technical analysis. Bollinger Bands display a graphical band (the envelope maximum and minimum of moving averages, similar to Keltner or Donchian channels) and volatility (expressed by the width of the envelope) in one two-dimensional chart.

Reference: Bollinger_Bands

  • rolling = 7
    • valid loss1: 0.588953 / valid loss2: 0.601884
    • valid score: 1.82173 / test score: 0.67890 (and with many zeros)
  • rolling = 14
    • valid loss1: 0.589887 / valid loss2: 0.57906
    • valid score: 1.875077 / test score: ?
  • rolling = 20
    • valid loss1: 0.573969 / valid loss2: 0.588355
    • valid score: 1.870755 / test score: ?
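
A band of this kind is usually built from a rolling mean plus and minus a multiple of the rolling standard deviation. The sketch below assumes the conventional multiplier k = 2 and uses pandas' default sample standard deviation (ddof=1); neither detail is stated in the post, and the function name is hypothetical.

```python
import pandas as pd

def bollinger_bands(close, window=20, k=2.0):
    """Return middle, upper and lower bands for one price series."""
    mid = close.rolling(window).mean()
    std = close.rolling(window).std()  # sample std (ddof=1) by default
    return pd.DataFrame({
        "mid": mid,
        "upper": mid + k * std,
        "lower": mid - k * std,
    })

close = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
bands = bollinger_bands(close, window=3)
print(bands)
```

The first `window - 1` rows are NaN because the window is not yet full, so these rows need to be dropped or filled before the bands are used as model features.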

RSI

The Relative Strength Index (RSI), developed by J. Welles Wilder, is a momentum oscillator that measures the speed and change of price movements. The RSI oscillates between zero and 100. Traditionally the RSI is considered overbought when above 70 and oversold when below 30. Signals can be generated by looking for divergences and failure swings. RSI can also be used to identify the general trend.

Reference: RSI (Wikipedia)

  • rsi = 6
    • valid loss1: 0.600206 / valid loss2: 0.610734
    • valid score: 1.75754 / test score: 0.68932
  • rsi = 6,14,20
    • valid loss1: 0.593609 / valid loss2: 0.602936
    • valid score: 1.77826 / test score: 0.64867
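
RSI is computed as 100 - 100 / (1 + RS), where RS is the average gain divided by the average loss over the chosen period. The sketch below uses Wilder-style smoothing (an exponential average with alpha = 1 / period); that choice and the function name `rsi` are assumptions, since the post only gives the period values.

```python
import pandas as pd

def rsi(close, period=14):
    """Relative Strength Index of one price series, in the range 0 to 100."""
    delta = close.diff()
    gain = delta.clip(lower=0)      # upward moves, 0 elsewhere
    loss = (-delta).clip(lower=0)   # downward moves as positive numbers
    avg_gain = gain.ewm(alpha=1 / period, adjust=False, min_periods=period).mean()
    avg_loss = loss.ewm(alpha=1 / period, adjust=False, min_periods=period).mean()
    rs = avg_gain / avg_loss        # inf when there were no losses
    return 100 - 100 / (1 + rs)

rising = pd.Series([float(i) for i in range(1, 11)])
falling = rising[::-1].reset_index(drop=True)
mixed = pd.Series([1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 3.0, 2.0, 3.0, 4.0])

rsi_rising = rsi(rising, period=6)
rsi_falling = rsi(falling, period=6)
rsi_mixed = rsi(mixed, period=6)
print(rsi_mixed)
```

A series that only rises ends at 100 and one that only falls ends at 0, matching the oscillator bounds described above. A completely flat series yields NaN because both averages are zero.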

Volume moving average

A Volume Moving Average is the simplest volume-based technical indicator. Similar to a price moving average, a VMA is an average volume of a security (stock), commodity, index or exchange over a selected period of time. Volume Moving Averages are used in charts and in technical analysis to smooth and describe a volume trend by filtering short term spikes and gaps.

Reference: volume_ma

  • rolling = 7
    • valid loss1: 0.607408 / valid loss2: 0.618092
    • valid score: 1.7038715 / test score: ?
  • rolling = 7,14
    • valid loss1: 0.606649 / valid loss2: 0.617411
    • valid score: 1.71001 / test score: ?
  • rolling = 7,14,20
    • valid loss1: 0.610523 / valid loss2: 0.616152
    • valid score: 1.66330 / test score: ?
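
The window combinations above (7; 7, 14; 7, 14, 20) can be produced from one helper that takes the windows as a parameter. A short sketch for a single asset's volume series, with a hypothetical function name and output column names:

```python
import pandas as pd

def volume_moving_averages(volume, windows=(7, 14, 20)):
    """Return one rolling-mean column per window for a single volume series."""
    cols = {f"volume_ma_{w}": volume.rolling(w).mean() for w in windows}
    return pd.DataFrame(cols)

volume = pd.Series([100.0, 200.0, 300.0, 400.0])
vma = volume_moving_averages(volume, windows=(2,))
print(vma)
```

Passing `windows=(7,)`, `(7, 14)` or `(7, 14, 20)` reproduces the three configurations compared in the list without changing any other code.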

Changelog

  • 2019-01-05 lichuan init.