Python for Algorithmic Trading

Using Machine & Deep Learning for Algorithmic FX Trading

Dr Yves J Hilpisch | The AI Machine

http://aimachine.io | http://twitter.com/dyjh

Imports

In [1]:
import math
import tpqoa
import cufflinks
import numpy as np
import pandas as pd
from pylab import plt
plt.style.use('seaborn')
%matplotlib inline
cufflinks.set_config_file(offline=True)
In [2]:
import warnings
warnings.simplefilter('ignore')

Oanda for FX Trading

Why Oanda and why FX?

  • technology and APIs come first in algorithmic trading
  • proper APIs and good Python wrapper packages
  • low transaction costs and simple cost model
  • fully symmetric markets (long/short)
  • high liquidity and long trading hours
  • high leverage possible but not required
  • all typical order types available (trailing stop, etc.)
  • basically all single-instrument strategies are straightforward to trade
  • pair and basket strategies also possible
  • free data — both historical and streaming
  • full data history for all instruments
  • good trading apps (phone, pad, mac, win, browser)
  • ...

The Data

In [3]:
api = tpqoa.tpqoa('dyjh.cfg')
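The credentials are read from a local configuration file. As a minimal sketch (the values shown here are placeholders, not the original credentials), a tpqoa configuration file typically has the following structure:

[oanda]
account_id = XYZ-XYZ-XXXXXXXX-XXX
access_token = YOUR_ACCESS_TOKEN
account_type = practice

The account_type entry is either practice or live.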
In [4]:
api.get_instruments()[:4]
Out[4]:
[('CAD/JPY', 'CAD_JPY'),
 ('Platinum', 'XPT_USD'),
 ('SGD/CHF', 'SGD_CHF'),
 ('CAD/CHF', 'CAD_CHF')]
In [5]:
sym = 'EUR_USD'
In [6]:
raw_a = api.get_history(sym, '2019-02-04', '2019-02-06', 'M1', 'A')  # one-minute bars, ask ('A') prices
In [7]:
raw_b = api.get_history(sym, '2019-02-04', '2019-02-06', 'M1', 'B')  # one-minute bars, bid ('B') prices
In [8]:
raw_a.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2804 entries, 2019-02-04 00:00:00+00:00 to 2019-02-05 23:59:00+00:00
Data columns (total 6 columns):
c           2804 non-null float64
complete    2804 non-null bool
h           2804 non-null float64
l           2804 non-null float64
o           2804 non-null float64
volume      2804 non-null int64
dtypes: bool(1), float64(4), int64(1)
memory usage: 134.2 KB
In [9]:
sel = list('c')  # column selection list, i.e. ['c']
In [10]:
spread = (raw_a['c'] - raw_b['c']).mean()
spread  # average spread
Out[10]:
0.00013838088445078468
In [11]:
data = ((raw_a[sel] + raw_b[sel]) / 2)  # mid prices as the average of ask and bid
In [12]:
ptc = spread / data['c'].mean()
ptc  # mean spread relative to mean mid price
Out[12]:
0.00012104865307308787
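To put this number into perspective, the average spread can be compared with the size of a typical one-minute move in the mid price; the following lines are an illustrative sketch, not part of the original analysis:

avg_move = data['c'].diff().abs().mean()  # average absolute one-minute change of the mid price
avg_move / spread  # how many average spreads a typical one-minute move covers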
In [13]:
data.head()
Out[13]:
c
time
2019-02-04 00:00:00+00:00 1.145710
2019-02-04 00:01:00+00:00 1.145770
2019-02-04 00:02:00+00:00 1.145650
2019-02-04 00:03:00+00:00 1.145555
2019-02-04 00:04:00+00:00 1.145360
In [14]:
data['c'].plot();

Efficient Markets

In [15]:
lags = 7
In [16]:
cols = []
for lag in range(1, lags + 1):
    col = 'lag_{}'.format(lag)
    data[col] = data['c'].shift(lag)  # lagged prices
    cols.append(col)
In [17]:
data.dropna(inplace=True)
In [18]:
reg = np.linalg.lstsq(data[cols], data['c'], rcond=-1)[0]  # OLS regression of the price on its lagged values
In [19]:
np.set_printoptions(precision=4)
In [20]:
reg
Out[20]:
array([ 9.8156e-01, -6.1465e-04, -1.4863e-02,  5.5695e-02, -1.4918e-02,
       -2.9290e-02,  2.2428e-02])
In [21]:
pd.DataFrame(reg, index=cols).plot(kind='bar');
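The coefficient of roughly 0.98 on lag_1, with all other coefficients close to zero, is what the random walk hypothesis suggests: the best predictor of the next price is essentially the current price. As a quick illustrative check (a sketch added here, not in the original notebook), the in-sample fit of the regression can be computed as follows:

pred = np.dot(data[cols], reg)  # fitted prices from the lag regression
r2 = 1 - ((data['c'] - pred) ** 2).sum() / ((data['c'] - data['c'].mean()) ** 2).sum()
r2  # in-sample R-squared (illustrative only)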

Patterns Defined

Investopedia writes:

Chart patterns look at the big picture and help to identify trading signals — or signs of future price movements.

The theory behind chart patterns is based on this assumption — that certain patterns consistently reappear and tend to produce the same outcomes.

The process of identifying chart patterns based on these criteria can be subjective in nature, which is why charting is often seen as more of an art than a science.

In [22]:
data['r'] = np.log(data['c'] / data['c'].shift(1))
In [23]:
cols = []
for lag in range(1, lags + 1):
    col = 'lag_{}'.format(lag)
    data[col] = data['r'].shift(lag)  # lagged returns
    cols.append(col)
In [24]:
data.dropna(inplace=True)
In [25]:
data[cols] = np.where(data[cols] > 0, 1, -1)  # reduce the lagged returns to their sign
data[cols] = data[cols].astype(int)
In [26]:
data.head(5)
Out[26]:
c lag_1 lag_2 lag_3 lag_4 lag_5 lag_6 lag_7 r
time
2019-02-04 00:15:00+00:00 1.145790 1 1 1 -1 -1 1 1 1.309149e-05
2019-02-04 00:17:00+00:00 1.145790 1 1 1 1 -1 -1 1 2.220446e-16
2019-02-04 00:18:00+00:00 1.145870 1 1 1 1 1 -1 -1 6.981838e-05
2019-02-04 00:19:00+00:00 1.145810 1 1 1 1 1 1 -1 -5.236333e-05
2019-02-04 00:20:00+00:00 1.145655 -1 1 1 1 1 1 1 -1.352846e-04
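Since every lagged return is reduced to its sign, each row now encodes one of 2 ** 7 = 128 possible up/down patterns. A small illustrative check (not part of the original notebook):

2 ** lags  # number of possible binary patterns with 7 lags
data[cols].drop_duplicates().shape[0]  # patterns actually observed in the sample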

Frequency Approach

Simple

In [27]:
data['d'] = np.sign(data['r']).astype(int)  # market direction: +1 (up), 0 (flat), -1 (down)
In [28]:
data.groupby(cols[:2])['d'].count()
Out[28]:
lag_1  lag_2
-1     -1       779
        1       709
 1     -1       709
        1       592
Name: d, dtype: int64
In [29]:
data.groupby(cols[:2] + ['d'])['r'].count()
Out[29]:
lag_1  lag_2  d 
-1     -1     -1    365
               0     39
               1    375
        1     -1    339
               0     36
               1    334
 1     -1     -1    341
               0     41
               1    327
        1     -1    293
               0     35
               1    264
Name: r, dtype: int64
In [30]:
(data.groupby(cols[:2] + ['d'])['r'].count() / len(data) * 100).round(2)
Out[30]:
lag_1  lag_2  d 
-1     -1     -1    13.09
               0     1.40
               1    13.45
        1     -1    12.15
               0     1.29
               1    11.98
 1     -1     -1    12.23
               0     1.47
               1    11.72
        1     -1    10.51
               0     1.25
               1     9.47
Name: r, dtype: float64
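The frequencies above are relative to the whole sample. Normalizing the counts within each two-lag pattern instead gives the conditional probabilities of a down, flat, or up move; this is a sketch added here, not from the original notebook:

counts = data.groupby(cols[:2] + ['d'])['r'].count()
probs = counts / counts.groupby(level=[0, 1]).transform('sum')  # condition on (lag_1, lag_2)
probs.round(3)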

Advanced

In [31]:
cols[:3] + ['d']
Out[31]:
['lag_1', 'lag_2', 'lag_3', 'd']
In [32]:
grouped = data[cols[:3] + ['d']].groupby(cols[:3] + ['d'])
In [33]:
res = grouped['d'].size().unstack()  # pattern counts with the direction d as columns
In [34]:
res
Out[34]:
d                   -1   0    1
lag_1 lag_2 lag_3
-1    -1    -1     184  25  195
-1    -1     1     181  14  180
-1     1    -1     186  21  175
-1     1     1     153  15  159
 1    -1    -1     176  21  178
 1    -1     1     165  20  149
 1     1    -1     148  18  161
 1     1     1     145  17  103
In [35]:
res['prob_up'] = (res[1] / (res[1] + res[-1])).round(3)
res['prob_down'] = 1 - res['prob_up']
In [36]:
res
Out[36]:
d                   -1   0    1  prob_up  prob_down
lag_1 lag_2 lag_3
-1    -1    -1     184  25  195    0.515      0.485
-1    -1     1     181  14  180    0.499      0.501
-1     1    -1     186  21  175    0.485      0.515
-1     1     1     153  15  159    0.510      0.490
 1    -1    -1     176  21  178    0.503      0.497
 1    -1     1     165  20  149    0.475      0.525
 1     1    -1     148  18  161    0.521      0.479
 1     1     1     145  17  103    0.415      0.585
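These frequencies can be turned into a naive prediction rule: predict +1 whenever the observed probability of an up move for a three-lag pattern exceeds 0.5 and -1 otherwise. A minimal sketch, with the column names p_freq and s_freq introduced here for illustration:

pattern_pred = res['prob_up'].apply(lambda p: 1 if p > 0.5 else -1)  # one prediction per pattern
data['p_freq'] = [pattern_pred.loc[tuple(v)] for v in data[cols[:3]].values]
data['s_freq'] = data['r'] * data['p_freq']  # in-sample strategy returns of the frequency rule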

Classification

In [37]:
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression

Logistic Regression

In [38]:
lr = LogisticRegression(solver='lbfgs', multi_class='auto')
In [39]:
lr.fit(data[cols], data['d'])
Out[39]:
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='auto',
          n_jobs=None, penalty='l2', random_state=None, solver='lbfgs',
          tol=0.0001, verbose=0, warm_start=False)
In [40]:
y_lr = lr.predict(data[cols])
In [41]:
accuracy_score(y_lr, data['d'])
Out[41]:
0.4836859089279312

Gaussian Naive Bayes

In [42]:
nb = GaussianNB()
In [43]:
nb.fit(data[cols], data['d'])
Out[43]:
GaussianNB(priors=None, var_smoothing=1e-09)
In [44]:
y_nb = nb.predict(data[cols])
In [45]:
accuracy_score(y_nb, data['d'])
Out[45]:
0.4808174973108641

Support Vector Machine

In [46]:
kernels = ['linear', 'rbf', 'poly']
In [47]:
models = {}
for kernel in kernels:
    svm = SVC(C=5, kernel=kernel, gamma='auto')
    svm.fit(data[cols], data['d'])
    y_svm = svm.predict(data[cols])
    acc = accuracy_score(y_svm, data['d'])
    print('kernel: {:8s} | accuracy: {:6.3f}'.format(kernel, acc))
    models[kernel] = svm
kernel: linear   | accuracy:  0.485
kernel: rbf      | accuracy:  0.551
kernel: poly     | accuracy:  0.513
In [48]:
models
Out[48]:
{'linear': SVC(C=5, cache_size=200, class_weight=None, coef0=0.0,
   decision_function_shape='ovr', degree=3, gamma='auto', kernel='linear',
   max_iter=-1, probability=False, random_state=None, shrinking=True,
   tol=0.001, verbose=False),
 'rbf': SVC(C=5, cache_size=200, class_weight=None, coef0=0.0,
   decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
   max_iter=-1, probability=False, random_state=None, shrinking=True,
   tol=0.001, verbose=False),
 'poly': SVC(C=5, cache_size=200, class_weight=None, coef0=0.0,
   decision_function_shape='ovr', degree=3, gamma='auto', kernel='poly',
   max_iter=-1, probability=False, random_state=None, shrinking=True,
   tol=0.001, verbose=False)}

Deep Neural Network

In [49]:
dnn = MLPClassifier(hidden_layer_sizes=3 * [96], activation='relu',
                    max_iter=2500, verbose=False)
In [50]:
%time dnn.fit(data[cols], data['d'])
CPU times: user 9.95 s, sys: 38.9 ms, total: 9.99 s
Wall time: 1.68 s
Out[50]:
MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=[96, 96, 96], learning_rate='constant',
       learning_rate_init=0.001, max_iter=2500, momentum=0.9,
       n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
       random_state=None, shuffle=True, solver='adam', tol=0.0001,
       validation_fraction=0.1, verbose=False, warm_start=False)
In [51]:
y_dnn = dnn.predict(data[cols])
In [52]:
accuracy_score(y_dnn, data['d'])
Out[52]:
0.5532448906418072
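All accuracy figures so far are in-sample fits. As a minimal sketch of an out-of-sample check (added here, not part of the original notebook), the data set could be split sequentially into a training and a test part:

split = int(len(data) * 0.7)  # sequential split to respect the time order
train, test = data.iloc[:split], data.iloc[split:]
model = MLPClassifier(hidden_layer_sizes=3 * [96], activation='relu', max_iter=2500)
model.fit(train[cols], train['d'])
accuracy_score(model.predict(test[cols]), test['d'])  # out-of-sample accuracy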

Vectorized Backtesting

NO TRANSACTION COSTS | ONLY IN-SAMPLE PERFORMANCE

In [53]:
data['p'] = models['rbf'].predict(data[cols])
data['s_svm'] = data['r'] * data['p']
In [54]:
(data['p'].diff() != 0).sum()
Out[54]:
1268
In [55]:
data['p'] = dnn.predict(data[cols])
data['s_dnn'] = data['r'] * data['p']
In [56]:
(data['p'].diff() != 0).sum()
Out[56]:
1444
In [57]:
data[['s_svm', 's_dnn', 'r']].cumsum().apply(np.exp).plot();
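
The caveat above matters: Out [54] and Out [56] show that both strategies change position more than a thousand times over two trading days, and every change costs roughly the proportional spread ptc computed earlier. A minimal sketch of how such costs could be deducted from the DNN strategy (the column name s_dnn_tc is introduced here for illustration; data['p'] still holds the DNN predictions from In [55]):

trades = data['p'].diff().fillna(0) != 0  # bars on which the position changes
data['s_dnn_tc'] = data['s_dnn'] - trades * ptc  # subtract proportional costs per position change
data[['s_dnn', 's_dnn_tc', 'r']].cumsum().apply(np.exp).plot();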