W

Using Genetic Algorithms To Search For Trading Signals

July 16, 2018

Motivation

When applying machine learning to financial markets most quantitative research focus on optimization algorithms for user defined features. This approach raises two issues. One is that the user needs to have deep domain knowledge in order to generate and define features. The second, related issue occurs when the act of gaining domain knowledge prejudices the researcher about what features may or may not work. In this walkthrough, we consider an alternative approach where we start with the assumption that the researcher has no knowledge of what works but in the entire universe of trading strategies there is a strategy that does work. This assumption frames the problem as a search problem as opposed to an optimization problem. Now the question for the researcher is how to approximate the entire universe of trading strategies which may be both intellectually and computationally impossible to generate. For similar problems, genetic algorithms have been used with good results. The following is a walk through of using a genetic algorithm in conjunction with grammatical evolution to generate trading strategies and find the fittest one.

Solution Walkthrough

In biology a genotype is the set of genes an organism carries around while a phenotype is its observable characteristics.

The implementation of genetic algorithms generally start with a randomly generated pool of genotypes that are scored for fitness. Once scored, you generate a new pool of the fittest genotypes from the initial pool, pick random pairs from the pool, mate the picked pairs and mutate the mated genotype. Each time we run this process we create a new generation of genotypes that are fitter than the previous generation. The final result is a pool of genotypes that can be ranked by score where the highest ranked genotype is the best solution to the problem.

In order to use genetic algorithms we need to be able to map our genotypes to trading strategies. Here we use grammatical evolution to map our genotypes or program fragments that execute the backtest of the strategy. Grammatical evolution allows us to evolve trading strategies based on our specified grammar and domain knowledge.

The following is a walkthrough of how we take an array like [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12] and translate it to a trading strategy represented as strategy-a:AAPL:12::strategy-b:GOOG:256 which can then be used to run a backtest.

Generate genotypes

We randomly generate 10,000 genotypes as an array of integers with length 12. This initial pool is used to generate the first generation of phenotypes.

Generate Strategies (Phenotypes)

For this example we make the following assumptions

target ticker =  Nasdaq 100
ticker symbols = ["AAPL", "GOOG", "AMZN", "QQQQ", "WFC"]
statements = ["::<stmnt>", "<stmnt>::<code>"]
strategy_fragments = [
  "strategy-a:<ticker>:<param>",
  "strategy-b:<ticker>:<param>",  
  "strategy-c:<ticker>:<param>",  
  "strategy-d:<ticker>:<param>"
]  
genotype = [71, 20, 84, 226, 154, 252, 99, 256, 230, 126, 229, 110]
phenotype = "<code>"

The target ticker is the security which we are trying to find winning trading strategies for.

tickers symbols represent the pool of securities that we’ll use in our trading strategies.

statements represent building blocks of primitive code that we’ll use to generate our trading strategies. strategy fragments represent our trading strategies with parameters separated by colons and bracketed by arrow braces.

The genotype represents the sample array which we use to generate our phenotype, a string of trading trading strategies. We parse the genotype to generate backtest

while <code> initializes our phenotype.

To generate a trade fragment we need four integers so our genotype of 12 integers can generate a strategy made up of 3 different trade fragments. To illustrate how we apply grammatical evolution we start with the first four integers of the assumed genotype or [71, 20, 84, 226].

Map Statement

statements = ["::<stmnt>", "<stmnt>::<code>"]

genotype = [71, 20, 84, 226]

phenotype = "<code>"
71 % 2 = 1
"<stmnt>::<code>" = statements[1]
phenotype.replace("<code>", "<stmnt>::<code>")
phenotype == "<stmnt>::<code>"

We initialize our strategy phenotype with <code>. The rule for phonotype is that <code> can be replaced with a value in statements. For our example, either ::<stmnt> or <stmnt>::<code>. We replace the phenotype based on the first integer of our genotype. In this example, we take the first integer 71 mod it by 2, the length of statements. Since 71 % 2 leaves us with 1 we replace <code> with <stmnt>::<code> or the second item in the statements array.

The string, <stmnt>, is what we use to encode the actual strategy fragment. The string <stmnt>::<code> allows for multiple strategy fragments which would make up the overall strategy.

Map Strategy

genotype = [70, 20, 84, 226]

phenotype = "<stmnt>::<code>"
20 % 4 = 0
strategy_fragments[0] = "strategy-a:<ticker>:<param>"
phenotype.replace("<stmnt>", "strategy-a:<ticker>:<param>")
phenotype = "strategy-a:<ticker>:<param>::<code>"

If <stmnt> exists, we replace it with a strategy fragment which includes the code for the trading strategy, a ticker symbol for the security that the strategy will be applied to and a numeric parameter. In picking the strategy fragment, we take the integer in the genotype and take the modulus of the length of the strategies.

Map Ticker

phenotype = "strategy-a:<ticker>:<param>::<code>"
84 % 5 = 4
tickers[4] = "WFC"
phenotype.replace("<ticker>", "WFC")
phenotype = "strategy-a:WFC:<param>::<code>"

If <ticker> exists, we replace it with a ticker symbol from the ticker symbols list.

Map Parameter

phenotype = "strategy-a:WFC:<param>::<code>"
params = 225
phenotype.replace("<param>", "225")
phenotype = "strategy-a:WFC:225::<code>"

If <param> exists, we replace it with the value of the integer in the genotype. Here we just use the value of the integer. Since we are just using the value of the integer, careful consideration should be given to the range of integers from which we randomly pick.

Run trading strategies

For each phenotype, we decode the strategy and run a backtest. We’ll use the results of the backtest as metadata to help us rank the fitness of the phenotype and the probability that it will mate for the next generation.

Rank Strategies

In determining how to rank the phenotypes our primary goal is to ensure that fitter phenotypes have a higher chance of mating. We chose to rank the phenotypes in rank order based on the percentage of winning trades. In generating the mating population we use the phenotypes rank as a proxy for fitness and then use that rank to determine each phenotypes population in the pool. By way of example, if we limit the potential phenotypes for the mating pool to the 500 fittest phenotypes, the mating pool will have 125,250 phenotypes of which 500 will be the fittest phenotype. As a result, the fittest phenotype will have a 0.04% chance of being selected when randomly selecting phenotypes for the next generation.

Pool Size
n * (n + 1) / 2
500 * (501) / 2 = 125,250

Crossover

To perform the crossover we randomly pick two genotypes from the mating pool then pick a random slice point in the genotype. In our example, since each genotype is an array of length 12 we pick a random number between 0 and 11. We slice the genotype to the left of the slice point of the first genotype that was picked and slice to the right of the slice point of the second genotype we picked. We then splice the sliced genotypes to generate the offspring.

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
y = [20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]
slice_point = 4
xy = [1, 2, 3, 4, 5, 25, 26, 27, 28, 29, 30, 31]

Mutation

In order to maintain diversity, we cycle through each offspring array and probabilistically mutate each integer. A higher probability results in a higher probability of mutating that integer. Traditionally, in order to ensure that the algorithm is working toward higher fitness the probability of mutation is usually set low; less than 5%. However, since we’re working with a probabilistic outcome as opposed to a discrete outcome we increase the probability of mutation to 20% to increase diversity. If a mutation is triggered we either increment the integer by 1 or decrement it by 1 based on a 50% probability.

Rinse / Repeat

Once we have our new pool, we generate strategies and create a new pool. In this example we run three generations.

Results

The following table summarizes the results of this walkthrough when applied to two years of hourly Bitcoin data from different exchanges.

Strategy Kelly Cum Pnl W Kelly Total Trades Generation Pct Winners
stdevd:krakenUSD:200::abovema:btcnCNY:70 26.394604 0.79026383 0.72086895 574 1 0.6620209
stdevd:krakenUSD:94::abovema:btceUSD:220 17.309988 0.5761709 0.5373664 649 1 0.6533128
abovema:btcnCNY:250::stdevd:krakenUSD:87 17.297611 0.8965364 0.5966406 721 3 0.6518724
abovema:coinbaseEUR:221::stdevd:krakenUSD:82 6.5052958 0.60363126 0.32287654 1041 3 0.6311239
stdevd:krakenUSD:200::abovema:bitstampUSD:195 9.935446 0.6481855 0.41496494 875 3 0.63085717
stdevd:krakenUSD:118::abovema:coinbaseUSD:227 6.5914254 0.54767185 0.3035028 964 2 0.6255187
stdevd:krakenUSD:182 7.4401746 1.6357833 0.62182724 1658 3 0.6206273
stdevd:krakenUSD:214 6.7178283 1.5198021 0.5525266 1629 1 0.62062615
stdevd:krakenUSD:142 7.404855 1.675344 0.629923 1691 3 0.62034297

These results are raw and haven’t been tested for randomness and does not include trading costs.

Send comments, corrections to waynechoi@gmail.com