AUTOMATED TRADING SYSTEM AND METHODOLOGY FOR REALTIME IDENTIFICATION OF STATISTICAL ARBITRAGE MARKET OPPORTUNITIES
A program for identifying and automatically acting on statistical arbitrage opportunities between related equities and contracts. The present invention describes an improved technique to perform statistical-pairs arbitraging in a dynamic marketplace with less risk than prior art approaches. The present invention employs an array of recent data and performance ratios involving bid and ask prices for correlated items, such as stocks.
The present invention is a continuation of and claims priority from pending U.S. patent application Ser. No. 14/020,986, filed Sep. 9, 2013, now U.S. Pat. No. 8,719,150, entitled “System and Method for an Automated Sales System With Remote Negotiation and Post-Sale Verification,” which is a divisional of U.S. patent application Ser. No. 12/266,295, filed Nov. 6, 2008, now U.S. Pat. No. 8,533,098, also entitled “System and Method for an Automated Sales System With Remote Negotiation and Post-Sale Verification,” which relies upon U.S. Patent Provisional Application Ser. No. 60/985,690, filed Nov. 6, 2007, also entitled “System and Method for an Automated Sales System With Remote Negotiation and Post-Sale Verification,” the subject matters of which are incorporated by reference herein.
FIELD OF THE INVENTION
The present invention relates to a statistical-pairs arbitrage technique and, more specifically, to a data-driven system and algorithm that performs this technique automatically.
BACKGROUND
In recent years, quantitative and statistical analysis of past market behavior has been used to predict future gains. Chartists and analysts have pursued this goal for years. Until recently, however, data with sufficient quality, reliability, and time/volume detail was simply not recorded, because the fast, large storage systems and other resources needed to keep it up to date were lacking. Second-party data vendors have recently filled this gap by making detailed Best Bid-Offer (BBO) and Depth of Market (DOM) data available to programmers willing to pay for it.
The present invention takes advantage of these new technological advances, which the more limited prior art techniques lack, to solve increasingly difficult problems posed by requests from hedge funds, other companies, and individuals. Market data is now available in great breadth and depth, and it can be accessed quickly. Together, these make possible real-time transactions that carry lower risk or are even risk free. They also permit arbitrage and other activities at volumes, at scales, and with approaches radically different from those employed in the prior art.
SUMMARY OF THE INVENTION
The present invention describes an improved technique to perform statistical-pairs arbitraging in a dynamic marketplace with less risk than prior art approaches. The present invention employs an array of recent data and performance ratios involving bid and ask prices for correlated items, such as stocks, futures, equities, commodities and other instruments.
While the specification concludes with claims particularly pointing out and distinctly claiming the subject matter that is regarded as forming the present invention, it is believed that the invention will be better understood from the following description taken in conjunction with the accompanying Drawings, in which:
The following detailed description is presented to enable any person skilled in the art to make and use the invention. For purposes of explanation, specific nomenclature is set forth to provide a thorough understanding of the present invention. However, it will be apparent to one skilled in the art that these specific details are not required to practice the invention. Descriptions of specific applications are provided only as representative examples. Various modifications to the preferred embodiments will be readily apparent to one skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the invention. The present invention is not intended to be limited to the embodiments shown, but is to be accorded the widest possible scope consistent with the principles and features disclosed herein.
The prior art is replete with techniques that fall short of the advantages shown in the present invention. For example, an exemplary arbitrage article entitled “Risk Arbitrage Opportunities in Petroleum Futures Spreads” is attached and incorporated herein by reference. That article describes a potential platform for using the principles of the present invention. Like all other prior art of which Applicant is aware, however, it is deficient. Nonetheless, the article is exemplary of the prior art and is referenced herein for background.
Proposed herein is a trading system and methodology that reacts to real-time market data and automatically executes trades, starting from the basic correlated pairs-trading model described above. The model departs from the standard one in the exact method used to determine buy and sell signals. The signals for acquiring a position come from a unique method of measuring statistical deviation. It measures how far the current bid/offer price deviates from the historical relationship observed over a traveling time span of BBO data. The method instantly determines whether the spread of the pair is relatively over- or undervalued under the most recent market conditions, and it takes bid/ask transaction costs into account. A corresponding series of trade orders is then issued in a sequence that increases the total position while maintaining an efficient hedge.
The sell signals also deviate from the standard model. Under the standard approach, positions are exited when the pair reaches the mean relationship, with the mean calculated from the dynamic data set. The algorithm of the present invention determines the mean relationship in an untraditional way. This overcomes many of the hurdles of dynamic markets and increases the percentage of profitable trades. The values for the buy/sell signals and other variables were initially chosen by intuition. These included occasional wild guesses that turned out to be generally correct. The variables were eventually optimized by testing thousands of incremental combinations of the fixed signals and variables. Each combination was rated by its performance on a mock portfolio run over roughly 7 months of historical BBO data for the Gasoline and Crude contracts traded on NYMEX. Additionally, some of the most important variables are set dynamically based on market conditions. Applicant believes there is considerable potential for further increasing the efficiency and profitability of the algorithm.
The strategy's methodology is rooted in the statistical analysis of dynamic data streaming in continuously from the two correlated markets. Because market data is chaotic, these traveling data sets produce interesting and occasionally non-linear results as the variables are changed. Specifically, shortening the length of historical data stored in the data set can yield seemingly random back-test results.
This apparent randomness proved misleading. Applicant found that the problem with the small data set came from the quality of the samples added at each measurement interval. Applicant first tried to quantify the bid-offer spread by averaging the bid and offer of each contract and using that average as the basis for comparing the spread between the pair. The moment the first mock trade was entered, however, it became apparent that this data set did not adequately describe the relationship in the spread at the small scale. Once the transaction costs were reflected, the Z-score of the resulting position lay dozens of standard deviations from the mean. It should have been very close to the entrance signal.
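This failure can be reproduced in miniature. The sketch below is a minimal illustration that uses made-up quotes, not the back-test data. It builds a history from bid/ask midpoints. It then scores the ratio actually paid to buy contract A at the ask and sell contract B at the bid. The quote values and the `mid` helper are hypothetical.

```python
from statistics import fmean, pstdev

# Hypothetical (bid, ask) quotes: contract A priced like crude ($/bbl),
# contract B like gasoline ($/gal). Illustrative values only.
QUOTE_STATES = [
    ((70.00, 70.02), (2.0000, 2.0010)),
    ((70.01, 70.03), (2.0000, 2.0010)),
]
history = QUOTE_STATES * 50  # 100 intervals alternating between two quote states


def mid(quote):
    """Average of bid and ask."""
    bid, ask = quote
    return (bid + ask) / 2


# Midpoint approach: one observation per interval, mid(A) / mid(B).
mid_ratios = [mid(a) / mid(b) for a, b in history]
mu, sigma = fmean(mid_ratios), pstdev(mid_ratios)

# Ratio actually paid when buying A at the ask and selling B at the bid.
(a_bid, a_ask), (b_bid, b_ask) = history[0]
z_mid = (a_ask / b_bid - mu) / sigma
print(f"z-score of executable ratio vs. midpoint history: {z_mid:.2f}")
```

On these toy quotes the executable ratio lands about 4.5 standard deviations from the midpoint mean, even though the market has not moved. The spread of the midpoints sits in the denominator, so a quieter window pushes the score higher still.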
Averaging in each contract's transaction spread is not the only thing that cripples the integrity of the data set. The underlying differences between the paired markets do as well. For contract pairs to work in a pairs strategy, they must have a high correlation coefficient. This does not mean, however, that they must trade with the same market liquidity or pricing structure in order to be successful. The Gasoline/Crude example clearly illustrates both aspects: the two contracts differ in liquidity and in trading units.
To overcome both of these issues, the data must be manipulated before it is admitted into the data set. The manipulation devised herein combines the two markets so that the $0.01 ticks of crude do not overshadow the $0.0001 ticks of gasoline, as an average or difference method would. Applicant considered that the ideal comparison would be the ratio of one contract to the other. The next problem is how to place the bid of one contract and the offer of the other in a ratio without biasing the data set toward that position. Applicant discovered that the answer is simply to add both [bid of A/offer of B] and [ask of A/bid of B] at every interval, as if they were two separate observations.
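Below is a minimal sketch of the two-observation admission rule. It assumes each quote is a `(bid, ask)` tuple; the function name `two_sided_observations` and the toy crude/gasoline-like quotes are hypothetical. The sketch then scores the executable ratio for buying A and selling B (ask of A over bid of B) against the resulting window.

```python
from statistics import fmean, pstdev


def two_sided_observations(a_quote, b_quote):
    """Return the two ratios admitted per interval: bid(A)/ask(B) and ask(A)/bid(B)."""
    a_bid, a_ask = a_quote
    b_bid, b_ask = b_quote
    return [a_bid / b_ask, a_ask / b_bid]


# Hypothetical quotes for a crude/gasoline-like pair (illustrative only).
QUOTE_STATES = [
    ((70.00, 70.02), (2.0000, 2.0010)),
    ((70.01, 70.03), (2.0000, 2.0010)),
]
history = QUOTE_STATES * 50

# Both ratios go into the window at every interval, as two separate observations.
window = []
for a_quote, b_quote in history:
    window.extend(two_sided_observations(a_quote, b_quote))

mu, sigma = fmean(window), pstdev(window)
(a_bid, a_ask), (b_bid, b_ask) = history[0]
z_two_sided = (a_ask / b_bid - mu) / sigma
print(f"z-score of executable ratio vs. two-sided history: {z_two_sided:.2f}")
```

Here the executable ratio scores roughly 0.8 standard deviations, because the bid/ask transaction cost is already part of the window's spread.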
Using two observations chosen to be as far apart as possible can be described as keeping the data set's variance larger at every observation. The previously failed back-test was then rerun on this manipulated data set. This time the immediate value of the Z-score was barely affected, as one would hope. The present technique provides the only effective method for using these limited sample sets. It also enables the strategy to accurately gauge its transaction costs as a component of the statistics.
The aforementioned arbitrage article does not address these particularly difficult problems that arise at smaller time scales. Under the paradigm it presents, it simply does not have to.
The present invention differs from prior art approaches in several further ways:
- the length of the historical ratio data set;
- the frequency at which data is added to it; and
- the use of not one but two (or more) signals for entering multiple positions.

Entering positions on multiple signals exposes a smaller portion of the account to the minor daily fluctuations of trading. At the same time, it leaves the account able to act when larger movements take place. These larger but less frequent movements would be captured by the larger signals and the remaining funds in the account.
This was an appealing idea until an evolutionary algorithm was back-tested on the historical data. It showed that the optimal combination of the two signal variables was slightly less efficient than simply choosing one good signal value in the middle. The capability remains in the code, however, with the second “wave” of purchases limited to 0 contracts.
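The wave mechanism can be sketched as a table of thresholds. Each wave adds a block of contracts once |z| reaches its threshold. This is an illustrative sketch only: the thresholds, the first-wave size of 5 contracts, and the `target_contracts` name are assumptions. The only detail taken from the text is that the second wave is sized to 0 contracts.

```python
# Hypothetical wave table: (|z| threshold, contracts of contract 1 added by that wave).
# Thresholds and the first-wave size are illustrative; the zero-sized
# second wave reflects the text.
WAVES = [(1.0, 5), (2.0, 0)]


def target_contracts(z: float, waves=WAVES) -> int:
    """Total position size implied by every wave whose |z| threshold has been reached."""
    return sum(size for threshold, size in waves if abs(z) >= threshold)


print(target_contracts(0.5))  # 0: below the first signal
print(target_contracts(1.5))  # 5: first wave only
print(target_contracts(2.5))  # 5: the second wave adds nothing
```

Keeping the second wave in the table with a size of 0 preserves the capability without affecting trading.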
With reference now to
As shown in
In a further embodiment of the present invention, additional description of a system and methodology for trading correlated pairs of investment instruments with buy/sell triggers based on indicators is provided herein. More particularly, the indicators in this embodiment are updated in 5-second increments, as illustrated in
With reference now to
The accumulated historical data 106 is backed up in a file 220. If the program is interrupted, it therefore does not have to spend another 2 days collecting data. When the program is started, the history from its last run is loaded automatically from this file, i.e., reference identifiers 102 and 104.
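The backup file could be handled as sketched below. The sketch assumes the history is a list of floats stored as JSON; the file format and the `save_history`/`load_history` names are hypothetical. It writes to a temporary file and then renames it, so an interruption mid-write cannot destroy the previous backup.

```python
import json
from pathlib import Path
from typing import List


def save_history(path: Path, ratios: List[float]) -> None:
    """Back up the ratio history: write a temp file, then rename it over the old backup."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(ratios))
    tmp.replace(path)  # rename is atomic on POSIX when both paths share a filesystem


def load_history(path: Path) -> List[float]:
    """Reload the last run's history, or start empty if no backup exists."""
    if not path.exists():
        return []
    return json.loads(path.read_text())
```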
In particular, the program collects the bid and ask data for each of the two correlated stocks, futures contracts, or other investment instruments as the data is generated, until a two-day history accumulates. At each interval, two quotients are added to the array, as described in the previous paragraphs:
- the bid of contract one divided by the offer of contract two; and
- the offer of contract one divided by the bid of contract two, added immediately after the first.

As discussed, data is preferably collected every five seconds (136) during NYMEX floor trading hours, from 10 am to 2:30 pm (204), until two days of data history have been collected. Manipulating the data before admitting it to the array accounts for transaction costs and varying liquidity. Even so, the enormous spreads observed in after-hours trading make that data much less useful.
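The sampling parameters above fix the size of a full history window. The session from 10:00 am to 2:30 pm lasts 4.5 hours, or 16,200 seconds. That gives 3,240 five-second intervals per day. Two ratio observations per interval over two days gives 12,960 values. The small sketch below computes this. It assumes sample times are in the exchange's local time and that a sample stamped exactly 2:30 pm falls outside the session; the text does not say either way. The helper names are hypothetical.

```python
from datetime import time

# Values from the text: 5-second sampling, 10:00 am-2:30 pm floor session, 2-day history.
SESSION_OPEN = time(10, 0)
SESSION_CLOSE = time(14, 30)
INTERVAL_SECONDS = 5
HISTORY_DAYS = 2
OBSERVATIONS_PER_INTERVAL = 2  # bid1/ask2 and ask1/bid2


def in_session(t: time) -> bool:
    """True if a sample taken at local exchange time t falls inside the floor session."""
    return SESSION_OPEN <= t < SESSION_CLOSE


def seconds_of_day(t: time) -> int:
    return t.hour * 3600 + t.minute * 60 + t.second


def window_capacity() -> int:
    """Number of ratio values held by a full history window."""
    session_seconds = seconds_of_day(SESSION_CLOSE) - seconds_of_day(SESSION_OPEN)  # 16,200
    intervals_per_day = session_seconds // INTERVAL_SECONDS                         # 3,240
    return intervals_per_day * HISTORY_DAYS * OBSERVATIONS_PER_INTERVAL             # 12,960


print(window_capacity())  # 12960
```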
This array holds the two-day data set from which the average and the standard deviation are calculated (214). Once more than two days of data have been collected, a replacement process begins. It deletes the oldest data and replaces it with the newest, pursuant to known computer science database techniques, e.g., employing pointers into an array to overwrite its oldest content. The replacement preferably occurs every five seconds, seriatim, until the entire two days have been cycled through and replaced with new data, and then the process begins again. When more than two days of data have accumulated, trading pursuant to the principles of the present invention is triggered. The depth of market (128) is the set of bid and ask price tiers. Each time it changes on either of the two contracts, the ratio of the two is analyzed by taking its Z-score.
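The pointer-based replacement can be sketched as a fixed-capacity circular buffer. This is a minimal illustration, not the original code. The class name `RatioWindow` is hypothetical. The mean and standard deviation are recomputed from the buffer on each call, and the population standard deviation is assumed because the text does not specify which is used.

```python
from statistics import fmean, pstdev
from typing import List


class RatioWindow:
    """Fixed-capacity history: once full, each new value overwrites the oldest one."""

    def __init__(self, capacity: int) -> None:
        self._buf: List[float] = []
        self._capacity = capacity
        self._next = 0  # index of the oldest value, overwritten next once full

    def add(self, value: float) -> None:
        if len(self._buf) < self._capacity:
            self._buf.append(value)
        else:
            self._buf[self._next] = value
            self._next = (self._next + 1) % self._capacity

    @property
    def full(self) -> bool:
        """Trading is enabled only once the window holds a complete history."""
        return len(self._buf) == self._capacity

    def zscore(self, ratio: float) -> float:
        """Deviation of ratio from the window mean, in window standard deviations."""
        return (ratio - fmean(self._buf)) / pstdev(self._buf)
```

In use, both ratios would be added at each five-second interval. Once `full` is true, `zscore` would be called on every depth-of-market change.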
With reference now to
With reference now to
If no current positions are taken 410 or the maximum positions have been reached 412, the aforementioned positive and negative Z-scores are determined 414. Additionally, if no current positions are taken 410, an exit opportunity should be sought 416. Once the positive and negative Z-scores are determined 414, the requisite position is taken 422 if either a negative Z-score signal 418 or a positive Z-score signal 420 occurs. An order pursuant to that position is then sent to the trade server 424.
More particularly, the Z-score is essentially a quantitative rating system for how far the current ratio has deviated from the recent mean. The Z-scores of the ratios of [Offer of contract 1/bid of contract 2] and [bid of contract 1/offer of contract 2] are used to take “Negative” and “Positive” positions, respectively, as shown in
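Written out, the two scores take the form below. Here μ and σ denote the mean and standard deviation of the two-day window of ratio observations; this notation is introduced for clarity and does not appear in the original text.

```latex
z^{-} = \frac{\mathrm{Ask}_1 / \mathrm{Bid}_2 - \mu}{\sigma}
\quad \text{(Negative position)},
\qquad
z^{+} = \frac{\mathrm{Bid}_1 / \mathrm{Ask}_2 - \mu}{\sigma}
\quad \text{(Positive position)}
```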
With reference now to
- BUY A contracts of Contract 1 @ the Ask price using a limit order; and
- SELL B contracts of Contract 2 @ the Bid price using a limit order.
With further reference to
- SELL A contracts of Contract 1 @ the Bid price using a limit order; and
- BUY B contracts of Contract 2 @ the Ask price using a limit order.
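The two order lists above map directly onto a small helper. The names `LimitOrder` and `entry_orders` are hypothetical, as is representing an order as a plain record. The sketch does not model any particular trade-server API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LimitOrder:
    side: str          # "BUY" or "SELL"
    contract: int      # 1 or 2
    quantity: int
    limit_price: float


def entry_orders(position: str, qty_1: int, qty_2: int,
                 bid_1: float, ask_1: float, bid_2: float, ask_2: float) -> List[LimitOrder]:
    """Limit orders that open a "negative" or "positive" position."""
    if position == "negative":  # buy contract 1 at the ask, sell contract 2 at the bid
        return [LimitOrder("BUY", 1, qty_1, ask_1), LimitOrder("SELL", 2, qty_2, bid_2)]
    if position == "positive":  # sell contract 1 at the bid, buy contract 2 at the ask
        return [LimitOrder("SELL", 1, qty_1, bid_1), LimitOrder("BUY", 2, qty_2, ask_2)]
    raise ValueError(f"unknown position type: {position!r}")
```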
A signal Z-score can be set at, for example, one or negative one. Suppose the Z-score of the ask of contract one divided by the bid of contract two falls below the negative signal. The program then buys one contract of contract one and sells one contract of contract two (506). Alternatively, suppose the Z-score of the bid of contract one divided by the ask of contract two exceeds the positive signal (508). The program then sells contract one and buys contract two (510). Purchases are made at the ask and sales at the bid. It should be understood that this entry trigger applies when the volumes held in contract one and contract two are both zero. It should also be understood, however, that the actual 1:1 contract ratio shown here varies with the markets being traded and the appropriate hedge, as discussed in further detail in the following paragraphs.
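Below is a minimal sketch of the entry test. It assumes the window holds the two-sided ratio observations and uses the example signal of one standard deviation. The function name `entry_signal` and the use of the population standard deviation are assumptions.

```python
from statistics import fmean, pstdev
from typing import Optional, Sequence

SIGNAL = 1.0  # entry threshold in standard deviations; "one" is the example in the text


def entry_signal(window: Sequence[float], bid_1: float, ask_1: float,
                 bid_2: float, ask_2: float, signal: float = SIGNAL) -> Optional[str]:
    """Return "negative", "positive", or None for the current top-of-book quotes."""
    mu, sigma = fmean(window), pstdev(window)
    z_negative = (ask_1 / bid_2 - mu) / sigma  # ratio paid to buy 1 and sell 2
    z_positive = (bid_1 / ask_2 - mu) / sigma  # ratio received to sell 1 and buy 2
    if z_negative < -signal:
        return "negative"  # buy contract 1 at the ask, sell contract 2 at the bid
    if z_positive > signal:
        return "positive"  # sell contract 1 at the bid, buy contract 2 at the ask
    return None
```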
Under the present invention, positions are built up by purchasing the smallest possible number of lots in each trade. Each order uses at most two contracts. The total portfolio therefore grows uniformly, ideally staying as close as possible to the optimal hedging ratio.
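Below is a sketch of one reading of this building rule, under stated assumptions. Each order adds at most one lot to each leg, so at most two contracts. `hedge_ratio` is a hypothetical target number of contract-2 lots per contract-1 lot. At every step the sketch picks the increment that keeps holdings closest to that target. The function names are also hypothetical.

```python
from typing import List, Tuple

# Candidate increments (contract 1, contract 2): at most one lot per leg, i.e.
# at most two contracts per order. Ordered so that ties favour adding to both legs.
STEPS: List[Tuple[int, int]] = [(1, 1), (1, 0), (0, 1)]


def next_increment(held_1: int, held_2: int, hedge_ratio: float) -> Tuple[int, int]:
    """Smallest order keeping contract-2 holdings closest to hedge_ratio x contract-1 holdings."""
    return min(STEPS, key=lambda s: abs((held_2 + s[1]) - hedge_ratio * (held_1 + s[0])))


def build_path(hedge_ratio: float, orders: int) -> List[Tuple[int, int]]:
    """Holdings after each successive order, starting from a flat position."""
    held_1 = held_2 = 0
    path = []
    for _ in range(orders):
        d1, d2 = next_increment(held_1, held_2, hedge_ratio)
        held_1, held_2 = held_1 + d1, held_2 + d2
        path.append((held_1, held_2))
    return path


print(build_path(1.5, 3))  # [(1, 1), (1, 2), (2, 3)]
```

With a target of 1.5 contract-2 lots per contract-1 lot, the holdings step through 1:1, 1:2, and then 2:3. Both legs grow together rather than one leg filling first.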
With reference now to
Regardless of the hedging ratio, for as long as the maximum positions have not been acquired, the program will increase its position in the market while the Z-score instructs it to do so (422). Every time an order is sent to the trade server 424 and filled 506/510, a confirmation is sent from the server back to the client running the code. The program maintains a record of the number of contracts it has and the average price paid for each contract, as described in connection with
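The record of contracts held and average price paid can be kept per leg and updated from each fill confirmation. This is a minimal sketch. The `LegRecord` name is hypothetical, and only fills that add to a position are handled.

```python
from dataclasses import dataclass


@dataclass
class LegRecord:
    """Contracts held in one leg and their average fill price."""
    contracts: int = 0
    avg_price: float = 0.0

    def on_fill(self, quantity: int, price: float) -> None:
        """Update the record from a fill confirmation that adds to the position."""
        if quantity <= 0:
            raise ValueError("this sketch only records fills that add to the position")
        total_cost = self.avg_price * self.contracts + price * quantity
        self.contracts += quantity
        self.avg_price = total_cost / self.contracts


leg = LegRecord()
for qty, px in [(1, 70.00), (1, 70.10), (2, 70.05)]:
    leg.on_fill(qty, px)
print(leg.contracts, round(leg.avg_price, 4))  # 4 70.05
```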
Once the orders for contract one and contract two are filled, the program immediately begins looking for an exit signal (416). With reference now to
The process is exactly reversed when the other position is held. If the entrance signal was positive (716), the aforementioned second test (718) is used to evaluate the Z-score (720), and the exit position is taken (714).
The second condition for exiting (710, 718) is an important feature for the success of the program. This returns to the difficulties of using short spans of historical data as the sample set, because the statistical calculations are a direct result of the data set. When the data set is only 2 days long, for example, a position held for half a day can see ¼ of the information that recommended taking it disappear and be replaced. In some volatile cases, Applicant has seen a position stay within a fraction of the sell signal for an entire day. Even so, the dynamic nature of the data set forced the program to keep holding the position and bear the loss for the entire day.
This second condition ensures that such positions will not be exited at a large loss simply because of the effect of the 2-day data set.
As an example of how the present invention is used in a dynamic market, Applicant applied the principles of the current invention to the volatile crude market during the summer months of 2007. During this time, Applicant made various observations.
First, NYMEX requires a certain cash margin up front for every futures contract purchased or sold. The exchange calculates this margin to ensure that the trader has enough money to cover a “one day maximum expected loss” on the contract. This limits leverage and ensures that contracts traded on the exchange can be completed. Because crude is strongly correlated with gasoline, the exchange gives a “margin credit” for going long one and short the other. For example, the crude margin is $6,412 and the gasoline margin is $6,075, but if one is bought and the other shorted, the total margin is roughly $3,800.
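Taking the margin figures above at face value, the credit works out to roughly 70% less margin than the two legs would require separately. The $3,800 spread margin is itself approximate.

```latex
\underbrace{\$6{,}412}_{\text{crude}} + \underbrace{\$6{,}075}_{\text{gasoline}} = \$12{,}487,
\qquad
1 - \frac{\$3{,}800}{\$12{,}487} \approx 1 - 0.304 = 0.696
```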
The exchange offers these margin “credits” for several different ratios of assorted contracts, such as 3:2:1 (crude:gasoline:heating oil) or 2:2:5 (gasoline:heating oil:crude). In designing the program, Applicant assumed that the exchange formulated these credits from how closely the contracts in each ratio are correlated and how well one hedges the other. Applicant found, however, that none of these credit ratios actually hedged accurately. In fact, they were so heavily biased toward the fluctuation of gasoline that, over a long-term back test, the risk was very high. Trades were profitable only when gasoline behaved as predicted.
Applicant therefore examined the physical crude-to-gasoline yield ratio, as published by the Department of Energy, and ran a back test with it. Over the past 10 years, the maximum loss ever experienced decreased by a factor of 8 to 9. The average daily volatility of the hedge decreased by approximately 47%. This is a substantial improvement.
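Comparing hedge ratios amounts to evaluating the daily P&L of a long/short portfolio under each candidate ratio over the same history. The text does not define how maximum loss or volatility were measured, so the sketch below makes assumptions. It uses the worst single-day P&L and the population standard deviation of daily P&L, with each leg's value already converted to dollars per contract. The name `hedge_stats` is hypothetical.

```python
from statistics import pstdev
from typing import Sequence, Tuple


def hedge_stats(values_1: Sequence[float], values_2: Sequence[float],
                lots_1: float, lots_2: float) -> Tuple[float, float]:
    """Daily P&L volatility and worst single-day P&L of long lots_1 / short lots_2.

    values_1 and values_2 are daily values per contract of each leg, in dollars.
    """
    spread = [lots_1 * v1 - lots_2 * v2 for v1, v2 in zip(values_1, values_2)]
    daily_pnl = [today - yesterday for yesterday, today in zip(spread, spread[1:])]
    return pstdev(daily_pnl), min(daily_pnl)
```

Running it once with an exchange credit ratio and once with a yield-based ratio over the same daily history gives a side-by-side comparison of the kind described above. The sketch itself implies no particular result.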
Most people on the trading floor, and traders in general, have classically used the 3:2:1, 2:2:5 and 1:1 spreads because they can calculate them in their heads quickly. Traders also tend to adjust the ratio based on how it moves throughout the day. Applicant has also learned that some traders have their own methods of determining hedging ratios, specifically traders hedging refineries' stocks and future production. Among speculative traders, however, Applicant is not aware of anyone using the Department of Energy figures relied on herein. These figures are important to the program's decreased risk and overall profitability and success. Speculators probably do not use them because it is difficult to calculate a changing ratio by hand as fast as the market moves.
Applicant notes that improvements in risk assessment can help determine exactly how much to leverage based on the size of the account and the current/historical market activity.
The principles of the present invention are also applicable in various other trading contexts, e.g., options trading, as is well understood by those skilled in the art.
Claims
1. A trading system comprising:
- a memory containing data on a plurality of investment items,
- wherein each investment item has associated therewith bid data and ask data over a given period, and a number of ratio indicators on respective bid/ask spreads between respective pairs of said investment items; and
- a processor, connected to said memory, calculating from first and second ratio indicators a valuation of a given pair of investment items in relation to each other,
- wherein said first ratio indicator comprises the division of a bid of a first investment item by an ask of a second investment item, and
- wherein said second ratio indicator comprises the division of an ask of said first investment item by a bid of said second investment item,
- whereby said first and second ratio indicators provide a valuation between at least said first and second investment items, and improved arbitrage and other investment opportunities.
2. The trading system according to claim 1, wherein at least one of said first and second ratio indicators is precomputed and retrieved from said memory.
3. The trading system according to claim 1, wherein at least one of said first and second ratio indicators is not precomputed and is calculated from said bid and ask data in said memory.
4. The trading system according to claim 1, wherein the given pair of investment items are selected from the group consisting of commodities, contracts, futures, equities, options and combinations thereof.
5. The trading system according to claim 1, wherein said valuation is performed upon occurrence of a market depth change.
6. The trading system according to claim 1, wherein, upon occurrence of a valuation disparity from a normal position, said processor initiates a buy or sell order on at least one of said first and second investment items.
7. The trading system according to claim 6, wherein said processor monitors said buy or sell orders.
8. The trading system according to claim 1, wherein said processor automatically initiates an order upon detection of a trigger condition.
9. A method for trading investment items comprising:
- retrieving data, from a memory, on a plurality of investment items stored therein,
- wherein each investment item has associated therewith bid data and ask data over a given period, and a number of ratio indicators on respective bid/ask spreads between respective pairs of said investment items;
- calculating, by a processor, a first ratio indicator comprising the division of a bid of a first investment item by an ask of a second investment item;
- calculating, by a processor, a second ratio indicator comprising the division of an ask of said first investment item by a bid of said second investment item; and
- calculating, by a processor, a valuation based on said first and second ratio indicators.
10. The method according to claim 9, wherein at least one of said first and second ratio indicators is precomputed and retrieved from said memory.
11. The method according to claim 9, wherein at least one of said first and second ratio indicators is not precomputed and is calculated from said bid and ask data in said memory.
12. The method according to claim 9, wherein the investment items are selected from the group consisting of commodities, contracts, futures, equities, options and combinations thereof.
13. The method according to claim 9, wherein said step of calculating is performed upon occurrence of a market depth change.
14. The method according to claim 9, wherein, upon occurrence of a valuation disparity from a normal position, said processor initiates a buy or sell order on at least one of said first and second investment items.
15. The method according to claim 14, wherein said processor monitors said buy or sell orders.
16. The method according to claim 15, wherein said processor automatically initiates an order upon detection of a trigger condition.
17. A trading system comprising:
- a database, said database containing data related to a plurality of investment items,
- each investment item having associated therewith bid data and ask data over a given period, and a number of ratio indicators on respective bid/ask spreads between respective pairs of said investment items, and
- a processor connected to said database, said processor calculating a first ratio indicator by dividing a bid of a first investment item by an ask of a second investment item, and calculating a second ratio indicator by dividing an ask of said first investment item by a bid of said second investment item,
- whereby said first and second ratio indicators for said first and second investment items provide a valuation, improved arbitrage and other investment opportunities.
18. The trading system according to claim 17, wherein each of said investment items is selected from the group consisting of commodities, contracts, futures, equities, options and combinations thereof.
19. The trading system according to claim 17, wherein said valuation is performed upon occurrence of a market depth change.
20. The trading system according to claim 17, wherein said processor monitors the data and ratio indicators in said database until a normal position occurs, and
- wherein said processor automatically initiates an order upon detection of a trigger condition.
Type: Application
Filed: May 5, 2014
Publication Date: Aug 28, 2014
Inventor: Remington John Sutton (Lubbock, TX)
Application Number: 14/269,227
International Classification: G06Q 40/04 (20120101);