My database patent finally has been granted after a long and expensive ordeal. While this is better than not having it granted after a long and expensive ordeal, it still was a truly pathetic reflection on the state of the American patent system. My perception (and, from what I can gather, that of most sensible individuals) is that the American intellectual property system as a whole is broken beyond repair and is one of the primary impediments to real innovation in this country. The system only serves large corporations with deep pockets and large teams of lawyers — and they mostly use it to troll smaller companies or build defensive portfolios to deter competitor lawsuits.
But enough about the sewage dump known as the US Patent system; my thoughts on it are unambiguously expressed here. Instead, this is a happy post about happy things. And what makes people happier than a stream database?
My patent is for a type of stream database that can be used to efficiently manage and scan messages, business transactions, stock data, news, or myriad other types of events. It is the database I wish I had when I was doing high-frequency statistical arbitrage on Wall Street, and which I subsequently developed and refined.
I’ll post a more detailed discussion of it shortly, but here is the basic gist. The idea is based on a sort of indexed multiply-linked-list structure. Ok, maybe that’s too basic a gist, so I’ll elaborate a little.
To use a common example from stock trading, we may wish to query something like the last quote before a trade in a stock. As an individual query, this is easy enough to accomplish in any type of database. However, doing it efficiently and in high volume becomes more challenging. Standard relational and object databases quickly prove unsuitable. Even stream databases prove inadequate. They either require scanning lots of irrelevant events to arrive at the desired one, waste lots of space through sparse storage, or are constrained to data at fixed intervals. But real data doesn’t work that way. Some stocks have lots of trades and few quotes; others have lots of quotes and few trades. Events happen sporadically and often in clusters.
My approach is to employ a type of multiply-linked list. Each entry has a timestamp, a set of linkages, and a payload. In the stock example, an event would link to the previous and next events overall (linkage 1), the previous and next events in the same stock (linkage 2), and the previous and next events of the same type and stock (linkage 3; e.g., a quote in IBM or a trade in Microsoft). To speed the initial query, an index points to events at periodic intervals in each stock.
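As a rough illustration only (this is not the patented implementation), the three linkages could be sketched in Python like this. The names `Event`, `EventStream`, `append`, and `walk` are my own placeholders, and the sketch assumes events arrive in time order and leaves out the periodic index entirely.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, Optional

LEVELS = (1, 2, 3)  # 1 = all events, 2 = same stock, 3 = same stock and type


def _no_links() -> Dict[int, Optional["Event"]]:
    return {level: None for level in LEVELS}


@dataclass(eq=False)
class Event:
    ts: float            # timestamp
    symbol: str          # e.g. "IBM"
    kind: str            # e.g. "quote" or "trade"
    payload: Any = None
    # One prev/next pointer per linkage level (hypothetical layout).
    prev: Dict[int, Optional["Event"]] = field(default_factory=_no_links, repr=False)
    next: Dict[int, Optional["Event"]] = field(default_factory=_no_links, repr=False)


class EventStream:
    def __init__(self) -> None:
        # Last event seen on each chain, keyed by (level, chain key).
        self._tails: Dict[tuple, Event] = {}

    def append(self, ts: float, symbol: str, kind: str, payload: Any = None) -> Event:
        """Add an event (assumed to arrive in time order) and wire up all three linkages."""
        ev = Event(ts, symbol, kind, payload)
        chains = {1: (), 2: (symbol,), 3: (symbol, kind)}
        for level, key in chains.items():
            tail = self._tails.get((level, key))
            if tail is not None:
                tail.next[level] = ev
                ev.prev[level] = tail
            self._tails[(level, key)] = ev
        return ev


def walk(start: Optional[Event], level: int, forward: bool = True) -> Iterator[Event]:
    """Follow one linkage from `start`, touching only events on that chain."""
    ev = start
    while ev is not None:
        yield ev
        ev = ev.next[level] if forward else ev.prev[level]
```

Appending is constant-time per event because each chain only needs its current tail, and walking a chain never visits events that belong to other stocks or types.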
For example, to find the last quote before the first trade in IBM after 3:15:08 PM on a given day, we would use the index hash to locate (in logarithmic time) the latest trade prior to 3:15 PM in IBM. Then we would scan trades-in-IBM forward (linkage 3) until we reach the first trade after 3:15:08. Finally, we would scan IBM backward (linkage 2) from that trade until we encounter a quote.
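Here is a minimal Python sketch of that query, under some loud assumptions: timestamps are seconds since midnight, the index samples every `stride`-th event on each stock-and-type chain (the real index may be organized quite differently), and all names (`build`, `last_quote_before_first_trade_after`, `stride`) are hypothetical. Only the linkages the query needs are stored.

```python
from bisect import bisect_right


class Event:
    __slots__ = ("ts", "symbol", "kind", "prev2", "next3")

    def __init__(self, ts, symbol, kind):
        self.ts, self.symbol, self.kind = ts, symbol, kind
        self.prev2 = None  # linkage 2: previous event in the same stock
        self.next3 = None  # linkage 3: next event of the same stock and type


def build(rows, stride=2):
    """Link time-ordered (ts, symbol, kind) rows and build a sampled index.

    Returns {(symbol, kind): (timestamps, events)}, holding every
    `stride`-th event of each chain (an assumed indexing policy).
    """
    tail2, tail3, counts, index = {}, {}, {}, {}
    for ts, symbol, kind in rows:
        ev = Event(ts, symbol, kind)
        key = (symbol, kind)
        ev.prev2 = tail2.get(symbol)
        if key in tail3:
            tail3[key].next3 = ev
        tail2[symbol], tail3[key] = ev, ev
        n = counts.get(key, 0)
        if n % stride == 0:  # the first event of each chain is always indexed
            times, sampled = index.setdefault(key, ([], []))
            times.append(ts)
            sampled.append(ev)
        counts[key] = n + 1
    return index


def last_quote_before_first_trade_after(index, symbol, t):
    """Return (trade, quote) for the first trade strictly after t, or None."""
    times, sampled = index.get((symbol, "trade"), ([], []))
    if not sampled:
        return None
    # Logarithmic-time jump to the last indexed trade at or before t.
    i = bisect_right(times, t) - 1
    ev = sampled[max(i, 0)]
    # Scan forward along linkage 3 to the first trade after t.
    while ev is not None and ev.ts <= t:
        ev = ev.next3
    if ev is None:
        return None
    trade = ev
    # Scan backward along linkage 2 until a quote turns up.
    ev = trade.prev2
    while ev is not None and ev.kind != "quote":
        ev = ev.prev2
    return trade, ev
```

The forward scan touches at most `stride` trades and the backward scan touches only IBM events, so unrelated stocks never enter the picture no matter how busy they are.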
We also could simulate a trading strategy by playing back historical data in IBM (linkage 2) or all stock data (linkage 1) over some period. This could be done across stocks, for individual stocks, or based on other specific criteria. If there are types of events (e.g., limit books) which we do not need for a specific application, they cost us no time, since we simply follow the linkages for the types we care about and never touch the rest.
This description is vastly oversimplified, and there are other components to the patent as well (such as a flow manager). But more on these another time.
If you’re curious, the patent is US 11593357 B2, titled “Databases And Methods Of Storing, Retrieving, And Processing Data” (originally submitted in 2014). Since the new US Patent Search site is completely unusable (and they don’t provide permalinks), I’ve attached the relevant document here.