All posts by K.M. Halpern

I Finally Got My Patent

Official Patent

My database patent has finally been granted after a long and expensive ordeal. While this is better than not having it granted after a long and expensive ordeal, the process was still a truly pathetic reflection on the state of the American patent system. My perception (and, from what I can gather, that of most sensible individuals) is that the American intellectual property system as a whole is broken beyond repair and is one of the primary impediments to real innovation in this country. The system serves only large corporations with deep pockets and large teams of lawyers — and they mostly use it to troll smaller companies or build defensive portfolios to deter competitor lawsuits.

But enough about the sewage dump known as the US Patent system; my thoughts on it are unambiguously expressed here. Instead, this is a happy post about happy things. And what makes people happier than a stream database?

My patent is for a type of stream database that can be used to efficiently manage and scan messages, business transactions, stock data, news, or myriad other types of events. It is the database I wish I had when I was doing high-frequency statistical arbitrage on Wall Street, and which I subsequently developed and refined.

I’ll post a more detailed discussion of it shortly, but here is the basic gist. The idea is based on a sort of indexed multiply-linked-list structure. Ok, maybe that’s too basic a gist, so I’ll elaborate a little.

To use a common example from stock trading, we may wish to query something like the last quote before a trade in a stock. As an individual query, this is easy enough to accomplish in any type of database. However, doing it efficiently and in high volume becomes more challenging. Standard relational and object databases quickly prove unsuitable. Even stream databases prove inadequate: they either require scanning many irrelevant events to reach the desired one, or they waste space through sparse storage and/or constrain data to fixed intervals. But real data doesn’t work that way. Some stocks have lots of trades and few quotes, others have lots of quotes and few trades. Events happen sporadically and often in clusters.

My approach is to employ a type of multiply-linked list. Each entry has a time stamp, a set of linkages, and a payload. In the stock example, an event would link to the previous and next events overall, the previous and next events in the same stock, and the previous and next events of the same type and stock (e.g., a quote in IBM or a trade in Microsoft). To speed the initial query, an index points to events at periodic intervals in each stock.

For example, to find the last quote before the first trade in IBM after 3:15:08 PM on a given day, we would use the index to locate (in logarithmic time) the latest indexed event in IBM prior to that time. Then we would scan trades-in-IBM forward (linkage 3) until we reach the first trade after 3:15:08. Finally, we would scan IBM backward (linkage 2) from that trade until we encounter a quote.
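In rough Python, the structure and the three-step walk above might look something like this. To be clear, this is my own illustrative sketch, not code from the patent; all names, and the choice of a 60-second indexing interval, are invented for the example.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Optional

@dataclass
class Event:
    ts: float    # timestamp
    symbol: str  # e.g. "IBM"
    etype: str   # e.g. "trade" or "quote"
    prev_all: Optional["Event"] = None  # linkage 1: all events
    next_all: Optional["Event"] = None
    prev_sym: Optional["Event"] = None  # linkage 2: same stock
    next_sym: Optional["Event"] = None
    prev_st: Optional["Event"] = None   # linkage 3: same stock and type
    next_st: Optional["Event"] = None

class StreamDB:
    def __init__(self, interval=60.0):
        self.interval = interval
        self.tail_all = None
        self.tail_sym = {}  # symbol -> last event in that stock
        self.tail_st = {}   # (symbol, etype) -> last event of that type
        self.idx_ts = {}    # symbol -> sorted timestamps of indexed events
        self.idx_ev = {}    # symbol -> the indexed events themselves

    def append(self, ev: Event):
        # Events arrive in time order; splice each into all three chains.
        ev.prev_all, self.tail_all = self.tail_all, ev
        if ev.prev_all: ev.prev_all.next_all = ev
        ev.prev_sym = self.tail_sym.get(ev.symbol)
        if ev.prev_sym: ev.prev_sym.next_sym = ev
        self.tail_sym[ev.symbol] = ev
        key = (ev.symbol, ev.etype)
        ev.prev_st = self.tail_st.get(key)
        if ev.prev_st: ev.prev_st.next_st = ev
        self.tail_st[key] = ev
        # Index roughly one event per stock per interval.
        ts_list = self.idx_ts.setdefault(ev.symbol, [])
        ev_list = self.idx_ev.setdefault(ev.symbol, [])
        if not ts_list or ev.ts - ts_list[-1] >= self.interval:
            ts_list.append(ev.ts)
            ev_list.append(ev)

    def last_quote_before_first_trade_after(self, symbol, ts):
        # Step 1: binary-search the index for an entry point at or before ts.
        ts_list, ev_list = self.idx_ts[symbol], self.idx_ev[symbol]
        ev = ev_list[max(bisect_right(ts_list, ts) - 1, 0)]
        # Hop onto the trades-in-this-stock chain.
        while ev is not None and ev.etype != "trade":
            ev = ev.next_sym
        # Step 2: follow linkage 3 to the first trade strictly after ts.
        while ev is not None and ev.ts <= ts:
            ev = ev.next_st
        if ev is None:
            return None
        # Step 3: follow linkage 2 backward until we encounter a quote.
        ev = ev.prev_sym
        while ev is not None and ev.etype != "quote":
            ev = ev.prev_sym
        return ev

db = StreamDB()
for ts, sym, et in [(100, "IBM", "quote"), (110, "IBM", "trade"),
                    (120, "MSFT", "trade"), (130, "IBM", "quote"),
                    (140, "IBM", "trade")]:
    db.append(Event(ts, sym, et))

hit = db.last_quote_before_first_trade_after("IBM", 115)
print(hit.ts, hit.etype)  # 130 quote
```

Note how the MSFT trade in the middle costs the query nothing: the IBM linkages simply skip over it, which is the whole point of the structure.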

We also could simulate a trading strategy by playing back historical data in IBM (linkage 2) or all stock data (linkage 1) over some period. This could be done across stocks, for individual stocks, or based on other specific criteria. If there are types of events (e.g., limit books) which we do not need for a specific application, they cost us no time, since we simply scan the types we care about instead.

This description is vastly oversimplified, and there are other components to the patent as well (such as a flow manager). But more on these another time.

If you’re curious, the patent is US 11593357 B2, titled “Databases And Methods Of Storing, Retrieving, And Processing Data” (originally submitted in 2014). Since the new US Patent Search site is completely unusable (and they don’t provide permalinks), I’ve attached the relevant document here.


Some Holiday Cheer for Nascent Writers

Readers of my blog know that I’m not given to chatty, optimistic posts. In fact, my typical post is more along the lines of “Not only will you fail at writing, but your cat will run away, your house will burn down, and you’ll spend the rest of your life tweeting from a phone you forgot to take out of airplane mode.” This post is different. While it isn’t quite optimistic, it does offer a perspective you may find uplifting, perhaps even liberating.

I’ve participated in many writing groups over the years and have managed a few as well, including one which technically qualified as Boston’s largest at the time. I have many writer acquaintances and even a few writer friends. I’m not spouting this to toot my own horn, but to lend credence to what I am about to say.

Many writers seem to have a notion of success which I deem unhealthy. I’m not saying that we should redefine "success" so that everybody is a winner or any such happy horseshit. The problem is that writers have two competing, and largely incompatible, goals. I speak here of real writers, not people who simply produce a product. The difference, to my mind, is that a writer wants to be proud of their work. My own standard is that I write what I want to read, and I think many of us implicitly or explicitly have a similar benchmark. We may try to cater to the taste of the crowd or steer toward marketability, but catering is different than pandering and steering is different than veering. At the end of the day, the stories and books which we produce must satisfy us first and foremost. This does not mean we always succeed in meeting that standard, but it is what we strive to aesthetically achieve.

However, American culture imbues us with another standard of success — one that is financial and social. Though success in this regard can be achieved via various avenues, the essential value it embodies — and that which our society most greatly respects — is the ability to sell things. In practice, this often takes the form of selling people stuff they don’t really want — but it need not. We are taught that the "net worth" of an individual is the sum of their possessions, marked to market, and distilled to a number. We are taught that we can order people by importance from lowest to greatest based on that number. We are taught that if one author sells many books and another sells few, the first is much "better" than the second. And we are taught that if a big publisher picks up a book, it is a "better" book than any which are self-published. While many of us may vociferously reject such a simplistic and materialistic outlook, we nonetheless are thralls to it. We may know that fatty foods are bad for us and that consumerism destroys the environment, yet there we all are in front of brand spanking new 100-inch televisions with bags of Fritos in our hands. Knowing and feeling are two different things, as are knowing and doing. We know we shouldn’t adopt the typical American view of success but we do anyway. Understanding and accepting that we are susceptible to such internal contradictions is crucial to avoiding the misery they otherwise can engender.

For a writer, the ability to sell our writing is essential for American-style success. These days, it also entails selling our “own story.” I personally find this obsession with the author rather than their work vapid at best and venal at worst, but it’s a fact of the market. The demographics of who reads and how are vastly different than a few years ago, as are the nature of publishers and what they seek. You sell yourself first, then your writing. This compounds an already difficult problem for most of us. Good writers are good writers, not necessarily good salesmen. Those who spend their time selling things and have the aptitude to do so rarely also have the time or ability to write a quality book — and those who spend their time writing and have the aptitude to produce a quality book rarely also have the time or ability to sell it.

This is a practical reality that affects almost any creative or scientific field. Those who can do can’t sell and those who can sell can’t do. But there is a deeper issue as well: a conflict of what we actually deem important with what we imagine we should deem important. What we want as authors and what we have been trained to believe we should want as Americans are largely incompatible. If we achieve only the first, we see ourselves as failures. If we achieve only the second, we see ourselves as hacks. And it is well-nigh impossible to achieve both.

It is not difficult to see why. If you’re like me and have tastes that depart even in the slightest from the mainstream, then ask yourself how many books that you really love are being published by major publishers today. Not books you’re told you’ll love, or books that you’re supposed to love, or books other people tell you they love. But books you love. For me, it’s virtually zero. The type of writing I enjoy simply isn’t published anymore. At least not by big publishers, and probably not by small presses either. It’s still being written. I’m writing it, and I’m sure plenty of other people are too. But it’s at best being self-published, and as a result is very hard to find.

The same is true of the big successes in self-publishing and is the reason neither you nor I ever will be one. The best sellers are in a small set of genres and usually involve the same perennial cast of series and authors. These authors are very good at gaming the system — i.e. at selling their books. However, they are not authors in the sense I described. They view a book as a product and nothing more. They run a business and are very good at it. For them, there is no contradiction in goals because their sole goal is financial success. There is nothing wrong with this, but it is not sufficient (or even attainable) for an author of the type I am addressing this to.

I’ll give you an example of what I mean. When I used to live in New York City, there was a famous camera store I frequented. In the same building, the next storefront was a diving shop. One day, I needed to buy some diving gear and went into that shop. I recognized some of the employees from next door, and it turned out that both stores were owned and run by the same people. The employees in either store knew everything about what they sold. If I asked an obscure question about a camera feature or model, someone knew the answer. But if I asked a subjective question, such as which camera they preferred or which BCD they found comfortable when diving, they were of little help. They could opine about which model customers preferred, and they could rave about one or the other product in a sales-pitch sort of way, but they clearly had no personal experience with the products.

I wondered at this and asked a friend who moved in similar circles about it. He explained that the product didn’t matter. It was all about understanding the market and sourcing the products at low cost. The store employees were generic highly-skilled business people. They could go into any market, learn the jargon and product specs and market layout, source the products at a good price, and then advertise and hard-sell those products very effectively. To them, it didn’t matter what they were selling. The products were widgets. The owners of those stores probably had no especial love of photography or diving but recognized those as markets they could thrive in.

Almost all self-published authors who succeed financially are of the same ilk. The books are products, and they just as happily would produce wicker baskets if that was where the money lay. Such authors have no ambition to write a high-quality literary novel. If their market research says that novels about vampire billionaires who fall for midwestern housewives are the thing, they will pump out dozens of nearly-identical ones. In this regard, such writers are a bit like the big publishers. The main differences are that (1) these self-published writers produce their own products and (2) the big publishers seem to have lost their focus these days and now employ ideological criteria rather than purely market-related ones.

The result is obvious. If I write a book of which I’m proud, it won’t get traditionally published and it won’t sell much when self-published. At a more basic level, this is a problem which affects all "producers", including artists, scientists, and musicians. To succeed in the social/financial sense, you have to spend 100% of your time relentlessly promoting yourself (and even then, the likelihood of success is small), but to produce anything of substance you have to spend all your time developing your craft and then applying it. The product-writers I described are very efficient. They are experts at what they do. After all, even in the world of marketable-schlock there is lots of competition. The winners know how to game the search engines, get in early and stay at the top, spend marketing money efficiently, and expend the minimum time necessary to produce a salable product.

The gist is that the two goals of a real writer are utterly incompatible. Writing a book we are proud of and achieving social/financial success with it are mutually exclusive for most of us. Unless you really love writing crowd-pleasing schlock or happen to be one of the handful of random literary "success" stories, it is impossible.

"But Ken," I can hear you whining, "I thought you said this would be uplifting? That my cat would still love me and my house wouldn’t burn down and I’d remember to turn off airplane mode before tweeting. How the hell is this remotely optimistic? Do you secretly run a razor-blade and cyanide business on Amazon?" Well, yes and no. Since books don’t sell, I do need some side hustles. Please visit my Amazon page for a very special offer.

Ok, fine. Here’s the inspirational bit. It isn’t that we can’t achieve both goals — it’s that we don’t have to adopt both goals. You are in control of your goals, even if your social programming reeeeally wants you to think otherwise. If your goal is to be both successful in the American sense and proud of your work, you’re going to be bitter and miserable. It’s disheartening and you’ll give up as a writer or feel resentful toward the world. But that shouldn’t surprise you. If you demand the impossible, you’ll always be disappointed. If your goal was to be a fantastic high-school teacher and also become rich from it, you’d be miserable too. It’s very hard to succeed financially in any way, let alone one which appeals to you. If I wanted to be a professional basketball player, I’d be disheartened. I’m five-foot-eight. The fault wouldn’t be with the world, it would be with me for demanding the impossible. While it’s admirable to pick difficult but attainable goals, picking wildly implausible ones is a recipe for misery. If you set out to prove the world wrong, all you’ll do is prove yourself a fool. Not because the world is right, but because there’s no point in wasting your life trying to prove anything to eight billion people who won’t notice and couldn’t care less if they did.

I’m not spouting some hippy nonsense about eschewing material possessions. You need money to survive and live comfortably. Money can buy you independence and free time. I’m not saying you don’t need money or shouldn’t pursue it. Just don’t rely on your writing for it, at least not if you want to be proud of that writing. It is perfectly fine to aim for American-style success. It’s difficult, but anything worth striving for is difficult. Nor is it unattainable, assuming your ambition isn’t too extravagant. If that’s your primary aim, do what the successful schlock-producers do and maybe you’ll succeed.

I’m also not saying you should sit in a corner munching a soggy carrot like some dejected rabbit. It is perfectly fine to write books you are proud of and hope for American-style success. I hope that my lottery ticket will win a billion dollars. There’s nothing wrong with that. It may even happen. Hope can be beneficial.

What is not fine is to expect American-style success from your writing. That is toxic. It means you’ll never reward yourself. Even if you write the greatest novel in the world, you still won’t allow yourself a sense of accomplishment. Imagine a small-town artisan who crafts beautiful furniture but demands that each piece be featured on some television show. He’ll be perpetually disappointed. No matter how great his skill and attainment, he never can give himself the slightest praise. There’s always a monkey on his shoulder telling him "So what? You’re not on television." If you’d laugh at such a person, take a good, hard look in the mirror.

To illustrate our biases, here are some scenarios. Suppose you learned that a friend …

  1. Wrote a wonderful book, got rejected by 200 agents, self-published it, and sold 3 copies.
  2. Self-published a vampire-billionaire-loves-midwestern-housewife book and sold 50,000 copies.
  3. Wrote a vapid, self-indulgent novel with elements designed to appeal to certain political sensibilities, which has been picked up by a major publisher.
  4. Self-published a book of pictures of cats with cute little taglines, which went viral and sold 100,000 copies.
  5. Wrote a passable book, though nothing worthy of note, but knew some agents and got picked up by a major publisher.

Most of us automatically would be "impressed" by (2)-(5) but view (1) as a vanity project. That is ridiculous. It’s our subconscious American training at play. Think about it. (2) may be a worthy businessman but isn’t really a writer, (4) produced a little nothing and got lucky, (5) produced tofu but knew the right people, and (3) produced what best could be termed a "vanity project" which ticked the right boxes. Of the five, only (1) produced something actually worthy of praise.

Nor are these contrived scenarios by some bitter rejectee (aka yours truly). Anyone who has contact with the publishing world knows that these are highly-realistic scenarios and that they are way more common and apropos than most of us would care to admit. So why do we view (1) this way? It’s not just our American-success programming. It’s also because of another very common scenario:

  6. Wrote complete trash, self-published it, and sold 3 copies.

(6) is what gives a bad name to self-publishing and constitutes the vast vast vast majority of self-published work. It and (2) are the reason you won’t be able to be heard above the fray or find your niche audience or sell many books.

But that isn’t as awful as it sounds. For most of history, only a privileged few even knew how to write, fewer had the means and leisure to write a book, and fewer still ever got published. Even if you wrote an incredible manuscript, without the money or connections to publish it that manuscript would end up in someone’s fireplace. So what’s different now, you may ask? Isn’t the problem the same, and only the gatekeepers and criteria have changed?

Yes and no. Yes, if you go through the gatekeepers. No, because you don’t have to. You can write a book you are proud of and self-publish it. It will be up forever as print-on-demand (and/or an ebook). You don’t have to build buzz, have a grand launch, and pray you reach critical mass before the rest of your print run gets remaindered and you end up out-of-print forever. Instead, you can put your book out there and point people to it over the years as you see fit. You can market it later when you have time or some opportunity arises. A book you are proud of will be available for anyone to purchase. Your backlist never goes away. Yes, a lot of crap gets self-published today — whereas in yesteryear only a few rich people could self-publish. But that need not bother you. Bad company does not a knave make. You’re not counting on people discovering your book by wading through all that garbage. You’re just making it available. You are the discovery mechanism. When someone asks about your book, you can point them to it. If you so choose, you can spend some money to increase the chance people will buy it. You can do this when and how you want.

And if someone at a cocktail party looks down their nose at you when you mention that you are self-published, just ask them what they’ve done lately. I wouldn’t worry too much about this happening, though. Does anyone even have cocktail parties anymore?

Incidentally, through much of the last three centuries there were no traditional publishers. Everything was self-published. But there was much less of it. Now, everyone can self-publish and everyone does. But just because a lot of other books stink doesn’t mean yours does — or that it will be viewed that way by modern, intelligent people.

In conclusion:

  1. Stop thinking of self-publication as a stigmatizing last resort and a humiliating proof of failure. It is a tool and an opportunity. Moreover, in today’s world it is both a necessity and a reality for almost any author of substance.
  2. If you write a book you are proud of, allow yourself to be proud of it. Feel successful. Decouple this sense of success from guilt or shame or anxiety about it not selling.
  3. Write books you are proud of. Hope for American-style success if you wish, but do not expect it.
  4. Keep writing. Write what you want to read. Be pleased that you have accomplished something.

If you complete one story you are proud of, you have accomplished more in your life than 99% of people. If you complete one book you are proud of, you have accomplished more than 99.9%. If you spend your life writing books you are proud of and allow yourself to be proud of them, you will have accomplished something almost nobody does: you will have lived a life you are proud of.

Everything everyone does is for naught, "vanity and a striving after wind," to quote Ecclesiastes. Had children? Your genes will dilute out of their progeny after a few generations. Became famous? Nobody will remember you a few years from now, and if they do it will be a mere caricature. Made a lot of money? Your will is the last time you get any say in how it’s spent. The best you can do is live a life you are proud of. Once you’re gone, the universe ends. It is irrelevant how many people bought your book or whether it lives on or your name is remembered.

And on that uplifting note, I once again refer you to Ken’s razor-blade and cyanide shop on Amazon. Oh, and don’t forget to leave a great review when you’re done…

Kindle Scribe — An Interesting Device Crippled by Bad Software

I’m a big fan of Kindle ereaders. However, since the long-defunct Kindle DX there hasn’t been one with sufficient screen real estate to read scientific papers and pdfs. The Kindle Scribe boasts a (slightly) bigger screen than the Kindle DX and allows writing as well. Couple that with the insanely long battery life of a Kindle ereader and it sounds like a dream machine, right? Wrong. Unfortunately, Amazon decided to follow Apple and Garmin down the path of crippling great hardware. As a result, the writing function is all but worthless to sensible users.

There were warning signs about whom the Kindle Scribe was intended for. I tried to find the screen dimensions (or resolution), and the top few pages of search results showed only the device dimensions or the diagonal (10.2 in). If you’re curious, the screen measures around 6×8 in (6.1×8.2 in to be precise, but there’s probably a tiny border) and probably is 1800×2400 pixels (given their 300 dpi spec), a 3:4 (1:1.3333) aspect ratio. Such information is pretty basic and useful, but even Amazon’s spec page didn’t list it. It did, however, list the pen’s dimensions, the Wi-Fi security protocols it supports, the wattage of the power adapter, and the (sole) available color. I fail to see how any of those are more relevant than the screen dimensions.
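The resolution guess is easy to sanity-check from the dpi spec alone (this assumes my 6×8 in estimate of the active area is right):

```python
# 300 dpi over an assumed 6x8 inch active drawing area:
w_px, h_px = 6 * 300, 8 * 300
print(w_px, h_px)   # 1800 2400
print(h_px / w_px)  # 1.333..., i.e. a 3:4 aspect ratio
```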

Another warning sign is that finding any useful info about the Amazon Scribe is well-nigh impossible. Granted, it’s only been available for a few days. However, the many sites purporting to review it or to provide helpful tips contain nothing more than caffeinated regurgitation of marketing info. Even the “criticisms” read as thinly-veiled praise. You almost can hear the authors’ voices trembling, though whether from excitement or fear is unclear.

“Pro: Everything about everything. All is for the best in the best of all possible worlds, and Amazon IS that world.”

“Con: Amazon can’t get any more amazing than it currently is. But I’m sure it somehow will!!! Please don’t hurt my family …”

Frankly, I wouldn’t be surprised if they’re mostly shills. That seems to be standard practice in online marketing these days.

Marketing practices and information availability aside, the real problem is that there is no sensible way to extract notebooks from the Kindle Scribe. It is easy to write on the device, and the process is quite pleasant. It’s really a lot like a paper notebook. I’m not a user of existing writeable e-ink tablets or of Wacom devices, but I found the Kindle Scribe’s hardware perfectly suitable for my purposes. Sadly, hardware alone maketh not the product.

As far as I can tell, Kindle Scribe notebooks aren’t stored as PDFs or in any standard format. Rather, they appear to be stored as sequences of strokes in some sort of non-standard SQLite DB. How can an SQLite db be “non-standard”? When I try to examine the db schema in sqlite3, I get a “malformed” error. Most likely, Amazon relies on some SQLite3 plugin that encrypts the notebooks or otherwise obfuscates them.
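For anyone who wants to poke at their own files: every standard SQLite 3 database begins with the 16-byte magic header `SQLite format 3\0`, and a file whose header has been encrypted or rewritten fails to open in standard tooling even if the rest is SQLite-shaped. Here is a small simulation (the "notebook" db and the corruption are fabricated stand-ins, since I obviously can't distribute Amazon's files):

```python
import os
import sqlite3
import tempfile

SQLITE_MAGIC = b"SQLite format 3\x00"

def looks_like_standard_sqlite(path):
    """Check the 16-byte magic header of a standard SQLite 3 file."""
    with open(path, "rb") as f:
        return f.read(16) == SQLITE_MAGIC

# Build a stand-in "notebook" db holding stroke records.
path = os.path.join(tempfile.mkdtemp(), "notebook.db")
con = sqlite3.connect(path)
con.execute("CREATE TABLE strokes (ts REAL, x REAL, y REAL)")
con.commit()
con.close()
good = looks_like_standard_sqlite(path)
print(good)  # True

# Overwrite the header, roughly what an encrypting plugin effectively
# does, and the standard library can no longer open the file.
with open(path, "r+b") as f:
    f.write(b"\x00" * 16)
bad = looks_like_standard_sqlite(path)
failed = False
try:
    sqlite3.connect(path).execute("SELECT * FROM strokes")
except sqlite3.DatabaseError:
    failed = True
print(bad, failed)  # False True
```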

None of this is worthy of criticism in itself; in fact, the ability to undo and redo my writing stroke-by-stroke is quite nifty. However, Amazon offers no sensible way to export a PDF from a notebook. The only solution is to email it to yourself as a PDF using their Whispernet. From what I’ve read, Whispernet pretty much acts like malware by logging all sorts of activity. To my mind, the best policy is to keep it off all the time. Why not just fire it up briefly to export a notebook? It is quite possible that the Kindle logs activity locally and then sends the whole log to Amazon the moment Whispernet is activated. I have no evidence that this actually happens, but I wouldn’t be surprised.

However, there is a much bigger issue with Whispernet. Even if you trust Amazon’s intentions, there are a lot of other parties involved and a lot of potential data leaks. Anything you write is uploaded to Amazon’s servers, then converted to a pdf on their end, and finally emailed unencrypted to your email address. I.e., there are multiple insecure steps, not least of which is the receipt of that unencrypted email by your email service. Any security-minded individual would have an issue with this, and it precludes the use of notebooks for anything sensitive or by any professional required to adhere to security protocols.

One of the aforementioned hypercaffeinated blogs gushed over the ability of the author’s law firm to collaboratively sign NDAs using Kindle Scribes. My response is simple: “Are you out of your f-ing head?” I’ll mark down the name of that firm to make sure I never do business with it (though frankly, my experience is that most lawyers are terrible with technology and security — which is odd in a profession whose bread and butter is sensitive information). Why is this a bad idea? Let us count the ways. First, it would require that all participants have Amazon Scribes and Amazon accounts. I can count on one finger the number of people I know who have one — and probably ever will. Second, all the info would be passed unencrypted between the various parties and Amazon and whatever email servers are involved for each and every individual. Third, how is this innovative or necessary? Plenty of secure and simple collaborative signature mechanisms exist. Whatever you may think of Docusign, it’s a sight better than passing unencrypted PDFs around like this. I’d guess that iPads have had suitable functionality for some time (and with much better security) as well. And more than one person I know has those.

The inability to directly export PDFs may or may not be an oversight, but I suspect not. I tried a few obvious workarounds and was stymied at every turn. Sometimes it felt quite deliberate.

First, I tried extracting the relevant files. As mentioned, they appear to be in some proprietary variant of SQLite3. There is no obvious way to extract useful info except by reverse engineering them. Supposedly, it is possible to view notebooks offline using an Amazon reader — but I have had no luck with this. Besides, Amazon doesn’t really support Linux (even though they use it on all their servers AND the Amazon Scribe) — so such functionality probably isn’t available to me without firing up a VM.

Second, I created a PDF notebook of blank pages using pdftk and a free college-ruled template I found online. My plan was to annotate it as a pdf rather than use the official notebook feature. Presumably, PDF annotations provide the same interface. I sideloaded the PDF file via USB, but annotations were unavailable. Apparently, they only are available for PDFs loaded via the “send-to-kindle” email mechanism. I can’t think of any benign reason for such a limitation.
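A hedged aside on that step: with a single-page template, the pdftk invocation is something like `pdftk A=template.pdf cat A1 A1 A1 output notebook.pdf`, with the page repeated as many times as you want. For truly blank pages you don't even need pdftk; a minimal PDF can be hand-assembled in a few lines, since the format is mostly plain text plus a table of byte offsets. The page size and filename below are my own choices, and this is a sketch of the idea rather than exactly what I sideloaded:

```python
import os
import tempfile

def blank_pdf(n_pages):
    """Hand-assemble an n-page blank US-letter PDF as bytes."""
    kids = " ".join(f"{3 + i} 0 R" for i in range(n_pages))
    objs = ["<< /Type /Catalog /Pages 2 0 R >>",
            f"<< /Type /Pages /Kids [{kids}] /Count {n_pages} >>"]
    objs += ["<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >>"] * n_pages
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for i, body in enumerate(objs, start=1):
        offsets.append(len(out))
        out += f"{i} 0 obj\n{body}\nendobj\n".encode()
    # Cross-reference table: one fixed-width entry per object.
    xref_pos = len(out)
    out += f"xref\n0 {len(objs) + 1}\n0000000000 65535 f \n".encode()
    for off in offsets:
        out += f"{off:010d} 00000 n \n".encode()
    out += (f"trailer\n<< /Size {len(objs) + 1} /Root 1 0 R >>\n"
            f"startxref\n{xref_pos}\n%%EOF\n").encode()
    return bytes(out)

pdf = blank_pdf(40)
path = os.path.join(tempfile.mkdtemp(), "blank_notebook.pdf")
with open(path, "wb") as f:
    f.write(pdf)
print(pdf.count(b"/Type /Page "))  # 40
```

Either way the result is the same: a multi-page PDF to sideload and annotate, which is exactly where Amazon's restrictions bite.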

Third, I tracked down the abstruse details of how to “send-to-kindle” and sent the PDF to my Kindle Scribe using the “convert” command in the subject line (as recommended). The Kindle Scribe wouldn’t even open it. Apparently, Amazon’s “convert” machinery is incapable of even converting a blank notebook.

Fourth, I did the same but with a blank subject line. Presumably, this passes the PDF along unchanged. Now, the Kindle Scribe both opened the file and allowed annotations. Success!!! Or not. Like notebooks, annotations apparently are stored as individual strokes in some proprietary SQLite3 db. And apparently, there also is no mechanism to export them without going through the same email process as for a notebook. Unsurprising, but annoying. You’d figure Amazon at least would allow modified PDFs to be saved as … well … PDFs.

Put simply, Amazon appears to have made it impossible to securely or locally export notebooks or annotations. The writing feature therefore remains largely unusable by anyone with a sensible regard for privacy or security. The year is 2022, and the best solution is to photograph the notebook pages and transfer the photos to your computer. Yay technology! If only I had to send the photos to be developed and then scanned, my jubilation would be complete.

Given the house of horrors which is AWS, the ridiculosity surrounding Kindle Scribe Notebooks should come as no surprise. After all, building your own computer from vacuum tubes and then writing an OS for it is easier than setting up an AWS instance. Sadly, the institutional maladies which plague AWS (and Amazon author services, for anyone unfortunate enough to use those) seem to have spread to one of their few functional and innovative divisions.

“But hey — aren’t you being a bit harsh?” you may ask. “The thing just came out. Give them some time to fix the bugs.”

For anyone hoping that firmware updates will “solve” these problems, don’t count on it. Even for those issues with enormous user support, Amazon has been notoriously slow to provide obvious fixes, and this one is a niche problem that the typical “privacy? what’s privacy?” post-gen-x’er couldn’t care less about. Not to mention, Amazon just laid off much of their Kindle staff. If they followed the standard American corporate playbook, this included all the people who actually knew what they were doing.

Right now I’m debating whether to return the Amazon Scribe. Its size may be useful enough even without the writing features, but I doubt it. There’s also the remote possibility that Amazon will provide local export functionality. But I’m not holding my breath.


“The Tale of Rin” serialization is live!

Update (8/8/22): I’ve ditched Vella. The Tale of Rin now exclusively is available on Substack. I’ve amended the post below to reflect this.

My epic fantasy series “The Tale of Rin” now is being serialized on Substack. New episodes come out on Wednesdays and Sundays. Here’s the description:

Just because Rin is indestructible doesn’t mean she can’t be hurt. On her quest to remedy an ancient sin, a single act of casual cruelty sets off an avalanche of events which threaten to destroy everything. Rin must rein in her assistant, a man of fierce attachment and questionable conviction, while avoiding her devious ex-husband, who will stop at nothing to reclaim her. In the balance lies her heart and the fate of the world.

Of the anticipated 6 volumes in the series, the first 2.5 have been written (and the rest mapped out). The first is publication-ready and the second close to that state. The first volume, “Protege”, likely will serialize to around 70-80 “episodes”.

The first 10 episodes will be free on Substack. Paid subscriptions are $5 per month (approximately 8 episodes) or $30 per year. Frankly, I would have preferred to charge less, but $5 is the lowest Substack allows. Note that a paid subscription gives you access to previous episodes as well — so don’t hesitate to subscribe at any point!

This is my first serialized novel, so please make some allowance for a few hiccups early on. If you encounter formatting errors or other weirdness please let me know. Such things aren’t always evident to the author, and these platforms are really clunky and buggy to post to.

Disclaimer: This is a work of original fiction, and any resemblance to real people or other fiction is purely coincidental.

Sensitivity Warning: This work may not be appropriate for readers highly sensitive to violence (including the occasional implication of possible sexual violence). Reader discretion is advised. Based on beta reader feedback, I’ve toned down a couple of scenes to make it more palatable to a broad audience. If you’re ok with something like Game of Thrones you should be fine with this.

I hope you enjoy the world I create and have a thrilling journey through it. Whether or not you find Rin an amiable companion, I’m sure you will find her an interesting one.

The Most Important Problem in the World — Solved!

Toilet Seat Problem

Should the toilet seat be left up or down after use? I decided to undertake a rigorous investigation of this critical issue. Every woman I asked knew the answer: leave it down. Every married man also knew the answer: do what she says. The truly domesticated men did one better: leave the cover down too. I stood in awe of their sheer servility.

Sadly, nobody seemed to feel any need to investigate further. But, like Galileo, Einstein, Anthony Bourdain, and Stalin, I was not deterred by the disapproval of my peers. In fact, it invigorated me. Anything worth doing is worth doing wrong, I say.

Given a man and woman living together, the goal was simple: introduce an unwanted technical innovation that was sure to drive the marriage apart. Specifically, I sought a strategy which minimizes the average number of toilet seat flips per use of the bathroom, subject to the constraint that everyone sees the same average.
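
For anyone who wants to poke at the setup before reading the paper, here is an illustrative Python simulation of two naive policies. These are not the paper’s optimal strategy, and the usage probabilities are entirely hypothetical.

```python
import random

def simulate(policy, p_he=0.5, p_stand=0.5, uses=100_000, seed=0):
    """Average seat flips per bathroom use under a given policy.

    policy 'down': the seat always gets returned to the down position.
    policy 'lazy': the seat is left however the last use needed it.
    p_he and p_stand (who uses it, and how) are hypothetical numbers.
    """
    rng = random.Random(seed)
    seat_up = False
    flips = 0
    for _ in range(uses):
        # This use needs the seat up only for a man's stand-up visit.
        need_up = rng.random() < p_he and rng.random() < p_stand
        if seat_up != need_up:
            flips += 1              # flip to the position this use needs
        seat_up = need_up
        if policy == "down" and seat_up:
            flips += 1              # courtesy flip back down afterward
            seat_up = False
    return flips / uses

# The lazy policy needs fewer flips on average (about 0.375 vs 0.5 here).
assert simulate("lazy") < simulate("down")
```

The lazy policy wins on total flips but makes the two parties see different averages, which is exactly why the fairness constraint makes the problem interesting.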

As with any straightforward problem I choose to examine, the universe immediately rewired its laws to require a lot of math. Hence this important paper. Too important for Nature, Science, or The Journal of Advanced Computational Potty Talk. The only venue worthy of its publication is this very blog which, entirely by coincidence, I own.

Surprisingly, the result turns out to be neither trivial nor obvious. It seems the universe actually has something interesting to say about toilet seats and fairness, even if I do not.

Click on the link above or below to view the PDF paper.

Toilet Seat Problem

Two Countries

There once were two countries, A and B, and two kinds of people, purple people and green people. Each country had both purple people and green people.

In country A, the purple people were in charge. A small group of purple people were the gatekeepers of all things, the decision makers, the managers of life.

In country B, the green people were in charge. A small group of green people were the gatekeepers of all things, the decision makers, the managers of life.

The two countries shared a large border and a free one. By ancient treaty, no visas were required and no checkpoints marred the landscape. But almost nobody ever crossed the border. A nearly insurmountable range of peaks obstructed much of the length, and strong rapids made the remainder treacherous.

Though fundamentally different in nature and culture, the majority of the purple and green people did not mind one another. Many even cherished the differences, and friendly relations were by far the norm in both countries.

The two governments were exceptions.

The purple leaders of country A portrayed green people as primitive, dangerous, and unable to restrain their impulses, creatures to be feared and controlled. The green people sought to dominate and oppress them, they warned. Only through constant vigilance and zeal could such a dire threat be averted. Whether they believed these words or simply found them politically expedient is unclear.

The green leaders of country B portrayed purple people as arrogant, irrational, and immoral, individuals of loose character and dishonest nature. Such people sought to lead good folk astray and never should be allowed influence, never should be listened to, they warned. Only through constant vigilance and zeal could such a dire threat be averted. Whether they believed these words or simply found them politically expedient is unclear.

Most green and purple people in both countries meant well, or at least did not intend ill. But a few did as a few will do, and this was exacerbated by the rhetoric of each government.

Every time a purple person in country B was attacked, the leaders of country A pointed and exclaimed “See, we are right. We must protect purple people from the inexcusable barbarity of the green people.” But they held no power in country B and compensated with an excess of zeal in their own country. Small crimes were made big, a growing range of behavior was criminalized, penalties grew, initiatives to advance purple people in the face of obvious oppression were advanced, and the public was freshly informed of the omnipresent danger posed by green people.

Every time a green person in country A was persecuted, the leaders of country B pointed and exclaimed “See, we are right. We must protect green people from the hysterical lunacy of the purple people.” But they held no power in country A and compensated with an excess of zeal in their own country. Small crimes were made big, a growing range of behavior was criminalized, penalties grew, initiatives to suppress the influence of purple people in the face of their obvious irresponsibility were advanced, and the public was freshly informed of the omnipresent evil posed by purple people.

The green people in country A cringed whenever something happened in country B. The inevitable furor surely would land on their heads. An inquisition would follow, jobs would be lost, lives would be ruined, and the slightest misstep would destroy them.

The purple people in country B cringed whenever something happened in country A. The inevitable furor surely would land on their heads. Vilification would follow, new restrictions would be imposed, rights would be lost, lives would be ruined, and the hope of improvement would grow ever more distant.

The majority of purple people in country A were not particularly swayed by their government’s propaganda, but they did not repudiate it. Most did not understand the plight of their green fellow citizens. They dismissed green complaints as hyperbolic, arguing that their government meant well and any real impact on green people was minimal. Those who believed the truth dared not speak up, and the purple leaders grew ever more powerful. Soon the green people sat hands in laps, eyes down, afraid that the slightest gesture or word could be seen as a threat by those purples who made a business of seeing threats everywhere. A few green sycophants found some small degree of success, but even they were not safe.

The majority of green people in country B were not particularly swayed by their government’s propaganda, but they did not repudiate it. Most did not understand the plight of their purple fellow citizens. They dismissed purple complaints as hysterical, arguing that their government meant well and any real impact on purple people was minimal. Those who believed the truth dared not speak up, and the green leaders grew ever more powerful. Soon the purple people sat hands in laps, eyes down, afraid that the slightest gesture or word could be seen as a sin by those greens who made a business of seeing sins everywhere. A few purple collaborators found some small degree of success, but even they were not safe.

Through the vagaries of geopolitics, some families happened to span both countries. On the rare occasions when they spoke, neither side believed the other.

The purple people in country A did not believe the tales told by their relatives in country B. These were exaggerations spread by politicians, they declared. After all, they experienced no such thing. If anything, their lives were easier than before. A few, seeing the oppression of green people in their own country (but unwilling to speak up about it), even rebuked their relatives. If anything, green people were the oppressed, not the oppressors. It was one thing not to help them, but quite another to blame them.

The green people in country B did not believe the tales told by their relatives in country A. These were exaggerations spread by politicians, they declared. After all, they experienced no such thing. If anything, their lives were easier than before. A few, seeing the oppression of purple people in their own country (but unwilling to speak up about it), even rebuked their relatives. If anything, purple people were the oppressed, not the oppressors. It was one thing not to help them, but quite another to blame them.

In this way, country A raced toward a dystopia for its green citizens and country B raced toward a dystopia for its purple citizens, yet nobody else recognized this.

Each government was the other’s best friend, and both were the people’s worst enemy.

This is how half the population did not realize the sky was falling, while the other half saw it happening with their own eyes.

But I apologize. I misspoke. The border has no mountains or rapids. It is not physical or legal, but one of social milieu, profession, and education. Yet it is no less real for this lack of topography. Despite the apparent freedom to do so, most people lack the wherewithal to cross the border.

The two countries are our country, today.

This is where we are, this is where we are going, and this is why you will not be believed if you say so.

What happens when you iterate Bayesian Inference with the same data set?

I’ve recently been reviewing Bayesian networks with an eye to learning Stan. One question which occurred to me is the following. Suppose we are interested in the probability distribution P(\mu) over parameters \mu\in X (with state space X). We acquire some data D, and wish to use it to infer P(\mu). Note that D refers to the specific realized data, not the event space from which it is drawn.

Let’s assume that (1) we have a prior P(\mu), (2) the likelihood P(D|\mu) is easy to compute or sample, and (3) the normalization P(D)\equiv \sum_{\mu\in X} P(D|\mu)P(\mu) is not expensive to compute or adequately approximate.

The usual Bayesian approach involves updating the prior to a posterior via Bayes’ theorem: P(\mu|D)= \frac{P(D|\mu)P(\mu)}{P(D)}. However, there also is another view we may take. We need not restrict ourselves to a single Bayesian update. It is perfectly reasonable to ask whether multiple updates using the same D would yield a more useful result.

Such a tactic is not as ridiculous or unjustified as it first seems. In many cases, the Bayesian posterior is highly sensitive to a somewhat arbitrary choice of prior P(\mu). The latter frequently is dictated by practical considerations rather than arising naturally from the problem at hand. For example, we often use the likelihood function’s conjugate prior to ensure that the posterior will be of the same family. Even in this case, the posterior depends heavily on the precise choice of P(\mu).

Though we must be careful interpreting the results, there very well may be applications in which an iterated approach is preferable. For example, it is conceivable that multiple iterations could dilute the dependence on P(\mu), emphasizing the role of D instead. We can seek inspiration in the stationary distributions of Markov chains, where the choice of initial distribution becomes irrelevant. As a friend of mine likes to say before demolishing me at chess: let’s see where this takes us. Spoiler: infinite iteration “takes us” to maximum-likelihood selection.

An iterated approach does not violate any laws of probability. Bayes’ theorem is based on the defining property P(\mu,D)= P(D|\mu)P(\mu)= P(\mu|D)P(D). Our method is conceptually equivalent to performing successive experiments which happen to produce the same data D each time, reinforcing our certainty around it. Although its genesis is different, the calculation is the same. I.e., any inconsistency or inapplicability must arise through interpretation rather than calculation. The results of an iterated calculation may be inappropriate for certain purposes (such as estimating error bars, etc.), but could prove useful for others.

In fact, one could argue there only are two legitimate approaches when presented with a one-time data set D. We could apply it once or an infinite number of times. Anything else would amount to an arbitrary choice of the number of iterations.

It is easy to analyze the infinite iteration process. For simplicity, we’ll consider the case of a discrete, finite state space X. D is a fixed set of data values for our problem, so we are not concerned with the space or distribution from which it is drawn. P(D) is a derived normalization factor, nothing more.

Let’s introduce some notation:

– Let n\equiv |X|, and denote the elements of X by \mu_1\dots \mu_n.
– We could use n-vectors to denote probability or conditional probability distributions over X (with the i^{th} component the probability of \mu_i), but it turns out to be simpler to use diagonal n\times n matrices.
– P(\mu) is an n-vector, which we’ll write as a diagonal n\times n matrix v with v_{ii}\equiv P(\mu_i).
– We’ll denote by D^k the data set D repeated k times. I.e., the equivalent of having performed an experiment k times and obtained D each time.
– P(\mu|D) is an n-vector, which we’ll write as a diagonal n\times n matrix v' with v'_{ii}\equiv P(\mu_i|D).
– Where multiple updates are involved, we denote the final posterior P(\mu|D^k) via an n\times n diagonal matrix v^{(k)}, with v^{(k)}_{ii}\equiv P(\mu_i|D^k). Note that v'= v^{(1)} and v= v^{(0)}.
– P(D|\mu) is an n-vector of probabilities as well, but we’ll also treat it as a diagonal n\times n matrix M with M_{ii}\equiv P(D|\mu_i).
– P(D)=\sum_{i=1}^n P(D|\mu_i)P(\mu_i) is a scalar. In our notation, P(D)= \text{tr}~ M v.

A single Bayesian update takes the form v'= M v/(\text{tr}~ M v). What happens if we repeat this? A second iteration substitutes v' for v, and we get v^{(2)}= M v'/(\text{tr}~ M v'). This is homogeneous of degree 0 in v', so the (\text{tr}~ M v) normalization factor in v' disappears. We thus have v^{(2)}= M^2 v /(\text{tr}~ M^2 v). The same reasoning extends to v^{(k)}= M^k v/(\text{tr}~ M^k v).
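
Since everything here reduces to powers of diagonal matrices, the claim is easy to check numerically. Here is a small Python sketch with toy numbers of my own choosing; it confirms both that k successive updates match a single update using M^k, and that the posterior piles onto the maximum-likelihood entry.

```python
# The likelihoods P(D|mu_i) and prior P(mu_i) are diagonal matrices, so we
# can represent them as plain lists of their diagonals (made-up numbers).
like = [0.7, 0.2, 0.1]    # M_ii = P(D|mu_i)
prior = [0.2, 0.5, 0.3]   # v_ii = P(mu_i)

def update(like, post):
    """One Bayesian update: v' = M v / tr(M v)."""
    w = [l * p for l, p in zip(like, post)]
    total = sum(w)
    return [x / total for x in w]

k = 50
post = prior
for _ in range(k):
    post = update(like, post)

# Same thing in one shot: v^(k) = M^k v / tr(M^k v)
w = [(l ** k) * p for l, p in zip(like, prior)]
total = sum(w)
direct = [x / total for x in w]

assert all(abs(a - b) < 1e-9 for a, b in zip(post, direct))
assert post[0] > 0.999    # mass concentrates on the max-likelihood index
```

With the toy numbers above, fifty iterations leave essentially all posterior mass on \mu_1, exactly as the algebra predicts.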

It now is easy to see what is happening. Suppose n=2, and let M_{11}>M_{22}. Our expression for P(\mu_1|D^k) after k iterations is v^{(k)}_1= \frac{M^k_{11} v_{11}}{M^k_{11} v_{11} + M^k_{22} v_{22}}.

This has the form \frac{a^k x}{a^k x + b^k y}, which can be written 1/(1+\frac{b^k y}{a^k x}). We know that b<a, so as long as x\ne 0 we have \lim_{k\rightarrow\infty} \frac{b^k y}{a^k x}= 0. Specifically, for \epsilon>0 we have \frac{b^k y}{a^k x}<\epsilon for k>\frac{\ln\epsilon + \ln \frac{x}{y}}{\ln \frac{b}{a}}. Note that the denominator is negative since a>b and the numerator is negative for small enough \epsilon.

We therefore have shown that (in this simple case) \lim_{k\rightarrow\infty} v^{(k)}_1= 1: the posterior concentrates entirely on \mu_1. If we perform the same analysis for v^{(k)}_2, we get v^{(k)}_2= \frac{M^k_{22} v_{22}}{M^k_{11} v_{11} + M^k_{22} v_{22}}, which corresponds to 1/(1+\frac{a^k x}{b^k y}). The denominator diverges as k grows, and the limit is 0. We therefore see that \lim_{k\rightarrow\infty} v^{(k)}_2= 0.

This trivially extends to n>2. As k\rightarrow\infty, all but the dominant M_{ii} are exponentially suppressed. The net effect of infinite iteration is to pick out the maximum likelihood value. I.e., we select the \mu_i corresponding to the maximum M_{ii}, and all posterior probability is concentrated in it. Put another way, the limit of iterated posteriors is P(\mu_i|D^\infty)= 1 for i= \text{argmax}_j P(D|\mu_j) and 0 for all others.

What if the maximum M_{ii} is degenerate? Let’s again consider the simple n=2 case, but now with M_{11}= M_{22}>0. It is easy to see what happens in this case. a/b=1, so v^{(k)}_1= \frac{v_{11}}{v_{11}+v_{22}} and v^{(k)}_2= \frac{v_{22}}{v_{11}+v_{22}}. Note that v_{11}+v_{22}=1 here, but we stated the denominator explicitly to facilitate visualization of the extension to n>2.

This extension is straightforward. We pick out the maximum likelihood values \mu_i, and they are assigned their prior probabilities, renormalized. Suppose there are m\le n degenerate maximum M_{ii}’s, with indices i_1\dots i_m (each i_j\in 1\dots n). The limit of iterated posteriors is P(\mu_{i_j}|D^\infty)= \frac{P(\mu_{i_j})}{\sum_{l=1}^m P(\mu_{i_l})}. This reduces to our previous result when m=1.
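
The degenerate limit is easy to check numerically as well. A minimal sketch, again with made-up numbers:

```python
# Degenerate maxima: indices 0 and 2 tie for the top likelihood, so the
# limit keeps their priors (0.2 and 0.3), renormalized to 0.4 and 0.6.
like = [0.4, 0.1, 0.4]    # made-up likelihoods with a tied maximum
prior = [0.2, 0.5, 0.3]   # made-up prior

post = prior
for _ in range(200):
    w = [l * p for l, p in zip(like, post)]
    total = sum(w)
    post = [x / total for x in w]

assert abs(post[0] - 0.4) < 1e-9
assert abs(post[2] - 0.6) < 1e-9
assert post[1] < 1e-9     # the non-maximal index is suppressed
```

Note that the relative weights of the surviving entries never change from their prior ratio; iteration only strips away the non-maximal index.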

Note that we must ensure v_{ii}\ne 0 for the maximum likelihood \mu_i’s. I.e., we cannot have a 0 prior for any of the maximum likelihood values. If we wish to exclude \mu_i’s from consideration, we should do so before the calculation, thus eliminating the corresponding P(D|\mu_i)’s from contention for the maximum likelihood.

Expanding X to a countably infinite set poses no problem. In the continuous case, we must work with intervals (or measurable sets) rather than point values. For any \epsilon>0 and any set of nonzero measure containing all the maximum likelihood values, there will be some k that concentrates all but \epsilon of the posterior probability in that set.

Note that k depends on the choice of measurable set, and care must be taken when considering limits of such sets. For example, let p\equiv \max_{\mu} P(D|\mu) be the maximum likelihood probability. If we consider an interval I\equiv (p-\delta/2,p+\delta/2) as our maximum likelihood set, then the maximum likelihood “value” is the (measurable) set V\equiv P(D|\mu)^{-1}(I). For any \epsilon, we have a k as discussed above, such that P(\mu\notin V|D^j)<\epsilon for j>k. However, for a fixed \epsilon, that k will vary with \delta. Put another way, we cannot simply assume uniform convergence.

We can view infinite iteration as a modification of the prior. Specifically, it is tantamount to pruning the prior of all non-maximum-likelihood values and renormalizing it accordingly. The posterior then is equal to the prior under subsequent single-D steps (i.e. it is a fixed point distribution). Alternatively, we can view the whole operation as a single D^\infty update. In that case, we keep the original prior and view the posterior as the aforementioned pruned version of the prior.

There are two takeaways here:

1. The infinite iteration approach simply amounts to maximum-likelihood selection. It selects the maximum likelihood value(s) from the known P(D|\mu) and maintains their relative prior probabilities, suitably renormalized. Equivalently, it prunes all the non-maximum-likelihood values.
2. The resulting posterior still depends on the choice of prior unless the maximum likelihood value is unique, in which case that value has probability 1.

Unlike stationary distributions of Markov chains, the result is not guaranteed to be independent of our arbitrary initial choice — in this case, the prior P(\mu). Though true independence only is achieved when there is a unique maximum likelihood value, the dependence is reduced significantly even when there is not. The posterior depends only on those prior values corresponding to maximum likelihood \mu‘s. All others are irrelevant. The maximum likelihood values typically form a tiny subset of \mu‘s, thus eliminating most dependence on the prior. Note that such degeneracy (as well as the values themselves) is solely determined by the likelihood function.

Trash Talk

These days, every minor institutional faux pas draws a melodramatic, fawning apology utterly devoid of a modicum of self-respect and expressed through the metallic insincerity of boilerplate buzzwords. By now, one or more generations have grown up bombarded with such nonsense. We only can imagine what they must be like at home…

Hey Bob, listen, it’s no big deal, but could you take out the trash when it’s your turn? It’s really been piling up.

I’ve heard you loud and clear.


My top priority has been fostering a community which values inclusiveness, mutual respect, and constructive engagement. A place where all perspectives, values, diverse viewpoints, and lifestyles are cherished.

Um, ok. Sure.

I realize I’ve fallen far short of my high ideals in this regard, and promise to do better.

Great. So … you’ll take out the trash?

But it’s not enough to be sorry. I know there has been an inexcusable breach of trust, and that my actions have caused deep hurt and lasting anguish.

If you feel bad, you could, like, take out the trash.

I can do better. I will not be complacent in the face of such a challenge. This is an opportunity for reflection and learning, to grow into a better version of myself.

Really, it’s not that big a deal. You just take the trash and put it in the bin.

Change is necessary, and the first step toward such change is to understand the scope of the problem.

That’s easy. The schedule is on the fridge. Just, you know, do it.

Toward this end, I have identified several important steps.

There’s really just one.

First, I enrolled in a sixteen-week sensitivity training course, mandatory for me, myself, and I.

Is that the reason you didn’t do any other chores for the last sixteen weeks?

I also hired an outside firm to thoroughly investigate my past behaviors and recommend a path forward. You may have noticed them here and there recently.

You mean that guy who crashed on the couch and ate all my Doritos? I thought he was a friend of yours.

After a rigorous investigation, we have concluded that all policies and procedures were followed and there was no misconduct.

You’re not going to take out the trash, are you?

The repercussions of trashgate are ongoing, and I will not rest on my laurels. I can do better, and I will do better.

Can part of your “not resting” involve moving trash from the kitchen to the bin?

That I did not intend my actions to be offensive is no excuse for the anxiety and pain they have caused.

It doesn’t smell great, and can attract roaches.

Nor do those actions reflect who I am as a person.

Pretty sure they do.

However, in the face of the continuing public reaction, my involvement can only serve to distract from our community’s valuable mission.

I think I know where this is going.

In consultation with myself, I have concluded that the best way for us all to move forward is for me to step down from my trash removal responsibilities.

You know, you could have just refused up front.

Although my formal role has diminished, I will remain active in other aspects of our vibrant and innovative community.

In other words, you’ll continue to use the foosball table.

I only hope these steps can bring some small measure of closure to those who have suffered through my thoughtless actions.

The only closure we need is of the trash bin.

How to Produce a Beautiful Book from the Command Line

Book Production Framework and Examples on GitHub


Over the last couple of years, a number of people have asked me how I produce my books. Most self-published (excuse me, ‘indie-published’) books have an amateurish quality that is easy to spot, and the lack of attention to detail detracts from the reading experience. Skimping on cover art can be a culprit, but it rarely bears sole blame — or even the majority of it. Indie-published interiors often are sloppy, even in books with well-designed covers. For some reason, many authors give scant attention to the interior layout of their books. Of course, professional publishers know better. People judge books not just by their covers, but by their interiors as well. If the visual appeal of your book does not concern you, then read no further. Your audience most likely will not.

Producing a visually-pleasing book is not an insurmountable problem for the indie-publisher, nor a particularly difficult one.  It just requires a bit of attention.  Even subject to the constraints of print-on-demand publishing, it is quite possible to produce beautiful looking books.  Ebooks prove more challenging because one has less control over them (due to the need for reflowable text), but it is possible to do as well as the major publishers by using some of their tricks.  Moreover, all this can be accomplished from the command-line and without the use of proprietary software.

Now that I’ve finished my fifth book of fiction (and second novel), I figure it’s a good time to describe how I produce my books. I have automated almost the entire process of book and ebook production from the command-line. My process uses only free, open-source software that is well-established, well-documented, and well-maintained.

Though I use Linux, the same toolchain could be employed on a Mac or Windows box with a tiny bit of adaptation. To my knowledge, all the tools I use (or obvious counterparts) are available on both those platforms. In fact, MacOS is built on a flavor of unix, and the tools can be installed via Homebrew or other methods. Windows now has a unix subsystem which allows command-line access as well.

I have made available a full implementation of the system for both novels and collections of poetry, stories, or flash-fiction.   Though I discuss some general aspects below, most of the nitty gritty appears in the github project’s README file and in the in-code documentation.   The code is easily adaptable, and you should not feel constrained to the design choices I made.  The framework is intended as a proof of concept (though I use it regularly myself), and should serve as a point of departure for your own variant.  If you encounter any bugs or have any questions, I encourage you to get in touch.  I will do my best to address them in a timely fashion.


First, let’s see some examples of output (unfortunately, WordPress does not allow epub uploads, but you can generate epubs from the repository and view them in something like Sigil). The novel and collection PDFs are best viewed in dual-page mode since they have a notion of recto and verso pages.

Who would be interested in this?

If you’re interested in producing a fiction book from the command-line, it is fair to assume that (1) you’re an author or aspiring author and (2) you’re at least somewhat conversant with shell and some simple scripting. For scripting, I use Python 3, but Perl, Ruby, or any comparable language would work. Even shell scripting could be used.

At the time of this writing, I have produced a total of six books (five fiction books and one mathematical monograph) and have helped friends produce several more. All the physical versions were printed through Ingram, and the ebook versions were distributed on Amazon. Ingram is a major distributor as well, so the print versions also are sold through Amazon and Barnes & Noble, and can be ordered through other bookstores. In the past I used Smashwords to port and distribute the ebook through other platforms (Kobo, Barnes & Noble, etc.), but frankly there isn’t much point these days unless someone (e.g., BookBub) demands it. We’re thankfully past the point where most agents and editors demand Word docs (though a few still do), but producing one for the purpose of submission is possible with a little adaptation using pandoc and docx templates. However, most people accept PDFs these days.
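
The docx route mentioned above can be scripted too. Here is a minimal Python sketch that builds the pandoc command; the filenames are hypothetical, and --reference-doc is the pandoc option that carries a template’s styles into the output.

```python
import os
import shutil
import subprocess

def pandoc_docx_cmd(src_md, out_docx, reference_docx=None):
    """Build a pandoc command that converts markdown to .docx."""
    cmd = ["pandoc", src_md, "-o", out_docx]
    if reference_docx:
        # Styles (fonts, margins, paragraph styles) come from this template.
        cmd += ["--reference-doc", reference_docx]
    return cmd

cmd = pandoc_docx_cmd("novel.md", "novel.docx", "house-style.docx")
# Only run if pandoc is installed and the (hypothetical) source exists.
if shutil.which("pandoc") and os.path.exists("novel.md"):
    subprocess.run(cmd, check=True)
```

Generate the reference docx once (pandoc can emit a default one to customize), tweak its styles in a word processor, and every subsequent conversion inherits them.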

My books so far include two novels, three collections of poetry & flash-fiction, and a mathematical monograph.  I have three other fiction books in the immediate pipeline (another collection of flash-fiction, a short story collection, and a fantasy novel), and several others in various stages of writing.  I do not say this to toot my own horn, but to make clear that the method I describe is not speculative.  It is my active practice.

The main point of this post is to demonstrate that it  is quite possible to produce a beautiful literary book using command-line, open-source tools in a reproducible way.  The main point of the github project is to show you precisely how to do so.   In fact, not only can you produce a lovely book that way, but I would argue it is the best way to go about it! This is true whether your book is a novel or a collection of works.

One reason why such a demonstration is necessary is the dearth of online examples. There are plenty of coding and computer-science books produced from markdown via pandoc. There are plenty of gorgeous mathematics books produced using LaTeX.   But there are very few examples in the literary realm, despite the typesetting power of LaTeX, and the presence of the phenomenal Memoir LaTeX class for precisely this purpose.  This post is intended to fill that gap.

A couple of caveats.

Lest I oversell, here are a couple of caveats.

  • When I speak of an automated build process, I mean for the interiors of books. I hire artists to produce the covers. Though I have toyed with creating covers from the command-line in the past (and it is quite doable), there are reasons to prefer professional help. First, it allows artistic integration of other cover elements such as the title and author. Three of my books exhibit such integration, but I added those elements myself for the rest (mainly because I lacked the prescience to request them when I commissioned the art early on). I’ll let you guess which look better. The second big reason to use a professional artist comes down to appeal. The book cover is the first thing to grab a potential reader’s eye, and can make the sale. It also is a key determinant in whether your book looks amateurish or professional. I am no expert on cover design, and am far from skilled as an artist. A professional is much more likely to create an appealing cover. Of course, plenty of professionals do schlocky work, and I strongly advise putting in the effort and money to find a quality freelancer.  In my experience, it should cost anywhere from $300-800 in today’s dollars.  I’ve paid more and gotten less, and I’ve paid less and gotten more.   My best experiences were with artists who did not specialize in cover design.
  • The framework I provide on github is intended as a guide, not as pristine code for commercial use. I am not a master of any of the tools involved. I learned them to the extent necessary and no more. I make no representation that my code is elegant, and I wouldn’t be surprised if you could find better and simpler ways to accomplish the same things. This should encourage rather than discourage you from exploring my code. If I can do it, so can you. All you need is basic comfort with the command-line and some form of scripting. All the rest can be learned easily. I did not have to spend hundreds of hours learning Python, make, pandoc, and so on. I learned the basics, and googled whatever issues arose. It was quite feasible, and took a tiny fraction of the time involved in writing a novel.

The benefits of a command-line approach

If you’ve come this far, I expect that listing the benefits of a command-line approach is unnecessary. They are roughly the same as for any software project: stability, reproducibility, recovery, and easy maintenance. Source files are plain text, and we can bring to bear a huge suite of relevant tools.

A suggestion vis-a-vis code reuse

One suggestion: resist the urge to unify code. Centralizing scripts to avoid code duplication or creating a single “universal” script for all your books may be enticing propositions. I am sorely tempted to do so whenever I start a new project. My experience is that this wastes more time than it saves. Each project has unforeseeable idiosyncrasies which require adaptation, and changing centralized or universal scripts risks breaking backward compatibility with other projects. By having each book stand on its own, reproducibility is much easier, and we are free to customize the build process for a new book without fear of  unexpected consequences. It also is easier to encapsulate the complete project for timestamping and other purposes. It’s never pleasant to discover that a backup of your project is missing some dependency that you forgot to include.

A typical author produces new books or revises old ones infrequently. The ratio of time spent maintaining the publication machinery to writing and editing the book is relatively small. On average, it takes me around 500 hours to write and edit a 100,000 word novel, and around 100 hours for a 100 page collection of flash-fiction and poetry. Adapting the framework from my last book typically takes only a few hours, much of which is spent on adjustments to the cover art.

Even if porting the last book’s framework isn’t that time consuming, why trouble with it at all? Why not centralize common code? The problem is that this produces a dependency on code outside the project. If we change the relevant library or script, then we must worry about the reproducibility of all past books which depend on it. This is a headache.

Under other circumstances, my advice would be different. For example, a small press using this machinery to produce hundreds of books may benefit from code unification. The improved maintainability and time savings from code centralization would be significant. In that case, backward-compatibility issues would be dealt with in the same manner as for software: through regression tests. These could be strict (MD5 checksums) or soft (textual heuristics) depending on the toolchain and how precise the reproducibility must be. For example, non-visual changes such as an embedded date would alter the hash but not textual heuristics. The point is that this is doable, but would require stricter coding standards and carefully considered change-metrics.
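To illustrate the strict-versus-soft distinction, here is a minimal Python sketch. The function names and the normalization rule (dropping whitespace and digits) are my own choices for illustration, not part of any published toolchain:

```python
# A sketch of the two kinds of regression check mentioned above.
import hashlib
import re

def md5_strict(data: bytes) -> str:
    """Strict check: any byte-level change (even an embedded build date)
    changes the checksum."""
    return hashlib.md5(data).hexdigest()

def soft_fingerprint(text: str) -> str:
    """Soft check: a crude textual heuristic. Stripping whitespace and
    digits means an embedded date alters the MD5 but not this value."""
    normalized = re.sub(r"[\s\d]+", "", text)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()
```

A regression test would compare these values for a freshly built interior against stored baselines: a strict mismatch flags any change at all, while a soft mismatch flags only textual changes.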

The other reason to avoid code reuse is the need for flexibility. Unanticipated issues may arise with new projects (ex. unusually formatted poems), and your stylistic taste may change as well. You also may just want to mix things up a bit, so all your books don’t look the same. Copying the framework to a new book would be done a few times a year at most, and probably far less.

Again, if the situation is different my advice will be too. For example, a publisher producing books which vary only in a known set of layout parameters may benefit from a unified framework. Even in this case, it would be wise to wait until a number of books have been published, to see which elements need to be unified and which parameters vary book to book.


Here is a list of some tools I use. Most appear in the project but others serve more of a support function.

Core tools

  • pandoc: This is used to convert from markdown to epub and LaTeX. It is an extremely powerful conversion tool written in Haskell. It often requires some configuration to get things to work as desired, but it can do most of what we want.  And no, you do not need to know Haskell to use it.
  • make: The entire process is governed by a plain old Makefile. This allows complete reproducibility.
  • pdfLaTeX: The interior of the print book is compiled from LaTeX into a pdf file via pdfLaTeX. LaTeX affords us a great way to achieve near-total control over the layout. You need not know much LaTeX unless extensive changes to the interior layout are desired. The markdown source text is converted via pandoc to LaTeX through templates. These templates contain the relevant layout information.
  • memoir LaTeX class: This is the LaTeX class I use for everything. It is highly customizable, relatively easy to use, and ideally suited to book production. It has been around for a long time, is well-maintained, has a fantastic (albeit long) manual, and boasts a large user community. As with LaTeX, you need not learn its details unless customization of the book layout is desired.  Most simple things will be obvious from the templates I provide.
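To make the pipeline concrete, here is a minimal sketch of the two pandoc conversions, written as a small Python helper rather than the Makefile that actually governs the process. The file and template names (book.md, template.tex, epub.css) are placeholders, not files from my framework:

```python
# Sketch of the two core conversions: markdown -> epub and markdown -> LaTeX.
import subprocess

def pandoc_epub_cmd(src="book.md", out="book.epub", css="epub.css"):
    # markdown -> epub, styled by a CSS file, with a generated TOC
    return ["pandoc", src, "-o", out, "--css", css, "--toc"]

def pandoc_latex_cmd(src="book.md", out="book.tex", template="template.tex"):
    # markdown -> LaTeX; all layout lives in the template, not the source
    return ["pandoc", src, "-o", out, "--template", template]

def run(cmd):
    subprocess.run(cmd, check=True)

# run(pandoc_epub_cmd()); run(pandoc_latex_cmd())  # then: pdflatex book.tex
```

In the actual framework these invocations are Makefile rules, which gives dependency tracking for free.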

Essential Programs, but can be swapped with comparables

  • python3: I write my scripts in python 3, but any comparable scripting language will do.
  • aspell: This is the command-line spell-checker I use, but any other will do too. It helps if it has a markdown-recognition mode.
  • emacs: I use this as my text editor, but vim or any other text editor will do just fine. As long as it can output plain text files (ascii or unicode, though I personally stick to ascii) you are fine. I also use emacs org-mode for the organizational aspects of the project. One tweak I found very useful is to have the editor highlight anything in quotes. This makes conversation much easier to parse when editing.
  • pdftools (poppler-utils): Useful tools for splitting out pages of pdfs, etc. Used for ebook production. I use the pdfseparate utility, which allows extraction of a single page from a PDF file. Any comparable utility will work.

Useful Programs, but not essential

  • git: I use this for version control. Strictly speaking, version control isn’t needed. However, I highly recommend it. From a development standpoint, I treat writing as I do a software project. This has served me well. Any comparable tool (such as Mercurial) is fine too. Note that the needs of an author are relatively rudimentary. You probably won’t need branching or merging or rebasing or remote repos. Just “git init”, “git commit -a”, “git status”, “git log”, “git diff”, and maybe “git checkout” if you need access to an old version.
  • wdiff, color-diff: I find word diff and color-diff very useful for highlighting changes.
  • imagemagick: I use the “convert” tool for generating small images from the cover art. These can be used for the ebook cover or for advertising inserts in other books. “identify” also can be useful when examining image files.
  • pdftk (free version): Useful tools for producing booklets, etc. I don’t use it in this workflow, but felt it was worth mentioning.
  • ebook-convert: Calibre command-line tool for conversion. Pandoc is far better than calibre for most conversions, in my experience. However, ebook-convert can produce mobi and certain other ebook formats more easily.
  • sigil: This is the only non-command-line tool listed, but it is open-source. Before you scoff and stop reading, let me point out that this is the aforementioned “almost” when it comes to automation. However, it is a minor exception. Sigil is not used for any manual intervention or editing. I simply load the epub which pandoc produces into Sigil, click an option to generate the TOC, and then save it. The reason for this little ritual is that Amazon balks at the pandoc-produced TOC for some reason, but seems ok with Sigil’s. It is the same step for every ebook, and literally takes 1 minute. Unfortunately, Sigil offers no command-line interface, and there is no other tool (to my knowledge) to do this. Sigil also is useful to visually examine the epub output if you wish. I find that it gives the most accurate rendering of epubs.
  • eog: I use this for viewing images, though any image viewer will do. It may be necessary to scale and crop (and perhaps color-adjust) images for use as book covers or interior images. ImageMagick’s “identify” and “convert” commands are very useful for such adjustments, and eog lets me see the results.

How I write

All my files are plain text. I stick to ascii, but these days unicode is fine too. However, rich-text is not.  Things like italics and boldface are accomplished through markdown.

Originally, I wrote most of my pieces (poems, chapters, stories) in LaTeX, and had scripts which stitched them together into a book or produced them individually for drafts or submissions to magazines. These days, I do everything in markdown  — and a very simple form of markdown at that.

Why not just stick with LaTeX for the source files? It requires too much overhead and gets in the way. For mathematical writing, this overhead is a small price to pay, and the formatting is inextricably tied to the text. But for most fiction and poetry, it is not.

I adhere to the belief that separating format and content is a wise idea, and this has been borne out by my experience. Some inline formatting is inescapable (bold, italics, etc), and markdown is quite capable of accommodating this. On the rare occasions when more is needed (ex. a specially formatted poem), the markdown can be augmented with html or LaTeX directly as desired. Pandoc can handle all this and more. It is a very powerful program.
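For example, a hypothetical snippet of such a source file might look like this, with italics and boldface in plain markdown and a raw LaTeX fallback for a specially formatted passage (pandoc passes raw LaTeX through to the LaTeX output by default):

```markdown
The stranger *smiled*. "You must be **very** sure of yourself," she said.

\begin{verse}
When markdown cannot express a layout,\\
raw LaTeX can be embedded directly.
\end{verse}
```

The prose stays readable as plain text, and the rare formatting exception is isolated where it occurs.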

I still leave the heavy formatting (page layout, headers, footers, etc) to LaTeX, but it is concentrated in a few templates, rather than the text source files themselves.

There also is another reason to prefer markdown. From markdown, I more easily can generate epubs or other formats. Doing so from LaTeX is possible but more trouble than it’s worth (I say this from experience).

What all this means is that I can focus on writing. I produce clear, concise ascii files with minimal format information, and let my scripts build the book from these.

To see a concrete example, as well as all the scripts involved, check out the framework on github.

Book Production Framework and Examples on GitHub

Be Careful Interpreting Covid-19 Rapid Home Test Results

Now that Covid-19 rapid home tests are widely available, it is important to consider how to interpret their results. In particular, I’m going to address two common misconceptions.

To keep things grounded, let’s use some actual data. We’ll assume a false positive rate of 1% and a false negative rate of 35%. These numbers are consistent with a March, 2021 metastudy [1]. We’ll denote the false positive rate E_p=0.01 and the false negative rate E_n=0.35.

It may be tempting to assume from these numbers that a positive rapid covid test result means you’re 99% likely to be infected, and a negative result means you’re 65% likely not to be. Neither need be the case. In particular,

  1. A positive result does increase the probability you have Covid, but by how much depends on your prior. This in turn depends on how you are using the test. Are you just randomly testing yourself, or do you have some strong reason to believe you may be infected?

  2. A negative result has little practical impact on the probability you have Covid.

These may seem counterintuitive or downright contradictory. Nonetheless, both are true. They follow from Bayes’ theorem.

Note that when I say that the test increases or decreases “the probability you have Covid,” I refer to knowledge not fact. You either have or do not have Covid, and taking the test obviously does not change this fact. The test simply changes your knowledge of it.

Also note that the limitations on inference I will describe do not detract from the general utility of such tests. Used correctly, they can be extremely valuable. Moreover, from a behavioral standpoint, even a modest non-infinitesimal probability of being infected may be enough to motivate medical review, further testing, or self-quarantine.

Let’s denote by C the event of having Covid, and by T the event of testing positive for it. P(C) is the prior probability of having covid. It is your pre-test estimate based on everything you know. For convenience, we’ll often use \mu to denote P(C).

If you have no information and are conducting a random test, then it may be reasonable to use the general local infection rate as P(C). If you have reason to believe yourself infected, a higher rate (such as the fraction of symptomatic people who test positive in your area) may be more suitable. P(C) should reflect the best information you have prior to taking the test.

The test adds information to your prior P(C), updating it to a posterior probability of infection P(C|O), where O denotes the outcome of the test: either T or \neg T.

In our notation, P(\neg T|C)= E_n and P(T|\neg C)= E_p. These numbers are properties of the test, independent of the individuals being tested. For example, the manufacturer could test 1000 swabs known to be infected with covid from a petri dish, and E_n would be the number which tested negative divided by 1000. Similarly, they could test 1000 clean swabs, and E_p would be the number which tested positive divided by 1000.

What we care about are the posterior probabilities: (1) the probability P(C|T) that you are infected given that you tested positive, and (2) the probability that you are not infected given that you tested negative P(\neg C|\neg T). I.e. the probabilities that the test correctly reflects your infection status.

Bayes’ theorem tells us that P(A|B)= \frac{P(B|A)P(A)}{P(B)}, a direct consequence of the fact that P(A|B)P(B)= P(B|A)P(A)= P(A\cap B).

If you test positive, what is the probability you have Covid? P(C|T)= \frac{P(T|C)P(C)}{P(T|C)P(C)+P(T|\neg C)P(\neg C)}, which is \frac{(1-E_n)\mu}{(1-E_n)\mu+E_p(1-\mu)}. The prior of infection was \mu, so you have improved your knowledge by a factor of \frac{(1-E_n)}{(1-E_n)\mu+E_p(1-\mu)}. For \mu small relative to E_p, this is approximately \frac{1-E_n}{E_p}.

Suppose you randomly tested yourself in MA. According to data from Johns Hopkins [2], at the time of this writing there have been around 48,000 new cases reported in MA over the last 28 days. MA has a population of around 7,000,000. It is reasonable to assume that the actual case rate is twice that reported (in the early days of Covid, the unreported factor was much higher, but let’s assume it presently is only 1\times).

Let’s also assume that any given case tests positive for 14 days. I.e., 24,000 of those cases would test positive at any given time in the 4 week period (of course, not all fit neatly into the 28 day window, but if we assume similar rates before and after, this approach is fine). Including the unreported cases, we then have 48,000 active cases at any given time. We thus have a state-wide infection rate of \frac{48000}{7000000}\approx 0.00685, or about 0.7%. We will define \mu_{MA}\equiv 0.00685.

Using this prior, a positive test means you are 45\times more likely to be infected post-test than pre-test. This seems significant! Unfortunately, the actual probability is P(C|T)= 0.31.
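These numbers are easy to verify with a few lines of Python, using the E_p, E_n, and \mu_{MA} values defined above:

```python
# Posterior probability of infection given a positive rapid test,
# computed via Bayes' theorem with the rates from the text.
E_p = 0.01   # false positive rate
E_n = 0.35   # false negative rate

def p_infected_given_positive(mu):
    """P(C|T) = (1-E_n)*mu / ((1-E_n)*mu + E_p*(1-mu))."""
    return (1 - E_n) * mu / ((1 - E_n) * mu + E_p * (1 - mu))

mu_MA = 0.00685                            # the Massachusetts prior
posterior = p_infected_given_positive(mu_MA)
factor = posterior / mu_MA                 # improvement over the prior
# posterior ≈ 0.31, factor ≈ 45
```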

This may seem terribly counterintuitive. After all, the test had a 1% false positive rate. Shouldn’t you be 99% certain you have Covid if you test positive? Well, suppose a million people take the test. With a 0.00685 unconditional probability of infection, we expect 6850 of those people to be infected. Since E_n=0.35, only 65% of them, roughly 4453, will test positive.

However, even with a tiny false positive rate of E_p=0.01, 9932 people who are not infected also will test positive. The problem is that there are so many more uninfected people being tested that E_p=0.01 still generates lots of false positives. If you test positive, you could be in the 9932 people or the 4453 people. Your probability of being infected is \frac{4453}{9932+4453}= 0.31.

Returning to the general case, suppose you test negative. What is the probability you do not have Covid? P(\neg C|\neg T)= \frac{P(\neg T|\neg C)P(\neg C)}{P(\neg T|\neg C)P(\neg C)+P(\neg T|C)P(C)}= \frac{(1-E_p)(1-\mu)}{(1-E_p)(1-\mu)+E_n\mu}. For small \mu this is approximately 1 unless E_p is very close to 1. Specifically, it expands to 1-\frac{E_n}{(1-E_p)}\mu+O(\mu^2).

Under \mu_{MA} as the prior, the probability of being uninfected post-test is 0.99757 vs 0.9932 pre-test. For all practical purposes, our knowledge has not improved.
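Again, a few lines of Python confirm the numbers (same E_p, E_n, and \mu_{MA} as above):

```python
# Posterior probability of NOT being infected given a negative test.
E_p, E_n = 0.01, 0.35
mu_MA = 0.00685

def p_clear_given_negative(mu):
    """P(not C | not T), from the expression derived in the text."""
    return (1 - E_p) * (1 - mu) / ((1 - E_p) * (1 - mu) + E_n * mu)

post = p_clear_given_negative(mu_MA)   # ≈ 0.99757
prior = 1 - mu_MA                      # ≈ 0.99315
# the negative result has barely moved our estimate
```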

This too may seem counterintuitive. As an analogy, suppose in some fictional land earthquakes are very rare. Half of them are preceded by a strong tremor the day before (and such a tremor always heralds a coming earthquake), but the other half are unheralded.

If you feel a strong tremor, then you know with certainty that an earthquake is coming the next day. Suppose you don’t feel a strong tremor. Does that mean you should be more confident that an earthquake won’t hit the next day? Not really. Your chance of an earthquake has not decreased by a factor of two. Earthquakes were very rare to begin with, so the default prediction that there wouldn’t be one only is marginally changed by the absence of a tremor the day before.

Of course, \mu_{MA} generally is not the correct prior to use. If you take the test randomly or for no particular reason, then your local version of \mu_{MA} may be suitable. However, if you have a reason to take the test then your \mu is likely to be much higher.

Graphs 1 and 2 below illustrate the information introduced by a positive or negative test result as a function of the choice of prior. In each, the difference in probability is the distance between the posterior and prior graphs. The prior obviously is a straight line since we are plotting it against itself (as the x-axis). Note that graph 1 has an abbreviated x-axis because P(C|T) plateaus quickly.

From graph 1, it is clear that except for small priors (such as the general infection rate in an area with very low incidence), a positive result adds a lot of information. For \mu>0.05, it provides near certainty of infection.

From graph 2, we see that a negative result never adds terribly much information. When the prior is 1 or 0, we already know the answer, and the Bayesian update does nothing. The largest gain is a little over 0.2, but that’s only attained when the prior is quite high. In fact, there’s not much improvement at all until the prior is over 0.1. If you’re 10% sure you already have covid, a home test will help but you probably should see a doctor anyway.

Note that these considerations are less applicable to PCR tests, which can have sufficiently small E_p and E_n to result in near-perfect information for any realistic prior.

One last point should be addressed. How can tests with manufacturer-specific false positive and false negative rates depend on your initial guess at your infection probability? If you pick an unconditional local infection rate as your prior, how could they depend on the choice of locale (such as MA in our example)? That seems to make no sense. What if we use a smaller locale or a bigger one?

The answer is that the outcome of the test does not depend on such things. It is a chemical test being performed on a particular sample from a particular person. Like any other experiment, it yields a piece of data. The difference arises in what use we make of that data. Bayesian probability tells us how to incorporate the information into our previous knowledge, converting a prior to a posterior. This depends on that knowledge — i.e. the prior. How we interpret the result depends on our assumptions.

A couple of caveats to our analysis:

  1. The irrelevance of a negative result only applies when you have no prior information other than some (low) general infection rate. If you do have symptoms or have recently been exposed or have any other reason to employ a higher prior probability of infection, then a negative result can convey significantly more information. Our dismissal of its worth was contingent on a very low prior.

  2. Even in the presence of a very low prior probability of infection, general testing of students or other individuals is not without value. Our discussion applies only to the interpretation of an individual test result. In aggregate, the use of such tests still would produce a reasonable amount of information. Even if only a few positive cases are caught as a result and the overall exposure rate is lowered only a little, the effect can be substantial. Pathogen propagation is a highly nonlinear process, and a small change in one of the parameters can have a very large effect. One caution, however: if the results aren’t understood for what they are, overconfidence can follow, and aggregate testing can do substantial harm if it leads people to relax other precautions.


[1] Rapid, point‐of‐care antigen and molecular‐based tests for diagnosis of SARS‐CoV‐2 infection — Dinnes, et al. Note that “specificity” refers to 1-E_p and “sensitivity” refers to 1-E_n. See wikipedia for further details

[2] Johns Hopkins Covid-19 Dashboard

“The Delivery” now is fully available!

Great news! My new short novel, The Delivery, is available both in print and for Kindle, in the US and internationally!

This book has been some time in the making, and I hope people enjoy it. Here is a brief description:

The Delivery is what happens when Kafka meets Monty Python.  Wilbur is an unassuming little man living an unassuming little life. He and his wife have a stereotypical 1950s existence, but in modern America. One day, he arrives home to discover a mysterious crate. His attempts to deal with a seemingly minor mistake lead to an escalating series of absurdities, straining his marriage, leaving the couple’s lives in tatters, and leading him to question his place in the world. Do millions perish? Does the world end? Does Wilbur figure out how to make photocopies?

You can buy it at the following locations:

NOTE on UK orders: Amazon says the print edition is unavailable in the UK but that’s incorrect. You can order it from them. If you enter a postal code, it will give you a delivery time-frame. This said, it may take a few weeks for it to arrive.

NOTE on look-inside: For some reason known only to Jeff Bezos, Amazon can’t get the kindle look-inside working (they always have problems with that). However, the print version look-inside works fine, so just view that if you want to see what the book looks like.

Fun with Voting in Cambridge, MA

My city of Cambridge, MA is one of a few municipalities which employs ranked choice voting for City Council elections. Unlike most cities, the Mayor is chosen by the City Council and is largely a ceremonial position. Most real power resides with the City Manager, who is appointed for an indefinite term by the City Council. This means that City Councils which get to appoint a new City Manager exert an inordinate influence over the future course of the city. One such point is fast approaching. Unfortunately, given the present and probable near-term composition of the City Council, the decision likely will be based on considerations other than aptitude. However, putting aside my city’s somber prognosis, the upcoming City Council election is a good opportunity to discuss an unusual method of voting and some of its shortcomings.

Ordinary winner-takes-all elections dominate the popular consciousness. National elections are of this nature. It would not be inaccurate to observe that such an approach reflects the general weltanschauung of our culture. However, there are many other voting methods. In fact, voting theory is a vibrant field of research. Together with its sibling, auction theory, it forms part of the subject commonly known as “social choice theory”.

As an aside, I recently published a paper in that field, Social Choice using Moral Metrics. It focuses on measuring distances between behaviors, rather than on voting systems per se. Back in 2008, I also wrote a voting theory piece about swing votes and block voting. What I termed “influence” in it is more commonly referred to as “voting power”. Neither is related to what I discuss in this post, but I encourage the interested reader to peruse them.

It may be argued that certain voting methods are fairer than others, by one or another definition of fairness. Particular flavors sometimes are advocated by those disenchanted with an existing method or harboring an agenda to see some particular group gain influence. Calls for change sometimes arise in response to highly-visible anomalies: election outcomes which appear egregiously unfair even to disinterested eyes.

In elections with a large field of candidates or those in which a number of positions are simultaneously filled (such as the Cambridge City Council election), winner-takes-all voting may not be suitable or may give rise to such anomalies.

California’s recall system is an example. The ballot in that case has 2 questions: (1) whether to recall the governor and (2) who should replace him. The first question is winner-takes-all for the governor alone. If he loses, the 2nd question is winner-takes-all for the other candidates. It is quite possible for a candidate to be chosen who easily would have lost to the recalled governor one-on-one. In 2003, 44.6% of voters voted not to recall Governor Davis. He thus was recalled, and Schwarzenegger then won with 48.58% of the votes for replacement. It is highly unlikely that in a head-to-head gubernatorial election, Republican Schwarzenegger would have beaten Democrat Davis in the heavily blue state. However, Davis was excluded from this 2nd contest and Schwarzenegger was deemed preferable to the alternatives by most voters.

Arrow’s Theorem

It is natural to ask whether any voting system is unimpeachably fair, indicting the use of other systems as anachronistic or disingenuous. Arrow famously proved that, under even a small set of fairness constraints and for a broad class of voting systems, it is impossible to find one. Loosely speaking, when more than 2 candidates are present, no method of aggregating the rankings of candidates by voters into a single outcome ranking can simultaneously satisfy three conditions: (1) if every voter prefers candidate x to candidate y, then x outranks y in the outcome, (2) no single voter’s preference determines the outcome (i.e. no dictator), and (3) if each voter ranks x relative to y (i.e. above or below it) the same way in elections A and B (though the order can differ between voters, of course), then the outcomes of A and B do too. I.e., if voters change their overall ranking of x and y or the relative placement of other candidates, but don’t change whether x is preferred to y or vice versa, then whether x outranks y or vice versa in the outcome is unchanged.

It is quite plausible to add more fairness conditions, but most plausible definitions of fairness would require at least these three conditions to hold. Arrow showed that there is no ranked voting system (including “preponderance of the votes”) in which unfair anomalies cannot arise.

As an aside, if one were to relax a condition, the most palatable clearly would be (3). It is conceivable that a “fair” aggregation method may allow the overall ranking of candidates to affect a pairwise order in the outcome. However, this generally is deemed undesirable.

As with complexity results in computer science (CS) or Gödel’s incompleteness theorem in logic, the theoretical existence of hard or problematic cases does not necessarily pose a practical obstacle. In CS, an algorithm with worst-case exponential complexity may be far more useful than one with linear complexity in real-world applications. For example, the latter could have a huge constant cost (often referred to as a “galactic algorithm”) and the former could be exponential only in an infinitesimal fraction of cases or under circumstances which never arise in practice. Gödel’s theorem does have real-world examples (i.e. non-meta-theorems), but (at this point) they remain rare.

Though nowhere near as profound,  Arrow’s theorem invites similar skepticism.  The impossibility of a preference system which excludes all anomalies does not mean such anomalies arise in practice, or that a system which excludes all realistic anomalies cannot be found.   Unfortunately (or fortunately, depending on one’s perspective), such anomalies do arise in practice.  Worse,  the systems in question often are of significant social import and subject to intense scrutiny.  The anomalies which do arise can be quite visible and politically troublesome.

Social choice theory exhibits another critical difference from CS and logic, one which merits additional caution.  The goal of logic, mathematics, and theoretical computer science generally is to understand which problems are solvable and how best to solve them.  Anomalies are viewed as pathological and undesirable.  They sometimes serve as useful counterexamples, guiding researchers to better understanding and helping them improve their tools.   However, they are to be avoided in real-world applications.   If a pathological case arises in such a context, alternate machinery must be employed or the framework modified to exclude it.

This need not be the case in social choice theory. Not everyone’s goals are aligned, or social choice would be unnecessary. With elections, there could be adverse incentives. It may be possible to game an election by identifying and exploiting anomalies endemic to the specific system involved. There also may be groups who strongly prefer that anomalies arise, either for purposes of fomenting discord or if those anomalies serve them well. For this reason, dismissing anomalies as almost impossible under some assumed prior may be naive. The prior must incorporate human behavior, and this very well could concentrate probability around the anomalies. Put another way, if we naively model the probability of anomalies arising using an assumption of ideal behavior, we risk ignoring the very real possibility that participants will engineer or utilize anomalies.

This issue is related to Gibbard’s theorem, which loosely states that under even weaker conditions than Arrow’s theorem (at least 3 candidates and no dictator), there is no ideal ballot which reflects a voter’s preferences. Put another way, the voting system can be gamed. In fact, a voter may need to game it (perhaps in response to polls or other information) in order to best reflect their individual preferences. The optimal ballot ranking to enact a voter’s preferences may not be their actual preference ranking of candidates.

The Rules in Cambridge

What does all this have to do with the Cambridge elections? Cambridge employs a particular system of ranked choice voting, which they refer to as “Proportional Representation”. This often is portrayed as fairer, more democratic, and so on. I am going to offer an example of an egregious anomaly which can result. I do this not in the expectation that it will arise or be exploited. Nor do I hope to change a voting method that is, all things considered, quite reasonable. Rather, the anomaly serves as an illustrative example of the inherent problem with claiming that one voting system is “fairer” than another.

First, I’ll describe the precise rules of the Cambridge election, as best I understand them. See MA Election Laws, section 9 for details.  State law governs the general rules for proportional representation voting in any Massachusetts municipalities which choose to employ it.  Only certain parameters and details of execution are left to local discretion.

The City Council consists of 9 individuals, and the entire body is elected once every 2 years. Voters are presented with a list of candidates and may select a 1st choice, a 2nd choice, and so on.  I do not recall the maximum number of choices which can be made, but let us suppose it is not limited. The anomaly arises whether or not this is the case. Note that a given voter is not required to rank all the candidates. They could select only their top 3 choices, for example. Whether or not a full ranking by each voter is required does not affect the anomaly.

First some definitions. N will denote the total number of ballots (i.e. the number of voters who participate in the election).  At the time of writing, the minimum number of signatures to get on the ballot is 50.  We’ll call this ‘M’, because State law gives it a role in the algorithm. Q=(N/10)+1 will be the “quota”, the minimum number of ballots a candidate needs to win.

Why not choose Q=N/9? The type of voting system we’re describing is sometimes referred to as “single-transferable-vote” (STV) because of the use of spillovers (described below). There are two common quota methods for determining STV winners: (1) “Hare” corresponds to Q=N/9, and (2) “Droop” corresponds to Q=(N/10)+1. In each case, we round up if needed. The two methods generally result in the same outcome or differ only in how the last winner is chosen. Each has benefits and drawbacks vis-a-vis what is deemed fair in terms of proportional representation. Among other things, the Droop quota tends to favor small parties over large. It also is the smallest quota which guarantees no more than 9 winners. As we will see, neither method guarantees a full complement of 9 winners. Regardless, the Droop quota is the one used by Cambridge.
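For the curious, the two quotas are easy to compute. Here is a minimal sketch in Python (the function names are mine, and the rounding follows the usual STV conventions rather than the exact statutory wording):

```python
import math

def hare_quota(ballots: int, seats: int) -> int:
    """Hare quota: ballots / seats, rounded up."""
    return math.ceil(ballots / seats)

def droop_quota(ballots: int, seats: int) -> int:
    """Droop quota: floor(ballots / (seats + 1)) + 1, the smallest quota
    guaranteeing no more than `seats` candidates can reach it."""
    return math.floor(ballots / (seats + 1)) + 1

# For a 9-seat council with 10,000 ballots:
print(hare_quota(10000, 9))    # 1112
print(droop_quota(10000, 9))   # 1001
```

Note that ten candidates would need 10 × 1001 = 10,010 votes to all reach the Droop quota, which exceeds the 10,000 ballots available — hence the “no more than 9 winners” guarantee.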

Once the ballots have been collected, a sequence of steps is performed by computer. An order of polling places is determined randomly by the city beforehand. Within each polling place, ballots are sorted by the choice of 1st place candidate (and then presumably randomly within each such cohort).  The ballots then go through a series of stages.  The first stage is special.

Stage 1: Any candidate who reaches Q votes is declared a winner. Subsequent 1st place votes for them are passed to the next ranked candidate on the ballot who has not already been declared a winner. E.g., if a ballot is reached with x, y, and z as the 1st, 2nd, and 3rd candidates, and both x and y already have been declared winners, it would go to z. If no non-winner choice remains on the ballot, it is swapped with a ballot that already was consumed by the winner and has non-winner choices on it. This minimizes the number of discarded ballots. Note that it always pays for a voter to rank a lot of choices, because otherwise some other voter may have their preference registered instead. It’s not clear from the law in what order the 1st place candidates’ ballots should be sorted, but we’ll assume it is random. This does not matter for the anomaly we will discuss. As the sorting proceeds, any candidate with Q votes (by spillover from other candidates or by being 1st on their own) is declared a winner, and any remaining votes for them spill over as described.

Once this process has been completed, almost every ballot has been assigned to some candidate (i.e. either consumed by a winner or spilled over to a remaining candidate). Because of the ballot-swapping mechanism described, it is unlikely (but still possible) for ballots to have been discarded due to lack of non-winner alternatives. Each winner has consumed precisely Q ballots, and each remaining candidate has fewer than Q ballots. In what follows we use “higher-ranked” to refer to the preferred candidates on a ballot. In practice, this means they have been assigned a lower number. I.e., the 1st place candidate on a ballot is “higher-ranked” than the 2nd place candidate.

At this point, any candidate with fewer than M ballots (in our case 50) is declared to have lost. Their ballots are transferred in the same manner as before to the remaining candidates. Note that this form of elimination only takes place in this first round, since the number of ballots assigned to a candidate cannot decrease in subsequent rounds.

Stages 2+: If 9 candidates have been declared winners, the process ends. Otherwise, the trailing candidate is declared to have lost, and their votes are transferred (one by one) to the remaining candidates in the same manner as before, but with one important change. Unlike in the first round, if no remaining non-winner candidates are listed on a ballot, it is discarded rather than swapped with another. As before, any candidate who reaches Q votes is declared a winner and can accrue no more votes. There are some tie-breaker rules associated with determining who is the trailing candidate at the end of a given round, but we won’t go into those. If at any time the number of winners plus remaining candidates is 9, all remaining candidates are declared winners. The round ends when every ballot in play either has been spilled over (once) or discarded. Those ballots not discarded or consumed by winners and those candidates not eliminated then proceed to the next round.
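The stages-2+ loop can be sketched in a few lines of Python. To be clear, this is my own simplified illustration, not the statutory algorithm: it ignores the tie-breaker rules, the randomized polling-place ordering, and the round-1 swapping, and the data layout (`piles` mapping each in-play candidate to their assigned ballots) is an assumption of the sketch:

```python
def later_rounds(piles, Q, seats, winners):
    """Stages 2+ (sketch): repeatedly eliminate the trailing candidate and
    spill each of their ballots to its next in-play choice; ballots listing
    no such choice are discarded. `piles` maps candidate -> list of ballots,
    where a ballot is a tuple of candidates, most-preferred first."""
    while len(winners) < seats:
        in_play = [c for c in piles if c not in winners]
        # If winners plus remaining candidates exactly fill the council,
        # all remaining candidates are declared winners.
        if len(winners) + len(in_play) <= seats:
            winners.extend(in_play)
            break
        loser = min(in_play, key=lambda c: len(piles[c]))  # ignoring tie-breaks
        for ballot in piles.pop(loser):
            nxt = next((c for c in ballot if c in piles and c not in winners), None)
            if nxt is None:
                continue                      # no in-play choice: discarded
            piles[nxt].append(ballot)
            if len(piles[nxt]) >= Q:
                winners.append(nxt)           # reached quota; accrues no more

    return winners

# Toy run: 2 seats, quota 3. b trails, is eliminated, and its ballot
# spills to a, pushing a to quota; c then wins by being the last in play.
piles = {'a': [('a', 'b'), ('a', 'b')], 'b': [('b', 'a')],
         'c': [('c', 'a'), ('c', 'b')]}
print(later_rounds(piles, Q=3, seats=2, winners=[]))  # ['a', 'c']
```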

Note that a spillover never can result in a ballot being assigned to a higher-ranked candidate. For example, suppose a ballot already has been assigned to the 3rd listed candidate on it. This only could happen if there was a reason to skip the top 2. This means they either already were declared winners or already were eliminated. Nor do any swaps (possible only in the 1st round) affect this. Any subsequent spillovers must go to lower-ranked candidates, or the ballot would have been handed to a higher-ranked candidate already.

Note that unless every voter ranks every candidate, it is possible for some ballots to be discarded. This is highly unlikely in the first round, because swapping is allowed. However, in subsequent rounds ballots may be discarded if they list no candidates which remain in play (i.e. that have not already been declared winners or eliminated). Though there is a theoretical bound on the number of possible discarded ballots, it can be high.

It is quite possible for an insufficient number of winners to be declared. This is no surprise. If every voter lists the same three candidates, but no others, then only three candidates will win. Insufficient ranking by voters can lead to inadequate outcomes.

Unless the field of candidates is reduced below 9 in the first round (i.e. too few candidates meet the 50 vote threshold), there ultimately will be 9 winners. However, some may not get many votes. If every voter ranks every candidate, then all winners will meet quota. If not, some candidates may win without meeting quota by dint of being the last ones uneliminated.

A number of obvious anomalies come to mind. For example, if everyone votes for x, y, and z as the top 3 candidates but there is a huge field of candidates for 4th place — so that each gets 51 spillover votes — then the remaining candidates won’t be eliminated in the first round. The remaining 6 winners then will be selected by the tie-breaker procedure (which we didn’t elaborate on). Fair yes, desirable no. However, such anomalies can be attributed to voter failure. If each voter ranks the whole field of candidates, they won’t arise.

One important thing to note is that the election method described does not obey the conditions of Arrow’s theorem. The procedure is not even deterministic, and certainly does not satisfy the 3rd fairness condition. It is quite possible for a change in the ranking of candidate z on individual ballots to affect the order of x relative to y in the outcome even if the order of x relative to y is unchanged on those individual ballots. As an extreme example, suppose x is 1st and y is 2nd on 50 ballots, y is 1st and x is 2nd on another 50 ballots, and z is 3rd on all 100. If one of the 1st 50 ballots moves z to the top, x will be eliminated in the 1st round. If one of the 2nd 50 ballots moves z to the top, y will be eliminated in the 1st round. In neither case did the ranking of x relative to y change on any ballots. Some anomalies arise for similar reasons to those involved in Arrow’s theorem, but others arise for different reasons.
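The extreme example above is easy to verify numerically. Below is a toy first-round tally — a simplification of mine that checks only the 50-ballot survival threshold, ignoring quotas and spillovers:

```python
def first_place_counts(ballots):
    """Tally 1st-choice votes; each ballot is a tuple, best choice first."""
    counts = {}
    for b in ballots:
        counts[b[0]] = counts.get(b[0], 0) + 1
    return counts

M = 50  # minimum first-round ballots a candidate needs to survive

# 50 ballots rank x > y > z and 50 rank y > x > z, so x and y each have 50.
# Now move z to the top of a single x-first ballot:
ballots = [('z', 'x', 'y')] + [('x', 'y', 'z')] * 49 + [('y', 'x', 'z')] * 50
counts = first_place_counts(ballots)
print(counts)           # {'z': 1, 'x': 49, 'y': 50}
print(counts['x'] < M)  # True: x is eliminated, though no ballot changed
                        # the order of x relative to y
```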

The Anomaly

Let us now consider the specific anomaly we set out to discuss. Suppose there are 10000 ballots and 9 positions to be filled. We require 1001 votes for a candidate to win, but we’ll call it 1000 to simplify calculation. Suppose that candidate x is ranked 1st on all 10000 ballots, candidate y is ranked 3rd on all 10000 ballots, and 100 other candidates (which we’ll call z1-z100) are ranked 2nd on 100 ballots each.

Everyone agrees that candidates x and y should be on the City Council. They both rank in the top 3 choices for everyone. However, candidate y is eliminated in the first round. All the spillover votes from candidate x go to candidates z1-z100. The number could vary for each, depending on the order in which ballots are processed.  For example, it is possible that each of z1-z100 is assigned 90 spillover votes from candidate x.  It also is possible that z1-z90 would accrue 100 spillover votes each, and the rest would get 0 and be eliminated.

At the end of round 1, x is declared a winner and consumes 1000 votes, y has 0 votes, and z1-z100 each have between 0 and 100 votes. At least 81 of them have enough to survive the 50 vote test: a z is eliminated only if at least 51 of its 100 ballots are among the 1000 consumed by x, which can happen for at most 19 of them. However, y is eliminated. The remaining z’s then proceed through a series of elimination and spillover rounds (with possible tie-breakers for the trailing candidate if needed) until only 8 of the z’s remain. These then are declared winners.

The result is 1 winner everyone wants, 8 winners few people agree on, and the conspicuous loss of the 2nd candidate everyone wants.
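The scenario is small enough to simulate. The sketch below uses one possible processing order (ballots grouped by their 2nd-choice z); as noted, other orders distribute the spillovers differently, but y always receives zero:

```python
Q, M = 1000, 50   # simplified quota (per the text) and survival threshold

# 10,000 ballots: x is 1st and y is 3rd on every ballot, and each of
# z1..z100 is 2nd on exactly 100 ballots.
ballots = [('x', f'z{i}', 'y') for i in range(1, 101) for _ in range(100)]

# Round 1: x meets quota and consumes the first Q ballots; the remaining
# 9000 spill over to each ballot's next choice -- always some z, never y.
spilled = ballots[Q:]
tally = {'y': 0}
for b in spilled:
    tally[b[1]] = tally.get(b[1], 0) + 1

survivors = sorted(c for c, v in tally.items() if v >= M)
print(len(survivors), 'y' in survivors)  # 90 False
```

Under this processing order, z1-z10’s ballots are consumed by x, z11-z100 each accrue 100 spillovers, and y — everyone’s 3rd choice — is eliminated with zero votes.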

This is just one fun example of how well-intentioned voting systems can result in highly-undesirable outcomes.

“The Delivery” now is Available for Kindle!

Great news! My new short novel, The Delivery, is available on Amazon for Kindle. This book has been some time in the making, and I hope people enjoy it. A print edition (softcover) will be available shortly. All materials have been sent to the printer, and I currently am awaiting galleys. From past experience, there will be a bit of back and forth as we iron out the appearance. Depending on Ingram’s backlog and shipping speeds (for my test copies), I expect the process to take anywhere from a few weeks to two months.

Wilbur is an unassuming little man living an unassuming little life. He and his wife have a stereotypical 1950s existence, but in modern America. One day, he arrives home to discover a mysterious crate. His attempts to deal with a seemingly minor mistake lead to an escalating series of absurdities, straining his marriage, leaving the couple’s lives in tatters, and leading him to question his place in the world. Do millions perish? Does the world end? Does Wilbur figure out how to make photocopies? The Delivery is what happens when Kafka meets Monty Python.

My Moral Metrics Paper Has Been Published!

The paper develops a geometry of moral systems with applications in social choice theory.

I submitted it last October, and it recently was accepted by AJEB (the Asian Journal of Economics and Banking) for publication in a forthcoming special issue on Social Choice.

As far as I’m aware, the print version will be issued in November. However, the paper is available online now. AJEB is an Open Access journal, so there is no paywall. It can be accessed here:

PDF Version

HTML Version

The paper offers a more concise introduction to the subject than my monograph, and also introduces a lot of new material relating to applications in social choice theory.

Citation info isn’t available yet (other than the ISSN and journal), since it revolves around the print edition. I’ll post that when it becomes available.

One word to those familiar with my writing, my style, and my love of the Oxford comma. You may be surprised by some of the punctuation. Apparently, my own practices in this regard differ from those of the journal. Quite understandably, the editor favored the journal’s conventions. Except for a few instances which affected clarity, I saw no reason to quibble over what essentially is an aesthetic choice.

My thanks to Don Bamber for suggesting both social choice as an application and that I submit to AJEB!

Audiobook Samples

All three of my flash-fiction audiobooks are now available for sale. Below are 5 minute samples of each, downloadable as mp3’s.

If the covers look a bit weird it’s because ACX doesn’t allow items in the lower right corner, so I had to rejigger some of the titling.

The Man Who Stands in Line
“The Man Who Stands in Line” by K.M. Halpern. Narrated by Susie New.

Available Now on Audible, ACX, and Itunes.

The Way Around
“The Way Around” by K.M. Halpern. Narrated by Alan Moore.

Available Now on Audible, ACX, and Itunes.

The Last Cloud
“The Last Cloud” by K.M. Halpern. Narrated by Derek Botten.

Available Now on Audible, ACX, and Itunes.

The Last Cloud revision 1 released

A newly revised edition of “The Last Cloud” is now available. In the process of developing my audiobook (currently under review by ACX), a number of typos and errors were revealed. It was surprising how much turned up aurally that had not been evident even after several edits, beta readings, and proofreadings. The pieces have been rearranged as well, to better showcase the variety of themes and styles.

Far fewer typos appeared in “The Man Who Stands in Line” and “The Way Around.” Although I eventually will put out revisions of those books, that may be a way off.

The currently available print and kindle versions are the revised ones. It appears that the Amazon “Look Inside” for the kindle version is correct, but the paperback “Look inside” may take some time to update. That has nothing to do with which version is shipped, however.

You can tell which version you have by examining the copyright page. The new one has a small “(rev 1)” after the edition. It also begins with the titular piece “The Last Cloud” rather than “Spleen Squeezer”.

Note that the audiobook is of the original version. I am very pleased with how it came out, and will post when it becomes available (probably in May).

How to Get a Patent in 2 Easy Steps!

1. Expedited Process: [Note: if your name is not Apple, Google, Microsoft, Sony, or Oracle, skip to step 2]:

Scribble a drawing in crayon on a napkin, write ‘for, you know, stuff’ and drop it off at the Patent Commissioner’s house when you have dinner with him and his wife. On the off-chance it isn’t accepted the next day, be polite but firm. The assigned examiner may be new or overworked. Bear in mind, he is NOT your employee. He serves several other large corporations as well.

By the way, don’t forget that the Patent office is running a special this month: you get every 1000th patent free!

2. Standard Process:

(i) spend several months with a team of lawyers (paid out of pocket) carefully researching the state of the art of your field, fleshing out your idea, researching potentially related patents, and constructing unassailable claims of your own. In the course of this, learn a new language called “legalese,” which bears only a superficial resemblance to English — much as its speakers bear only a superficial resemblance to humans.

(ii) assemble a meticulously crafted and airtight application — one which no sane person can find fault with, because it has no fault.

(iii) get rejected by the examiner, who clearly did a sloppy google search for some keywords. He cites several patents which have nothing in common with yours, except for those keywords.

(iv) reply to said patent examiner, patiently explaining why a simple reliance on keyword similarities is insufficient evidence of prior art, and that modern linguistic scholarship has shown different sentences can have words in common.

(v) receive a reply with “final rejection” emblazoned in huge letters, and in what appears to be blood. An attached notice explains that any further communication regarding this patent will result in a late-night visit by three large fellows with Bronx accents. Your lawyers dismiss this as boilerplate, and explain that “final rejection” actually means “we want more patent fees.”

(vi) battle your way through 50 years and $1,000,000 of appeals and rejections as the examiner displays an almost inhuman level of ineptitude, an apparent failure to grasp rudimentary logic, infantile communication skills, and an astonishing ability to contradict himself hour to hour.

(vii) Suspect your patent examiner is planning to run for Congress, where his skills would be better appreciated. Encourage him to do so. Maybe his replacement will be better equipped, possessing both neurons and synapses.

(viii) Eventually you reach the end of the process. One of two outcomes has occurred:

  • You passed away long ago, and no longer care about the patent.
  • Your application finally was accepted. Because an accepted patent is valid from the original date of application, yours expired decades ago. But this does not matter, since the idea is long obsolete anyway.

Either way, you should feel privileged. You have participated in one of the great institutions of American Democracy!

The (quasi)-Duality of the Lie Derivative and Exterior Derivative

Lecture1     Lecture2    Lecture3    Lecture4    Lecture5

This is a short set of notes that covers a couple of aspects of duality in differential geometry and algebraic topology. It grew out of an enigmatic comment I encountered, to the effect that the Lie and exterior derivatives were almost-dual in some sense. I wanted to ferret out what this meant, which turned out to be more involved than anticipated. Along the way, I decided to explore something else I never had properly understood: the nature of integration from a topological perspective. This led to an exploration of the equivalence of de Rham and singular cohomology.

The notes are in the form of five sets of slides. Originally, they comprised four presentations I gave in a math study group. On tidying, the last set grew unwieldy, so I broke it into two.

  • Lecture1: Review of DG and AT. Types of derivatives on {M}, de Rham Complex, review of some diff geom, Lie deriv and bracket, chain complexes, chain maps, homology, cochain complexes, cohomology, tie in to cat theory.
  • Lecture2: The integral as a map, Stokes’ thm, de Rham’s thm, more about Lie derivs.
  • Lecture3: Recap of de Rham cohomology, review of relevant algebra, graded algebras, tensor algebra, exterior algebra, derivations, uniqueness results for derivations, the interior product.
  • Lecture4: Cartan’s formula, tensor vs direct product, element-free def of LA, Lie coalgebras
  • Lecture5: Quick recap, relation between struct constants of LA and LCA, the choice of ground ring or field, duality of Lie deriv and exterior deriv.

These notes grew organically, so the order of presentation may seem a bit … unplanned. The emphases and digressions reflect issues I encountered, and may be peculiar to my own learning process and the many gaps in my physicist-trained math background. Others may not share the same points of confusion, or require the same background explanations. They were designed for my own use at some future point when I’ve completely forgotten the material and need a bespoke refresher. I.e., a week from now.

Although I’ve tried to polish the notes to stand on their own, there are some allusions to earlier material studied in the group. In particular, certain abbreviations are used. Here is a (hopefully) complete list:

  • DG: Differential Geometry
  • AT: Algebraic Topology
  • DR: de Rham
  • {P}: Used for a Principal bundle. Not really used here, but mentioned in passing.
  • PB: Principal Bundle. Not really used here, but mentioned in passing.
  • AB: Associated Bundle. Not really used here, but mentioned in passing.
  • LG: Lie Group. Mentioned in passing.
  • LA: Lie Algebra
  • LCA: Lie Coalgebra (defined here).
  • v.f.: Vector fields
  • v.s.: Vector space

The 1st 2 lectures focus on the equivalence of de Rham and singular cohomologies via a duality embodied in the integral map, and enforced by Stokes’ and de Rham’s thms. The last 3 lectures focus on the quasi-duality between the Lie derivative and exterior derivative. By quasi-duality we don’t mean to downplay its legitimacy. I didn’t go through all sorts of contortions to call a square a circle just because it sounds elegant. There is a true duality, and a beautiful one. But saying that it is directly between the Lie and exterior derivs is slightly misleading.

These notes were constructed over a period of time, and focus on the specific topic of interest. They are by no means comprehensive. Although edited to correct earlier misconceptions based on later understanding (as well as errors pointed out by the math group), the order of development has not been changed. They were written by someone learning the subject matter as he learned it. They may have some mistakes, there may be some repetition of points, and they are not designed from the ground up with a clear vision. Nonetheless, they may prove helpful in clarifying certain points or as a springboard for further study.

These notes explain the following:

  • {\int} as a map from the de Rham complex to the singular cochain complex
  • Stokes’ thm as a relationship between de Rham cohomology and singular cohomology
  • The various types of derivations/anti-derivations encountered in differential geometry
  • A review of graded algebras, tensor algebras, exterior algebras, derivations, and anti-derivations.
  • A review of Lie Derivatives, as well as Cartan’s formula
  • A discussion of what the duality of {{\mathcal{L}}} and {d} means
  • A discussion of the two views one can take of {T(M)} and {\Lambda(M)}: as {\infty}-dimensional vector spaces over {\mathbb{R}} or as finite-basis modules over the smooth fns on M. The former is useful for abstract formulation while the latter is what we calculate with in DG. The transition between the two can be a source of confusion.
  • A discussion of why derivations and anti-derivations are the analogues of linearity when we move from one view to the other.

The notes draw from many sources, including Bott & Tu, Kobayashi & Nomizu, and various discussions on stackexchange. A list of references is included at the end of the last set of slides.

The Truth about Stock Prices: 12 Myths

No-fee trading has invited a huge influx of people new to trading. In this article, I will discuss the basics of “price formation”, the mechanism by which stock prices are determined.

Like most people, for much of my life I assumed that every stock has a well-defined “price” at any given point in time. You could buy or sell at that price, and the price would move based on activity. If it went up you made money, if it went down you lost money. Trading was easy: you just bought the stocks you thought would go up and sold the ones you thought would go down.

Unfortunately, my blissful naivete was cut short. After a youthful indiscretion, I ended up doing five years at the Massachusetts Institute of Technology. When the doors finally slammed shut behind me, I emerged with little more than a bus ticket and some physics-department issued clothes. Nobody reputable would hire a man with a checkered background doing physics, so I ended up with the only sort open to hard cases: Wall Street.

I caught the eye of a particularly unsavory boss one day, and he recruited me into a gang doing stat arb at a place called Morgan Stanley. I tried to get out, but they kept pulling me back in. It took six years to find a way out, but even then freedom proved elusive. I was in and out of corporations for the next few years, and even did some contract work for a couple of big hedge funds. Only in the confusion of 2008, did I finally manage to cut ties and run. But the scars are still there. The scars never go away.

On the plus side, I did learn a bit about market microstructure. Along the way I came to understand that my original view of prices was laughably simplistic. My hope is that I can help some misguided kid somewhere avoid my own missteps. If I can save even one reader, the effort put into this post will have been repaid a thousand times over. Mainly because I didn’t put much effort into it.

Rather than a detailed exposition on market microstructure (which varies from exchange to exchange, but has certain basic principles), I will go through a number of possible misconceptions. Hopefully, this will be of some small help to new traders who wish to better understand the dynamics of the stock market. At the very least, it will make you sound smart at cocktail parties. It also may help the occasional reader avoid such minor faux pas as redditing “hey guys, why don’t we all collude to manipulate stock prices in clear violation of SEC regulations, and to such an absurd degree that it will be impossible for regulators NOT to crucify us.” But hey, what’s the worst that could result from the public subversion of a number of powerful, well-connected hedge funds and the defiant proclamation that this was intentional?

Now to the important bit. Because we live in America, and everybody sues everyone for everything, I’ll state the obvious. Before you do anything, make sure you know what you are doing. If you read it here, that doesn’t mean it’s right or current. Yes, I worked in high frequency statistical arbitrage for some time. However, my specific knowledge may be dated. Though the general principles I describe still apply, you should confirm anything I say before relying heavily on it. In particular, I am no tax expert. Be sure to consult an accountant, a lawyer, a doctor, a rabbi, and a plumber before attempting anything significant. And if you do, please send me their info. It’s really hard to find a good accountant, lawyer, doctor, rabbi, or plumber.

Don’t take anything I say (or anyone else says) as gospel. I’ve tried to be as accurate as possible, but that doesn’t mean there aren’t technical errors. As always, the onus is on you to take care of your own money. When I first started out on Wall Street, I was in awe of traders. Then I got to know some. In my first job, somebody helpfully explained why people on Wall Street were paid more than in other professions. They weren’t paid to be infallible and never make mistakes; they were paid to be attentive and diligent enough to catch any mistakes they did make.

This sounded nice, but turned out to be a load of malarkey. The highly-paid professionals on Wall Street are the same bunch of knuckleheads as in any other profession, but with better credentials. However, this cuts both ways. Many people have a view, promulgated by movies and television, that bankers are unscrupulous, boiler-room shysters. These certainly exist, but mostly amongst the armies of low-paid retail brokers, or in certain very disreputable areas such as commercial banking. The real Wall Street is quite different. The individuals I worked with were highly ethical, and the environment was far more collegial and honest than academia. And this was in the late 90’s and early 2000’s, before academia really went to pot. The few knives I had to pull out of my back were (with one exception) gleefully inserted by fellow former-physicists. Fortunately, while physicists know a lot about the kinematics of knives, they know very little about anatomy. I emerged unscathed, and even got a few free knives out of it — which I promptly sold to some folks in Academia, where such things always are in high demand.

Despite its inapplicability to actual employee behavior, the point about mistakes is a good one. It is impossible to avoid making mistakes, but if you value your money you should carefully triple-check everything. This goes doubly for any work done by an accountant, financial adviser, or other “professional” you ill-advisedly employ. They probably know less than you do, and certainly care less than you do about your money.

The best advice I can offer is to inform yourself and be careful. Do research, check, recheck, and recheck again before committing to a trade. In my personal trading, I’ve never lost out by being too slow or cautious. But I have been hammered by being too hasty.

Now to the possible misconceptions. I’ll call them “myths” because that’s what popular websites do, so obviously it’s the right thing to do, and I prefer to do the right thing because the wrong thing rarely works.

Myth 1: There is a “price” for a stock at any given point in time. When a stock is traded during market hours, there is no such thing as its “price”. There is a bid (the highest offer to buy) and an ask (the lowest offer to sell). Often, the “price” people refer to is the last trade price (the price at which the last actual transaction occurred, regardless of its size). Sometimes the midpoint (bid+ask)/2 or weighted midpoint (bid x bidsize + ask x asksize)/(bidsize + asksize) is used. For algorithmic trading, more complicated limit-book centroids sometimes are computed as well. The “closing price” generally refers to the last trade price of the day. This is what appears in newspapers.
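For concreteness, here are the two mid-price definitions above in Python, applied to a hypothetical quote (the numbers are illustrative):

```python
def midpoint(bid, ask):
    """Simple midpoint of the quote."""
    return (bid + ask) / 2

def weighted_midpoint(bid, bidsize, ask, asksize):
    """Size-weighted midpoint as defined in the text. (Other weightings,
    such as weighting each side by the opposite side's size, also exist.)"""
    return (bid * bidsize + ask * asksize) / (bidsize + asksize)

# Hypothetical quote: $100 bid for 200 shares, $101 ask for 50 shares.
print(midpoint(100.0, 101.0))                    # 100.5
print(weighted_midpoint(100.0, 200, 101.0, 50))  # 100.2
```

Note how the weighted midpoint sits closer to the side with more size: the heavy bid pulls it down toward $100.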

Myth 2: I can place a limit order at any price I want. No, you cannot. Stocks (and options) trade at defined ticks. The “tick” or “tick size” is the space between allowed prices, and may itself vary with price. For example, the tick size in stock ZZZ could be $0.01 for prices below $1.00 and $0.05 otherwise. Historically, ticks often were fractions like 1/8 or 1/16 rather than multiples of $0.01. The tick size rules vary per exchange (or per security type on a given exchange) rather than per stock. In our example, any stock’s price could have allowable values of …, $0.98, $0.99, $1.00, $1.05, $1.10, … on the exchange in question.
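A sketch of what “allowed prices” means in practice, using the hypothetical ZZZ tick rule above (both the rule and the function are illustrative, not any real exchange’s schedule):

```python
def round_to_tick(price, tick_for):
    """Snap a desired limit price to the nearest allowable tick.
    `tick_for` maps a price to the tick size in force at that price."""
    tick = tick_for(price)
    # Round to a whole number of ticks, then clean up float noise.
    return round(round(price / tick) * tick, 10)

# The hypothetical ZZZ rule: $0.01 ticks below $1.00, $0.05 otherwise.
zzz_tick = lambda p: 0.01 if p < 1.00 else 0.05

print(round_to_tick(0.987, zzz_tick))  # 0.99
print(round_to_tick(1.03, zzz_tick))   # 1.05
```

A limit order at $1.03 simply isn’t accepted on this hypothetical exchange; your broker or the exchange would reject it (or round it) to an allowable price such as $1.05.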

Myth 3: Limit orders always are better than market orders. Limit orders offer greater control over the execution price, but they may not be filled or may result in adverse selection. Suppose ZZZ is trading with a bid of $100, an ask of $101, and a tick size of $0.50. Alice places a buy limit order at $100.50. It is quite possible that it quickly will be filled, giving her $0.50 better execution than a market order.

But suppose it is not filled right away. If the stock goes up, Alice has incurred what is called “opportunity cost.” The $0.50 attempted savings now translates into having to pay a higher price or forego ownership of the stock. It’s like waiting for the price of a home to go down, only to see it go up. If you want the home (and still can afford it), you now must pay more.

Ok, but why not just leave the limit order out there indefinitely? Surely it will get filled at some point as the stock bounces around. And if not, there is no harm. You don’t end up with the stock, but haven’t lost any money. In fact, why not put a limit order at $98? If it gets executed, that’s a $2.00 price improvement!

The problem is adverse selection. Such a limit order would get filled when the stock is falling. Sure, a temporary dip could catch it. But a major decline also could. The order is likely to be filled under precisely the conditions when Alice would not want it to be. At that point, she may be able to buy the stock for $97 or $96 — if buying it remains desirable at all. In the presence of an “alpha” (loosely speaking, a statistical signal which a trader believes has some predictive power for future stock movements), it may pay to place such limit orders — but that is a specific execution strategy based on a specific model. In general, there is no free money to be had. You either incur the transaction cost of crossing the spread (i.e. paying the ask), or risk both the opportunity cost of losing out on a desirable trade and the possibility of adverse selection which lands you with the stock at the worst possible time.

Well, it isn’t strictly true there is no free money to be had. There is free money to be made, but only by market makers, who are uniquely positioned to accept large volumes of orders. In this, they are not unlike the exchanges themselves. You and I do not possess the technology, capital, or customer flow to make money that way.

Myth 4: I can buy or sell any quantity at the stated price. There are a couple of reasons this is not true. The “stated price” usually is the last trade price, and there is no guarantee you can buy at that same price. Just because a house down the block sold for X doesn’t mean you can buy an identical one now for X. In illiquid stocks (and quite often with options), the last trade may have taken place some time ago and be stale relative to the current quote.

In principle, you can buy at the current ask or sell at the current bid. However, even this is not guaranteed. The bid and ask can move quickly, and it may be difficult to catch them. But there also is another critical issue at play. The bid and ask are not for unlimited quantities of stock. Each has an associated size, the total number of shares being sold or sought at that price. To understand this, it is necessary to explain how an order actually is executed — and that requires the notion of a “limit book” (aka “order book”).

Most data vendors and websites will display a “quote” (aka “composite quote”) for each stock. This consists of a bid, an ask, a bid-size, and an ask-size. Although some websites may omit the sizes, they are considered part of the quote. Suppose the quote for ZZZ has a bid of $100 for 200 shares, an ask of $101 for 50 shares, and the relevant tick-size is $0.50. Then the spread is two ticks ((101 − 100)/0.50), and the midpoint is $100.50. It isn’t necessarily the case that there is one trader offering to buy 200 shares at $100 and another offering to sell 50 shares at $101. The sizes may be aggregates of multiple orders at those price levels.
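The quote arithmetic above takes only a few lines to sketch (using the hypothetical ZZZ numbers from the example):

```python
# Hypothetical composite quote for ZZZ: prices in dollars, sizes in shares.
bid, ask = 100.00, 101.00
bid_size, ask_size = 200, 50
tick = 0.50

spread_ticks = (ask - bid) / tick   # (101 - 100) / 0.50 = 2 ticks
midpoint = (bid + ask) / 2          # $100.50

print(spread_ticks, midpoint)       # 2.0 100.5
```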

The composite quote actually is a window into a larger constellation of orders known as the limit book. The limit book consists of a set of orders at various price levels. For example, the limit book for ZZZ could have orders at $101, $101.5, $102, and $104 on the ask side, with a queue of specific orders at each level. The composite quote simply is the highest bid, the lowest ask, and the aggregate size for each.

Suppose Bob puts in a market order to buy 100 shares of ZZZ. This is matched against the orders at the lowest ask level ($101 in this case) in their order of priority (usually the time-order in which they were received). Since there only are 50 shares at $101, the exchange matches Bob against all the sell-orders at $101. It then matches the remaining 50 shares against the second ask level ($101.5) and higher until it matches them all. If it fails to match them all, Bob will have a partial fill, and the remainder of the order will be cancelled (since it was a market order). Each “fill” is a match against a specific sell-order, and a given trade can result in many fills. This is part of why your broker may sometimes send a bunch of trade confirmations for a single order on your part.
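The crossing of Bob’s market order can be sketched as follows. This is a toy model, not any exchange’s actual matching engine: each price level holds a simple list of resting order sizes in time priority, and the 50 shares at $101 are assumed (hypothetically) to be split between two sellers.

```python
# A minimal sketch of matching a market buy order against the ask side
# of a limit book. Levels are (price, [resting order sizes]) in ascending
# price order; real engines also track order IDs, timestamps, and fees.
def match_market_buy(ask_levels, qty):
    fills = []                         # (price, shares) per fill
    for price, orders in ask_levels:
        for i, size in enumerate(orders):
            if qty == 0:
                break
            take = min(size, qty)
            fills.append((price, take))
            orders[i] -= take          # partially or fully consume the order
            qty -= take
    return fills, qty                  # qty > 0 means a partial fill

# Bob's market order for 100 shares of ZZZ: 50 at $101, then 50 at $101.50.
book = [(101.0, [30, 20]), (101.5, [60]), (102.0, [40])]
fills, unfilled = match_market_buy(book, 100)
print(fills)     # [(101.0, 30), (101.0, 20), (101.5, 50)]
print(unfilled)  # 0
```

Three fills result from one order, which is exactly why a single order can produce a stack of trade confirmations.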

For highly liquid stocks, no order you or I are likely to place will execute past the inner quote. However, that quote can move quickly and the price at which a market order is executed may not be what you think. Brokers also execute order flow internally, or sell flow to other institutions — which then match it against other customers or their own orders. To you it looks the same (and may actually improve your execution in some cases), but your trade may never make it to the exchange. This is fine, since you’re not a member of the exchange — your broker is.

Note the risk of a market order, especially for illiquid stocks. Suppose the 2nd ask level was $110 rather than $101.5. In that case, Bob would have bought 50 shares at $101 and 50 shares at $110. A limit order slightly past the ask would have avoided this. For example, if he wanted to ensure execution (if possible) but avoid such ridiculous levels, he could place an immediate-or-cancel limit order at $102 (rather than a fill-or-kill or all-or-none order, both of which rule out a partial fill). This would ensure that he doesn’t pay more than $102, but he may only get a partial fill.

For stocks (other than penny-stocks), limit orders rarely are necessary as protection, though they may be desirable for other purposes. But when trading options, a limit order always should be used. If the quote is moving around a lot, this can be a good way to control worst-case execution (but in exchange for some opportunity cost). Options are a bit odd, since brokers often will write them on the spot in response to an order. You just need to figure out what their automated price-level is. Sometimes it is the midpoint, sometimes slightly higher. You almost always can do better than the innermost ask for small volume. For higher volume, you should buy slowly (over a day or two) to avoid moving the market too much — though it may be impossible if you effectively have the broker as your only counterparty. But back to Bob and ZZZ!

Now suppose that Bob places a limit order to buy 50 shares at $100.5, right in the middle of the current spread. There now is a new highest bid level: $100.5, and Bob is the sole order at that level. Any market sell order will match against him first, and this may happen so fast that the quote never noticeably changes. But if not, the new bid and bid-size will be $100.5 and 50 shares. If instead, he placed his buy order at $100, he would join the other bids at $100 as the last in the queue at that level.

What if he places it at $101 instead? If there were 25 shares available at that ask level, he would match those 25 shares. He now would have a bid for the remaining 25 shares at $101. This would be the new best bid, and the quote would change accordingly. The new best ask would be $101.5. Finally, suppose he placed the limit order at $110 instead. This effectively would be a market order, and would match against the $101 and $101.5 levels as before. Note that he would not get filled at $110 in this example. If there were 25 shares each at $101 and $101.5, he would be filled at those levels and his $110 limit order would have the same effect as a $101.5 limit order.
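These limit-order scenarios can be condensed into a toy function (hypothetical, like the book it acts on): it crosses a limit buy against any ask levels at or below its price, and reports the unfilled remainder, which would rest on the bid side.

```python
# A sketch of how a limit buy order interacts with a simplified book:
# if its price reaches the best ask it crosses like a market order (up
# to its limit); any remainder rests as a new bid. Prices are floats
# here for brevity; real engines use integer ticks.
def place_limit_buy(ask_levels, price, qty):
    fills = []
    for ask_price, size in ask_levels:
        if ask_price > price or qty == 0:
            break
        take = min(size, qty)
        fills.append((ask_price, take))
        qty -= take
    return fills, qty                  # qty > 0 rests on the bid side

asks = [(101.0, 25), (101.5, 25)]
print(place_limit_buy(asks, 100.5, 50))  # ([], 50): rests, the new best bid
print(place_limit_buy(asks, 101.0, 50))  # ([(101.0, 25)], 25): partial fill
print(place_limit_buy(asks, 110.0, 50))  # ([(101.0, 25), (101.5, 25)], 0)
```

Note that the $110 order fills at $101 and $101.5, never at $110 — just as described above.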

The limit book constantly is changing and, to make things worse, there often is hidden size. On many exchanges, it’s quite possible for the limit book to show 25 shares available at $101 and yet fill Bob for all 50 at that level. There could be hidden shares which automatically replenish the sell-order but are not visible in the feed. This is intentional. Most of the time, we only have access to simple data: the current quote and the last trade price.

Note that the crossing procedure described is performed automatically almost everywhere these days. Most exchanges run “ECNs” (electronic communication networks). An algorithm accepts orders which conform to the tick-size and other exchange rules, crossing them or adjusting the limit book accordingly. This is conceptually simple, but the software is rather involved. Because of the critical nature of an exchange, the technology has to be robust. It must be able to receive high volumes of orders with minimal latency; process them, cross them, and update the limit book; transmit limit-book, quote, and trade information to data customers; manage back-end and regulatory tasks such as clearing trades, reporting them, and processing payments; and do all this at extremely high speed, across many stocks and feeds concurrently, and with significant resilience. It definitely beats a bunch of screaming people and trade slip confetti.

Myth 5: The price at the close of Day 1 is the price at the open of Day 2. This clearly is not true, and often the overnight move is huge and predicated on different dynamics than intra-day moves. There are two effects involved. Some exchanges make provision for after-market and pre-open trading, but the main effect is the opening auction. Whenever there is a gap in trading, the new trading session begins with an opening auction. Orders accumulate prior to this, populating the limit book. However, no fills can occur. This means that the two sides of the limit book can overlap, with some bids higher than some asks. This never happens during regular trading because of the crossing procedure described earlier, and this situation must be cleaned up before ordinary trading can begin.

The opening auction is an unambiguous procedure for matching orders until the two sides of the book do not overlap. It is executed automatically by algorithm. The closing price on a given day is the last trade price of that day. It often takes a while for data to trickle in, so this gets adjusted a little after the actual close but usually is fairly stable. The prices one sees at the start of the day involve a flurry of fills from the uncrossing. This may create its own minor chaos, but the majority of the overnight price move is reflected in the orders themselves. Basically, it can be thought of as a queue of traders waiting overnight to get their orders in. There also are certain institutional effects near the open and close because large funds must meet certain portfolio constraints. Note that an opening auction happens any time there is a halt to trading. Most opening auctions are associated with the morning open, but some exchanges (notably the Tokyo Stock Exchange) have a lunch break. Extreme price moves also can trigger a temporary trading halt. In each case, there is an opening auction before trading restarts.
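The uncrossing can be sketched as follows. This toy version simply matches overlapping best bids and asks at their midpoint; real opening auctions instead compute a single clearing price, typically the one that maximizes matched volume.

```python
# A toy "uncross": while the best bid meets or exceeds the best ask,
# match them. bids are sorted descending, asks ascending: (price, qty).
def uncross(bids, asks):
    fills = []
    while bids and asks and bids[0][0] >= asks[0][0]:
        bp, bq = bids[0]
        ap, aq = asks[0]
        take = min(bq, aq)
        fills.append(((bp + ap) / 2, take))   # toy price: the midpoint
        bids[0] = (bp, bq - take)
        asks[0] = (ap, aq - take)
        if bids[0][1] == 0: bids.pop(0)
        if asks[0][1] == 0: asks.pop(0)
    return fills

# Overnight orders left the book crossed: a $102 bid above a $101 ask.
bids = [(102.0, 30), (100.0, 50)]
asks = [(101.0, 30), (103.0, 40)]
print(uncross(bids, asks))  # [(101.5, 30)]; book left with 100 bid / 103 ask
```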

Myth 6: The price fluctuations of a stock reflect market sentiment. That certainly can be a factor, often the dominant one. However, short-term price fluctuations also may be caused by mere market microstructure.

The price we see in most charts and feeds is the last trade price, so let’s go with that. Similar considerations hold for the quote midpoint, bid, ask, or any other choice of “price” that is being tracked.

When you buy at the ask, some or all of the sell-orders at that ask-level of the limit book are filled. If the level is exhausted, hidden size may immediately appear, or someone may happen to jump in (or adjust a higher sell-order down) to replenish it. In general, though, this does not happen: the composite quote moves, as do all quote-based metrics. The last trade price also reflects your trade, at least until the next trade occurs.

Consider an unrealistic but illustrative example: ZZZ has a market cap of a billion dollars. Bob and Alice are sitting at home, trading. The rest of the market, including all the major institutions which own stock in ZZZ, are sitting back waiting for some news or simply have no desire to trade ZZZ at that time. They don’t participate in trading, and have no orders outstanding. So it’s just Alice and Bob. ZZZ has a last trade price of $100, Bob has a limit order to buy 1 share at $100, and Alice has a limit order to sell 1 share at $101. These orders form both the quote and the entirety of the limit book (in this case).

Bob gets enthusiastic, and crosses the spread. The price now is $101, the level at which his trade transacted. Both see that the “price” just went up, and view the stock as upward-bound. Alice has some more to sell, and decides to raise her ask. She places a sell limit order for 1 share at $102. The ask now is 1x$102. Bob bites, crossing the spread and transacting at $102. The “price” now is $102. The pattern repeats with Alice always increasing the ask by $1 and Bob always biting after a minute or so. The closing price that day is $150.

Two people have traded a total of 50 shares over the course of that day. Has the price of a billion-dollar company really risen 50%? True, this is a ridiculous example. In reality, the limit book would be heavily populated even if there was little active trading, and other participants wouldn’t sit idly by while these two knuckleheads (well, one knucklehead, since Alice actually does pretty well) go at it. But the concept it illustrates is an important one. Analogous things can happen in other ways. Numerous small traders can push the price of a stock way up, while larger traders don’t participate. In penny stocks, this sort of thing actually can happen (though usually not in such an extreme manner). When a stock’s price changes dramatically, it is important to look at the trading volume and (if possible) who is trading. When such low-volume price moves occur, it is not a foregone conclusion that the price will revert immediately or in the near term. Institutional traders aren’t necessarily skilled or wise, and can get caught up in a frenzy or react to it — so such effects can have real market impact. However, most of the time they tend to be transient.

Myth 7: Shorting is an abstraction, and is just like buying negative shares. In many cases, it effectively behaves like this for the trader. However, the actual process is more complicated. “Naked shorts” generally are not allowed, though they can arise in anomalous circumstances. When you sell short, you are not simply assigned a negative number of shares which settles accordingly. You are borrowing specific shares of stock from a specific person who has a long position. The matching process is called a “locate” and is conducted at your broker’s level if possible or at the exchange level if the broker has no available candidates. There is an exception for market-makers and for brokers when a stock is deemed “easy to borrow”, meaning it is highly liquid and there will be no problem covering the short if necessary. Brokers maintain dynamic “easy to borrow” and “hard to borrow” lists for this purpose.

From the standpoint of a trader, there are two situations in which a short may not behave as expected. Suppose Bob sells short 100 shares of ZZZ stock, and the broker locates it with Alice. Alice owns 100 shares, and the broker effectively lends these to Bob. If Alice decides to sell her shares, Bob now needs to return the shares he borrowed and be assigned new ones. Normally, this is transparent to Bob. But if replacement shares cannot be located, he must exit his short position. The short sale is contingent on the continuing existence of located shares.

Because of the borrowing aspect, Bob’s broker also must ensure he has sufficient funds to cover any losses as ZZZ rises. This requires a margin. If ZZZ goes up, Bob may have to put up additional capital or exit his position (and take the loss). In principle, a short can result in an unlimited loss. In practice, Bob would fail a margin call before then. I.e., Bob cannot simply “wait out” a loss as he could with a long position.

If — as you should — you view the value of your position as always marked-to-market, then (aside from transaction cost or tax concerns) you never should hold a position just to wait out a loss. Most people don’t think or act this way, and there sometimes are legitimate reasons not to. For example, a long term investment generally shouldn’t be adjusted unless new information arrives (though that information may regard other stocks or externalities which necessitate an overall portfolio adjustment). One could argue that short term random fluctuations do not constitute new information, and without an alpha model one should not trade on them. This is a reasonable view. However, the ability to avoid doing so is not symmetric. Because of the issues mentioned, short positions may be harder to sustain than long ones.

The next couple of myths involve some tax lingo. In what follows “STCG” refers to “Short Term Capital Gain” and “LTCG” refers to “Long Term Capital Gain”. “STCL” and “LTCL” refer to the corresponding losses (i.e. negative gains).

Myth 8: Shares are fungible. When you sell them, it doesn’t matter which ones you sell. This is true from the standpoint of stock trading, but not taxes. Most brokers allow you to specify the specific shares (the “lots”) you wish to sell, though the means of doing so may not be obvious. However, for almost all purposes two main choices suffice: LIFO and FIFO. Most of the time, FIFO is the default. With many brokers, you can change this default for your account, as well as override it for individual trades. Let’s look at the difference between FIFO and LIFO.

Suppose Bob bought 100 shares of ZZZ at $50 three years ago and bought another 100 shares of ZZZ at $75 six months ago. ZZZ now is at $100, and he decides to sell 100 shares. If he sells the first 100 shares, a LTCG of $5000 ($10000 – $5000) is generated, but if he sells the second 100 shares a STCG of $2500 ($10000 – $7500) is generated. The implications of such gains can be significant, and are discussed below. The specifics of Bob’s situation will determine which sale is more advantageous — or less disadvantageous.

The first choice corresponds to FIFO accounting: first in, first out. The second corresponds to LIFO: last in, first out. One usually (but not always) benefits from FIFO, which is why this is the default. Note that FIFO and LIFO are relative to a given brokerage account, since a broker only knows about your positions with it. If Bob had an earlier position with broker B, broker A does not know about it and cannot sell it. In that case, Bob must keep track of these things. FIFO and LIFO are relative to the specific account in question, but the tax consequences for Bob are determined across all brokerage accounts. We’ll see what this means in a moment.
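Bob’s example can be made concrete with a short sketch. The dates below are hypothetical stand-ins for “three years ago” and “six months ago”, and the long-term boundary is approximated as 365 days (the actual rule is a holding period of more than one year).

```python
from datetime import date

# Bob's lots of ZZZ (hypothetical): (purchase date, shares, price per share).
lots = [(date(2014, 1, 10), 100, 50.0),   # bought ~3 years ago at $50
        (date(2016, 7, 10), 100, 75.0)]   # bought ~6 months ago at $75
sale_date, sale_price, sale_qty = date(2017, 1, 10), 100.0, 100

def gains(lots, order):
    # order="fifo" sells oldest lots first; "lifo" sells newest first.
    queue = lots if order == "fifo" else list(reversed(lots))
    remaining, out = sale_qty, []
    for bought, shares, basis in queue:
        if remaining == 0:
            break
        n = min(shares, remaining)
        term = "LT" if (sale_date - bought).days > 365 else "ST"
        out.append((term, n * (sale_price - basis)))
        remaining -= n
    return out

print(gains(lots, "fifo"))  # [('LT', 5000.0)]: the $5000 long-term gain
print(gains(lots, "lifo"))  # [('ST', 2500.0)]: the $2500 short-term gain
```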

All capital gains are relative to “basis” (or “tax basis”), generally the amount you paid for the stock when you bought it. In the example above, the basis for the first lot was $5000 and the basis for the second was $7500. This was why the LTCG from the first was $5000, while the STCG from the second was $2500. With stocks (but not necessarily mutual funds), a tax event only occurs when you close your position. If you hold the shares for 10 years, only on year 10 is a capital gains tax event generated. This can allow some strategic planning, and part of your overall investment strategy may involve choosing to sell in a low-income year. Note that dividends are taxed when you receive them, and regardless of whether they are cash or stock dividends or you chose to reinvest them. Also note that some mutual funds generate tax events from their own internal trading. You could be taxed on these (STCG or LTCG), and it is best to research the tax consequences of a fund before investing in it.

If you transfer stocks between accounts (usually done when transferring a whole account to a new broker), their tax basis is preserved. No tax events are generated. Note that the transfer must be done right. If you manually close your old positions and open new ones (with enough time between), you may generate a tax event. But if you perform an official “transfer” (usually initiated with your destination broker), the basis is preserved and no tax event occurs. Whether your broker will know that basis is another question. Not every broker’s technology or commitment to customer convenience is up to snuff. It is a good practice to keep your own careful records of all your trading activity.

When would LIFO be preferable? There are various cases, but the most common is to take a STCL to offset STCGs. STCGs tend to be taxed at a much higher rate than LTCGs, so taking a loss against them often is the desirable thing to do. In Bob’s case, if the price had gone down to $25 instead of up to $100, he could sell at a loss and use that loss to offset gains from some other stocks. He would have to specify LIFO to sell the newer lot and generate the STCL.

Myth 9: A “no-fee” trading account is better than one with fees. The cost to a trader involves several components. The main three are broker fees, exchange fees, and “execution”. “No-fee” refers to the broker fee. Unless many small trades are being executed with high frequency, the broker fee tends to be small. The exchange fees are passed along to you, even for “no-fee” accounts. The “execution” is the bulk of the cost. No or low-fee brokers often cross flow internally or sell flow to high-frequency firms which effectively front-run you. Market orders see slightly worse execution than they could, and limit orders get filled with slightly lower frequency than they could (or are deferred, causing slight adverse selection). These effects are not huge, but something to be aware of.

Suppose Alice buys 100 shares of ZZZ at $100. Broker X is no-fee, and Broker Y charges a fee of $7.95 per trade but has 10 bp (0.1%) better execution than Broker X on average. That 10 bp is a price improvement of just $0.10 per share, but it amounts to $10 on the $10,000 trade — more than the $7.95 fee. Alice does better with Broker Y than Broker X. This benefit may seem to apply only to large trades, but it also applies to stocks with large spreads. For illiquid stocks (including penny stocks) the price improvement can be much more significant. There are trading styles (lots of small trades in highly liquid stocks) where no-fee sometimes trumps better execution, but most often it does not.
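Alice’s comparison reduces to a one-line calculation: the fee broker wins whenever the price improvement on the trade’s notional value exceeds the flat fee. A sketch, using the hypothetical numbers above:

```python
# Net extra cost of using the fee-charging broker: its flat fee minus
# the dollar value of its price improvement (in basis points of notional).
def extra_cost_with_fee_broker(notional, fee, improvement_bp):
    # negative result means the fee broker is cheaper overall
    return fee - notional * improvement_bp / 10_000

# Alice: 100 shares at $100 = $10,000 notional, $7.95 fee, 10 bp better.
print(extra_cost_with_fee_broker(10_000, 7.95, 10))  # -2.05: Broker Y wins

# Breakeven notional: fee / (bp / 10,000) = 7.95 / 0.001 = $7,950.
```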

Myth 10: Taxes are something your accountant figures out, and shouldn’t affect your trading. Selling at the best price is all that matters. Taxes can eat a lot of your profit, and should be a primary consideration. Tax planning involves choosing accounts to trade in (401K or other tax-deferred vs regular), realizing losses to offset gains, and choosing assets with low turnover. As mentioned, some mutual funds can generate capital gains through their internal trading. In extreme cases, you could pay significant tax on a losing position in one.

Why are taxes so important to trading? The main reason is that there can be a 25% (or more) difference in tax rate between a LTCG and a STCG. STCGs often are taxed punitively, or at best are treated like ordinary income. Here in MA, the state tax alone is 12% for STCGs vs 5% for LTCGs. Federally, STCGs are treated as ordinary income while LTCGs have their own lower rate.

A STCG arises from a position held for one year or less, while a LTCG requires a holding period of more than one year. Note that it is the individual positions that matter. If Bob owns 200 shares of ZZZ, bought in two batches, then each batch has its own basis and its own purchase date. Also note that most stock option positions result in a STCG or STCL. Losses first offset gains of the same type: STCLs net against STCGs, and LTCLs against LTCGs. A net loss in one category then can offset a net gain in the other. Because STCGs are taxed at the higher rate, STCLs are more valuable than LTCLs. Losses can be rolled to subsequent years under some circumstances, but may be automatically wasted against LTCGs if you are not careful.

A good understanding of these details can save a lot of money. To understand the impact, suppose Alice has a (state+federal) 20% LTCG marginal tax rate and a 45% STCG marginal tax rate. She makes $10,000 on a trade, not offset by any loss. If it is a LTCG, she pays $2000 in taxes and keeps $8000. If it is a STCG, she pays $4500 and keeps $5500. That’s an additional $2500 out of her pocket. Since the markets pay us to take risk, she must take more risk or tie up more capital to make the same $8000 of after-tax profit. How much more capital? Not just the missing 25%, because the extra profit will be taxed at 45% as well. We solve 0.55x = 8000 to get x ≈ $14,545. Alice must tie up about 45% more capital or (loosely speaking) take 45% more risk to walk away with the same after-tax profit.
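The arithmetic generalizes: to net a target amount after tax at marginal rate r, you need pre-tax profit of target/(1 − r). A quick sketch:

```python
# Pre-tax profit needed to net `target` after tax at marginal rate `rate`:
# solve (1 - rate) * x = target for x.
def pretax_needed(target, rate):
    return target / (1 - rate)

ltcg = pretax_needed(8000, 0.20)   # 10000.0: the original $10,000 trade
stcg = pretax_needed(8000, 0.45)   # ~14545.45: what a STCG would require
print(stcg / ltcg - 1)             # ~0.4545: ~45% more capital or risk
```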

Myth 11: Options are like leveraged stock. No. This is untrue for many reasons, but I’ll point out one specific issue. Options can be thought of as volatility bets. Yes, the Black-Scholes formula depends on the stock price in a nonlinear manner, and yes the Black-Scholes model significantly underestimates tail risk. But for many purposes, it pays to think of options as predominantly volatility-based. Let’s return to our absurd but illustrative earlier scenario involving Bob bidding himself up and Alice happily making money off him.

As before, they trade ZZZ stock and are the only market participants but don’t know it. They run up their positions as before, with Bob buying a share from Alice at $100, then $101, up to $109. He now owns 10 shares. Both are so excited to be trading, they fall over backward in their chairs and bang their heads. Alice goes from pessimistic to optimistic, while Bob goes from optimistic to pessimistic. He wants to unload some of his stock, and offers to sell a share at $109. Alice now is optimistic, so she buys. He tries again, but gets no bite so he lowers the price to $108. Alice thinks this is a good deal and snaps it up. Bob sees the price dropping and decides to get out while he can. He offers at $107, Alice buys. And so on. At $100 he has sold his last share. Both are back where they started, as is the last reported trade price of ZZZ. At this point, both lean back in relief and their chairs topple over again. Now they’re back to their old selves, and they repeat the original pattern, with Alice selling to Bob at $100, $101, etc. Their chairs are very unstable, and this pattern repeats several times during the day. The last leg of the day is a downward one.

The day’s trading involves ZZZ stock price see-sawing between 100 and 109, and the price ends where it started. Consider somebody trading the options market (maybe Alice and Bob are the only active stock traders that day because everybody else is focusing on the options market). The price of ZZZ is unchanged between the open and close, but the prices of most ZZZ call and put options have risen dramatically. Option prices are driven by several things: the stock price, the strike price, the time to expiry, and the volatility. If the stock price rises dramatically, put options will go down but not as much as the price change would seem to warrant. This is because the volatility has increased. In our see-saw case, the volatility rose even when the stock price remained the same.
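The volatility effect can be illustrated with the textbook Black-Scholes formulas (European options, no dividends; the inputs below are hypothetical). With the stock pinned at $100, doubling the volatility roughly doubles both the at-the-money call and put prices, even though the stock price never moved:

```python
from math import log, sqrt, exp, erf

def norm_cdf(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1 + erf(x / sqrt(2)))

# Textbook Black-Scholes price for a European call or put (no dividends).
def bs_price(S, K, T, r, sigma, call=True):
    d1 = (log(S / K) + (r + sigma**2 / 2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    if call:
        return S * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)
    return K * exp(-r * T) * norm_cdf(-d2) - S * norm_cdf(-d1)

# ZZZ unchanged at $100, 3 months to expiry, but realized chaos has
# pushed implied volatility from 15% to 30%:
for vol in (0.15, 0.30):
    c = bs_price(100, 100, 0.25, 0.01, vol, call=True)
    p = bs_price(100, 100, 0.25, 0.01, vol, call=False)
    print(f"vol={vol:.2f}  call={c:.2f}  put={p:.2f}")
```

Both option prices roughly double with the volatility while the stock price and strike stay fixed, which is the sense in which an option position is predominantly a volatility bet.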

Myth 12: There are 12 myths.

My new monograph is out!

My new math monograph now is available on Amazon, and soon will be available on other venues via Ingram as well.

Amazon US Paperback
Amazon UK/Europe Paperback

The monograph is an attempt to mathematically codify a notion of “moral systems,” and define a sensible measure of distances between them. It delves into a number of related topics, and proposes mathematical proxies for otherwise vague concepts such as hypocrisy, judgment, world-view, and moral trajectory. In addition to detailed derivation of a number of candidate metrics, it offers several examples, including a concrete distance calculation for a simple system. The framework developed is not confined to the analysis of moral systems, and may find use in a wide variety of applications involving decision systems, black box computation, or conditional probability distributions.

Why Your Book Won’t Be an Amazon Success Story

I’m going to be that guy. The one nobody likes at parties. The one who speaks unpleasant truths. If you don’t want to hear unpleasant truths, stop reading.

If you want to be told which self-help books to buy and which things to do and which gurus will illuminate the shining path to fame and fortune, stop reading.

If you want somebody to hold your hand, and nod at all the right moments and ooh and aah about how your writing has come a long way and you’re “almost there,” stop reading.

It doesn’t matter whether you’ve come a long way. It doesn’t matter whether your writing is almost there, is there, or is beyond there. It doesn’t matter what you’re saying or how you’re saying it. You may have written the most poignant 80,000 words in the English language, or you may have another book of cat photos. None of that matters.

Unless you’re a certain type of person saying a certain type of thing in a certain way, none of it matters. And that certain type of person, that certain type of thing, and that certain way changes all the time. Today it’s one thing, tomorrow it will be another.

Statistically speaking, you’re not it.

“But what about all those success stories,” you argue. “I’m always hearing about Amazon success stories. Success, success, success! This book mentioned them and that blog mentioned them and the 12th cousin of my aunt’s best friend’s roommate had one.”

There are two reasons this doesn’t matter.

Most of those stories are part of a very large industry of selling hope to suckers. Any endeavor which appeals to the masses and appears to be accessible to them spawns such an industry. Business, stock picking, sex, dating, how to get a job, how to get into college, and on and on. Thanks to today’s low barrier to entry, self-publishing is the newest kid on that block.

This isn’t a conspiracy, or some evil corporation with a beak-nosed pin-striped CEO, cackling ominously while rubbing his hands. Self-publishing just attracts a lot of people who see an easy way to make money. When there’s a naive, eager audience, a host of opportunists and charlatans purvey snake oil to any sucker willing to pay. They’re predators, plain and simple. Hopefully, I can dissuade you from being prey. Leave that to others. Others unenlightened by my blog. Cynicism may not always be right, but it’s rarely wrong.

Even seemingly reputable characters have become untrustworthy. The traditional publishing industry has grown very narrow and institutional, and life is hard for everyone associated with it. The temptation to go for the easy money, and cast scruples to the winds, is quite strong. Not that denizens of the publishing industry ever were big on scruples. Many individuals from traditionally respectable roles as agents, editors, and publishers find it increasingly difficult to eke out a living or are growing disillusioned with a rapidly deteriorating industry. It is unsurprising that they are bedazzled by the allure of easy money. Unsurprising, and disappointing. This is especially insidious when agents offer paid services which purport to help improve your chances with other agents. The argument is that they know what their kind wants. Anybody see the problem with this? Anybody, anybody, Bueller? It would be like H.R. employees taking money to teach you how to get a job with them. Oh wait, they do. How could THAT possibly go wrong…

I’m not going to delve into the “selling hope to suckers” angle here. That is fodder for a separate post, in which I analyze a number of things which did or did not work for me. For now, I’ll focus on the second reason your book won’t be an Amazon Success Story. Incidentally, I will resist the temptation to assign an acronym to Amazon Success Story. There! I successfully resisted it.

In this post, I’ll assume that ALL those stories you hear are right. Not that they’re 99% bunk or that most actual successes had some outside catalyst you’re unaware of or were the result of survivorship bias (the old coin-flipping problem to those familiar with Malkiel’s book). To paraphrase the timeless wisdom of Goodfellas, if you have to wait in line like everyone else you’re a schnook. If you’re trying what everyone else tries, making the rounds of getting suckered for a little bit here a little bit there, with nothing to show for it — you’re the schnook.

Don’t feel bad, though. No matter how savvy we are in our own neighborhoods, we’re all schnooks outside it. Hopefully, I can help you avoid paying too much to learn how not to be a schnook.

I can’t show you how to be successful, but I can show you how to avoid paying to be unsuccessful. But that’s for another post. We’re not going to deal with the outright lies and deception and rubbish here. Those are obvious pitfalls, if enticing. Like pizza.

In this post, we’re going to assume the success stories are real — as some of them surely are. We’re going to deal with something more subtle than false hope. We’re going to discuss the OTHER reason you won’t be successful on Amazon. It’s not obvious, and it can’t be avoided.

But first, I’m going to make a plea: if you’re the author of one of those breathless, caffeinated “how to be a bzillionaire author like me” books or blogs or podcasts … stop it. Please. Just stop it. Unless you’re cynically selling hope to suckers or mass-producing content-free posts as click-bait. In that case, carry on. I don’t approve of what you do, but I’m not going to waste breath convincing dirtbags not to be dirtbags. However, if you’re even the least bit well-meaning, stop. Maybe you have some highly popular old posts along these lines. Update them. Maybe you’re writing a new series of posts based on what your friend named John Grisham has to say to self-publishing authors. Don’t.

You’re doing everyone a disservice. People will waste money and time and hope. Best to tell them the truth. You may not be that guy. You may be too nice, tactful, maybe even (dare I say) an optimist. I’m not an optimist. I AM that guy. No false hope sold here.

Maybe you’re still reading this and haven’t sky-dived into a volcano or fatally overdosed on Ben & Jerry’s, or turned to one of those cheerful, caffeinated blogs. Shame on you. There are special internet groups for people like you. But you’re still here, and I haven’t driven you away. I must be doing something wrong.

If you’re a true dyed-in-the-wool masochist, I’ll now explain why you won’t be successful. It has to do with a tectonic shift in Amazon’s policies.

Over a year ago, I wrote a post titled “Why NOT to use Amazon Ads for your book,” which many people have written me about. Most found it a useful take on Amazon ads, and one of the few articles which doesn’t regurgitate lobotomized praise for the practice.

I stand by that. Subsequent experiments (to be reported in a future post) have shown that Amazon ads perform even worse now. This led me to wonder why. Why did all the long-tail keywords and the reviews and the ads make no difference? None of us knows the precise inner workings of Amazon ads, but there are strong indications of their behavior.

I now will offer my theory for why there are success stories, why it’s tempting to believe they can be emulated, and why they cannot. To do so, let’s review some basic aspects of Amazon’s algorithms.

There are two algorithms we care about:

(1) The promotion algorithm, which ranks your book. It is responsible for placing it in any top 100 lists, determining its visibility in “customers also bought” entries, when and how it appears in searches, and pretty much any other place where organic (i.e. non-paid) placement is involved.

(2) The ad auction algorithm, which determines whether you win a bid for a given ad placement.

The promotion algorithm determines how much free promotion your book gets, and is critical to success. It has only a couple of basic pieces of information to work with: sales and ratings. The algorithm clearly reflects the timing of sales, and is heavily weighted toward the most recent week. It may reflect the source of those sales — to the extent Amazon can track it — but I have seen no evidence of this. As for ratings, all indications are that the number of ratings or reviews weighs far more heavily than the ratings themselves. This is true for consumers too, as long as the average rating is 3+. Below that, bad ratings can hurt. Buyers don’t care what your exact rating is, as long as there isn’t a big red flag. The number of ratings is seen as a sign of legitimacy, that your book isn’t some piece of schlock that only your grandmother and dad would review — but your mom was too ashamed to attach her name to. Anything from a traditional publisher has hundreds to thousands of ratings. A self-published work generally benefits from 15+. More is better.

It makes sense that the promotion algorithm can play a role, but why mention an “ad auction algorithm”? Ad placement should depend on your bid, right? Maybe you can tweak the multipliers and bids for different placements or keywords, but the knobs are yours and yours alone. You might very well think that, but I couldn’t possibly comment. Unlike the ever-diplomatic Mr. Urquhart, I’m too guileless to take this tack. I also don’t use Grey Poupon. I can and will comment. You’re wrong. Amazon’s ad algorithm does a lot more behind the scenes. You may be the highest bidder and still lose, and you may be the lowest bidder and win.

As usual, we must look at incentives to understand why things don’t behave as expected. Amazon does not run ads as a non-profit, nor does it get paid a subscription fee to do so. It only makes money from an ad when that ad is clicked, and it only makes money from a sale when the ad results in a conversion. For sellers, the latter is a commission and for authors it’s the 65% or 30% (depending on whether you chose the 35% or 70% royalty rate) adjusted for costs, etc. In either case, they make money from each sale and they make money from each click.

Amazon loses money if your ad wins lots of impressions, but nobody clicks on it. They would have been happier with a lower bid that actually resulted in clicks. If lots of people click on your ad, but few people buy your book, Amazon would have been happier with a lower bid which resulted in more sales. It’s a trade-off, but there are simple ways of computing these things. When you start fresh, Amazon has no history (though perhaps if you have other books, it uses their performance). It assigns you a set of default parameters representing the average performance of books in that genre. As impressions, clicks, and sales accrue, Amazon adjusts your parameters. This could be done through a simple Bayesian update or periodic regressions or some other method.
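
To make the idea concrete, here is what such an update could look like if it were a simple Beta-Binomial scheme. This is my own sketch of the concept, not Amazon's actual method, and all the numbers are invented:

```python
# Hypothetical sketch of the Bayesian update described above (an assumption,
# not Amazon's actual algorithm). The Beta prior (alpha, beta) encodes the
# genre-average click-through rate assigned to a brand-new book; observed
# impressions and clicks then pull the estimate toward actual performance.

def updated_ctr(prior_alpha, prior_beta, clicks, impressions):
    """Posterior mean CTR under a Beta prior with Binomial observations."""
    alpha = prior_alpha + clicks
    beta = prior_beta + (impressions - clicks)
    return alpha / (alpha + beta)

# A new book starts at the genre average (here, 0.5%):
print(updated_ctr(5, 995, clicks=0, impressions=0))  # 0.005
# After 10,000 impressions yielding only 20 clicks, the estimate falls:
print(updated_ctr(5, 995, clicks=20, impressions=10_000))
```

The same shape of update works for the conversion rate, with sales and clicks in place of clicks and impressions.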

When a set of authors bids on an ad, Amazon can compute the expected value of each bid. This looks something like P(click|impression)*ebid + P(sale|click)P(click|impression)*pnl, where P(click|impression) is your predicted click-through-rate for that placement, P(sale|click) is your predicted conversion rate for that placement, ebid is the effective bid (I’ll discuss this momentarily), and pnl is the net income Amazon would make from a sale of your book. This is an oversimplification, but gets the basic idea across.
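
Plugging made-up numbers into that formula shows why a high bid can still lose. To be clear, the functional form is my paraphrase of the idea, not Amazon's actual scoring, and every input below is invented:

```python
# The expected-value score from the text:
#   P(click|impression)*ebid + P(sale|click)*P(click|impression)*pnl
# (illustrative numbers only; the real inputs are unknown).

def expected_value(p_click, p_sale_given_click, ebid, pnl):
    """Expected revenue from showing an ad: click revenue plus the
    expected sale margin that a click may produce."""
    return p_click * ebid + p_sale_given_click * p_click * pnl

# A high bidder with a weak book vs. a low bidder with a strong one:
weak = expected_value(p_click=0.002, p_sale_given_click=0.01, ebid=1.00, pnl=3.00)
strong = expected_value(p_click=0.01, p_sale_given_click=0.10, ebid=0.40, pnl=3.00)
print(weak, strong)  # the low bidder's stronger book wins the score
```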

The ebid quantity is your effective bid, what you actually pay if you win the auction. There actually are two effective bids involved. Amazon’s ad auctions are “second-price,” meaning the winning bidder pays only the 2nd highest bid. Suppose there are 5 bids: 1,2,3,4,5. The bidder who bid 5 wins, but only pays 4. There are game theoretic reasons for preferring this type of auction, as it encourages certain desirable behaviors in bidders. In this case, the effective bid (and what Amazon gets paid) is 4. That is no mystery, and is clearly advertised in their auction rules. What isn’t advertised is the other, hidden effective bid. These effective bids may be 3,2,4,2,3, in which case the third bidder wins. What do they actually pay? I’m not sure, but something less than their actual bid of 3.

Apparently, whatever algorithm Amazon uses guarantees that a bidder never will pay more than their actual bid. It somehow combines the two types of effective bids to ensure this. I am not privy to the precise algorithm (and it constantly changes), so I cannot confirm this. However, I have been informed by an individual with intimate knowledge of the subject that Amazon’s approach provably guarantees no bidder will pay more than their actual bid.
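
Here is a toy reconstruction of such a quality-adjusted second-price auction. Again, this is my own guess at the mechanics, since the actual algorithm isn't public: each actual bid is multiplied by a quality score (folding in predicted click-through and conversion rates) to get an effective bid, the highest effective bid wins, and the price is set by the runner-up's effective bid, capped at the winner's actual bid:

```python
# Toy quality-adjusted second-price auction (a reconstruction of the
# behavior described above, NOT Amazon's actual algorithm).

def run_auction(bidders):
    """bidders: list of (actual_bid, quality) pairs. Returns the winner's
    actual bid and the price they pay."""
    scored = [(bid * quality, bid) for bid, quality in bidders]
    ranked = sorted(scored, reverse=True)
    winner_score, winner_bid = ranked[0]
    runner_up_score = ranked[1][0]
    # Pay just enough to match the runner-up's effective bid, but never
    # more than your own actual bid (the guarantee mentioned above).
    price = min(winner_bid, runner_up_score * winner_bid / winner_score)
    return winner_bid, price

# Equal quality reduces to a plain second-price auction:
# bids 1..5, the 5 wins and pays 4.
print(run_auction([(1, 1), (2, 1), (3, 1), (4, 1), (5, 1)]))
# The hidden-effective-bid case from the text: actual bids 1..5 whose
# qualities yield effective bids 3,2,4,2,3. The third bidder wins and
# pays less than their actual bid of 3.
print(run_auction([(1, 3.0), (2, 1.0), (3, 4/3), (4, 0.5), (5, 0.6)]))
```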

Why would Amazon prefer a lower bid, when they could get 4? As mentioned, they only get paid 4 if the ad of the winning bidder (the 5) gets a click. If the ad makes every reader barf or have a seizure or become a politician, there won’t be a lot of clicks. If it’s the most beautiful ad in human history, but the book’s landing page makes potential buyers weep and tear their hair and gnash their teeth, it probably won’t make many sales. In either case, Amazon would do better with another bidder.

Even without knowing the precise formula, one thing is clear. These algorithms are a big problem for anyone who isn’t already a star.

The problem is that those two algorithms play into one another, generating a feedback loop. If you’re already successful, everything works in your favor. But if you start out unattractive to them, you remain that way. You have few quality ad placements, and get few sales, and this suppresses your organic rank. The organic rank factors into many things which affect P(click|impression) and P(sale|click) — such as the number of reviews, etc. Put simply, once they decide you’re a failure, you become a failure, and remain one. You won’t win quality bids, even if you bid high. If you bid high enough to override the suppression, then you’ll pay an exorbitant fee per click, and it will cost a huge amount to reach the point where success compounds.

I am unsure whether there is cross-pollination between works by a given author, but I strongly suspect so. A new work by a top-ranked author probably starts high and is buoyed by this success. This may be why we see a dozen works by the same author (obviously self-published, and sometimes with very few ratings per book) in the top-100 in a genre.

So how do you get out of this hole? There’s only one accessible way for most people: you cheat. And this is where Amazon’s tectonic policy shift comes into play.

There ARE success stories, like the aforementioned top-ranked self-published authors. But there won’t be any more. To understand why, we must turn to hallowed antiquity before Bezos was revealed to be the latest incarnation of Bchkthmorist the Destroyer, and when Amazon brought to mind a place with trees, snakes, and Sean Connery.

There was a time when the nascent self-publishing industry had really begun to boom, but was poorly regulated. The traditional publishers viewed Amazon, Kindle, and self-publishing as a joke. They relied on their incestuous old-boys network of reviewers from the NY Times, New York Review of Books, and pretty much anything else with New York in the name for promotion. 95% of self-published books were about how to self-publish, and authors who DID self-publish (and were savvy) quickly developed ways to game Amazon.

They COULD pump up their search results, get in top-100 lists, and so on. Usually, this involved getting lots of fake reviews and using keyword tricks to optimize search placement. Once in the top list for a genre, it was easy to stay there — though newcomers with more fake reviews and better keyword antics could displace you. The very top was an unstable equilibrium, but the top 500 or 1000 was not. Once up there, it was easy to keep in that range and then occasionally pop into the very top. Like a cauldron of mediocrity, circulating its vile content into view every now and then. Amazon periodically tweaked its algorithms, but authors kept up.

Then something happened. Amazon decided to crack down on fake reviews. This sounds laudable enough. Fake reviews have the word fake in them, and fake always is bad, right?

There were two problems with HOW Amazon went about it. First, they went way overboard. Overnight, it became well-nigh impossible for an author to get a single new review. If the reviewer had one letter in common with your name, lived in the same hemisphere, or also breathed air, they were deemed connected to you and thus biased.

If this had been applied uniformly, there would be nobody in the top 100 — or it would be random, since nobody would have any tricks they could play. This is where the second problem with Amazon’s approach came in. They didn’t remove legacy fake ratings. Those who cheated before the cutoff got to keep their position. In fact, that position now was secure against all newcomers. A gate had slammed down, and they were firmly on the right side of it. Aside from a few people near the boundary, they had nothing to fear. Well, almost nothing to fear.

The only way to break into the top echelon, and thus benefit from the self-reinforcing algorithms which stabilize that position, is to rely on external sources of sales. If you have a million twitter followers who buy your book, or a massive non-amazon advertising campaign, you can break in. Then YOU would be very difficult to displace.

Once traditional publishers realized that Amazon is the only de facto bookstore left (outside airport/supermarket sales), they took an interest. THEY have no problem getting a top rank, because they run huge advertising campaigns and have huge existing networks. This is why the top 100 lists are an odd mixture of self-published books you never heard of and traditionally published bestsellers. Eventually it only will be the latter.

So. You. Won’t. Break. In. Amazon created an impenetrable aristocracy, and you’re not it. You won’t be it. You can’t be it. If you use Amazon ads or buy into any of the snake oil sales nonsense, you’ll be the schnook bribing a maitre d’ who knows he’ll never let you in.

Most of those success stories (or at least the real ones) are from before the policy change, as are many of the methods being touted. That path is gone. Amazon ads only work for those who don’t need them, and they work very well for them. They won’t work for you. Becoming a success on Amazon is as unlikely as with a traditional publisher. You’ll always hear stories, but they’re either the few who randomly made it, those with hidden external mechanisms of promotion, or those already entrenched at the top.

That’s the sad truth, or at least my take on it. By all means, waste a few dollars trying. I used to be a statistical trader and know better, but I still buy a lottery ticket when the jackpot’s high enough. It’s entertainment. Two dollars to dream for a day. I just don’t expect to win.

Write what you want, revise, work your butt off, and make it perfect. But do it because you want to, because that’s what makes you happy. Don’t do it expecting success, or hoping for success, or even entertaining the remote possibility of success.

The worst reason to write is for other people. Your work won’t be read, and your work won’t make you money. If you accept that and are happy to write anyway, then write all you want. I urge you to do so. It’s what I do.

PACE Sample Chapter

The following is a sample chapter from my book PACE.

Captain Alex Konarski gazed through the porthole window at the blue mass below. It looked the same as it had for the last nine years. When first informed of the Front, he had half-expected to see a pestilential wall of grey or a glowing force field or some other tell-tale sign. Instead there was nothing, just the same globe that always was there. The same boring old globe.

Konarski remembered the precise time it had taken for her charms to expire. Six months, twelve days. It was the same for every newcomer to the ISS; at first, they gawked at the beauty of Earth and couldn’t shut up about it. Then they did. Konarski always waited a discreet period after each arrival before asking how long it had taken.

Nobody seemed to remember the point at which things changed; they just woke up one day and the magic was gone. How like marriage, he’d laugh, slapping them on the back. By now the joke was well-worn. Of course, it wasn’t just the Earth itself. When somebody new arrived, they acted like a hyperactive puppy, bouncing with delight at each new experience — or perhaps ricocheting was a better choice of word up here.

Once the excitement died down, they discovered it was a job like any other, except that home was a tiny bunk a few feet from where you worked. The tourists had it right: get in and out before the novelty wore off. The ISS basically was a submarine posting with a better view and better toilets.

Earth became something to occasionally note out the corner of one’s eye. Yep, still there. Being so high up almost bred contempt for the tiny ball and its billions of people. This had been less of a problem in the old days, when the ISS sounded like the inside of a factory. But since the upgrade, things were so quiet that one could not help but feel aloof. Aloof was invented for this place. As a general rule, it was hard to hold in high regard any place toward which you flushed your excrement. Well, not quite *toward*.

There was a fun problem in orbital mechanics that Konarski used to stump newbies with. Of course, Alex had learned it in high school, but his colleagues — particularly the Americans — seemed to have spent their formative years doing anything but studying. For some reason, America believed it was better to send jocks into orbit than scientists. Worse even, it made a distinction between the two. Nerds are nerds and jocks are jocks and never the twain shall meet. It was a view that Konarski and most of the older generation of Eastern Europeans found bewildering. But that was the way it was.

So, Alex and his friends gave the newbies the infamous “orbit” problem. If you are working outside the ISS and fling a wrench toward Earth, what will happen? Invariably, the response was to the effect that “well, duh, it will fall to Earth”. With carefully practiced condescension, Alex then would inform them that this is not correct. The wrench will rebound and hit the pitcher. It was one of the many vagaries of orbital dynamics, unintuitive but fairly obvious on close reflection.

The victim would argue, debate, complain, declare it an impossibility. Alex patiently would explain the mathematics. It was no mistake. Only after the victim had labored for days over a calculation that any kid should be able to do would they — sometimes — get the answer.

For some reason the first question they asked after accepting the result always was, “How do you flush the toilets?”

“Very carefully,” Alex would answer.

Then everybody had a drink and a good laugh. Yes, shit would fall to earth just as it always had and always would.

The spectrometer indicated that there was some sort of smog developing over Rome. Alex wondered if this would be a repeat of Paris. There had been sporadic fires for weeks after the Front hit that city. Some were attributable to the usual suspects: car crashes as people fled or died, overloads and short-circuits, the chaos of large numbers of people fleeing, probably even arson, not to mention the ordinary incidence of fires in a major city, now with nobody to nip them in the bud. Mostly, though, it just was the unattended failure of humanity’s mechanized residue.

The Front couldn’t eradicate every trace of our existence, but perhaps it would smile gleefully as our detritus burned itself out. Those last embers likely would outlast us, a brief epitaph. Of course, the smaller fires weren’t visible from the station, and Alex only could surmise their existence from the occasional flare up.

The same had occurred everywhere else the Front passed. In most cases there had been a small glow for a day or so and then just the quenching smoke from a spent fire. On the other hand, there was a thick haze over parts of Germany since fires had spread through the coal mines. These probably would burn for years to come, occasionally erupting from the ground without warning. There was no need to speculate on *that*; Konarski’s own grandfather had perished this way many years ago. The mines had been killing people long before there was any Front. But the occasional fireworks aside, cities inside the Zone were cold and dead.

The ISS orbited the Earth approximately once every ninety minutes. This meant that close observation of any given area was limited to a few minutes, after which they must wait until the next pass. During the time between passes, the Front would expand a little over a quarter mile. Nothing remarkable had happened during the hundred passes it took for the Front to traverse Paris. And it wasn’t for another twenty or so that the trouble started.

*Trouble?* Something about the word struck him as callous. It seemed irreverent to call a fire “trouble”, while ignoring the millions of deaths which surely preceded it. Well, the “event”, then. Once it started, the event was evident within a few passes. Alex had noticed something wrong fairly quickly. Instead of a series of small and short-lived flare ups, the blaze simply had grown and grown.

At first he suspected the meltdown of some unadvertised nuclear reactor. But there was no indication of enhanced radiation levels. Of course, it was hard to tell for sure through the smoke plume. By that point it looked like there was a small hurricane over Paris, a hurricane that occasionally flashed red. It really was quite beautiful from his vantage point, but he shuddered to think what it would be like within that mile-high vortex of flame.

It had not ceased for seven days. Some meteorologist explained the effect early on. It was called a firestorm, when countless small fires merge into a monster that generates its own weather, commands its own destiny. It was a good thing there was nobody left for it to kill, though Alex was unsure what effect the fountain of ash would have on the rest of Europe.

In theory there probably were operational video feeds on the ground, but the Central European power grid had failed two months earlier. It had shown surprisingly little resilience, and shrouded most of Europe in darkness. Of course, the relevant machinery lay within the Zone and repairs were impossible.

Konarski wondered how many millions had died prematurely because some engineering firm cut corners years ago. It probably was Ukrainian, that firm. Alex never trusted the Ukrainians. Whatever the cause, the result was that there was no power. And by the time Paris was hit any battery-driven units were long dead. Other than some satellites and the occasional drone, he and his crew were the only ones to see what was happening.

The Paris conflagration eventually had withered and died out, of course. What was of interest now was Rome. The ISS had been asked to keep an eye on the regions within the Zone, gleaning valuable information to help others prepare or, if one were fool enough to hope, understand and dispel the Front altogether. However, the real action always surrounded the Front itself. Especially when it hit a densely-developed area, even if now deserted. But it wasn’t just orders or morbid curiosity that compelled Alex to watch. Where evident, the destruction could be aesthetically beautiful.

Safely beyond the reach of the Front, Alex could watch the end of a world. How many people would have the opportunity to do so? There was a certain pride in knowing he would be among the last, perhaps even *the* last. Once everyone had perished, the crew of the ISS would be alone for a while, left to contemplate the silence. Then their supplies would run out, and they too would die.

Based on the current consumption rate of his six-person crew, Alex estimated they could survive for another six years — two years past the Front’s anticipated circumvallation of Earth. Of course, he doubted the process would be an orderly one. Four of the crew members (himself included) came from military backgrounds, one was a woman, and three different countries were represented. Even at the best of times, there was a simmering competitiveness.

Konarski assumed that he would be the first casualty. No other scenario made sense, other than something random in the heat of passion — and such things didn’t require the Front. No, barring any insanity, he would go first. He was the leader and also happened to be bedding the only woman. Who else would somebody bother killing? Of course, with *this* woman, he shuddered to think what would happen to the murderer. Of course, *she* was the one most likely to kill him in the first place.

Obviously, they hadn’t screened for mental health in the Chinese space program. In fact, he guessed that any screening they *did* do was just lip-service to be allowed to join the ISS. But Ying was stunning and endlessly hilarious to talk to, and Alex had nothing to lose.

If the Front hadn’t come along, he would have faced compulsory retirement the following year. Then he would have had the privilege of returning to good old Poland, a living anachronism in a country that shunned any sign of its past. Alex gave it about a year before the bottle would have taken him. Who the fuck wanted to grow old in today’s world? The Front was the best thing that ever happened, as far as he was concerned. It made him special.

Alex would try to protect Ying for as long as he could, but he knew how things would unfold. Perhaps it would be best to kill her first, before anyone got to him. Or maybe he just should suicide the whole crew. It would be the easiest thing in the world, all he really had to do was stop trying to keep everyone alive. Or he actively could space the place and kill everyone at once, a grand ceremonial gesture. But that would be boring.

Besides, part of him wanted to see who *would* be the last man standing. The whole of humanity in one man. The one to turn out the lights, not first but final hand. Humanity would end the way it began, with one man killing another. After all, everybody always was talking about returning to your roots. Alex just was sad they no longer had a gun on board. That *really* would have made things interesting.

These were distant considerations, however; worth planning for, but hardly imminent. At the moment the world remained very much alive, and was counting on them for critical information. Alex wondered if it would be better to be the last man alive or the man who saved the world.

“The savior, you dumb fuck,” part of him screamed. “Nobody will be around to care if you’re the last one alive.” Of course, Poland already was gone. There was no home for him, even the one he wouldn’t have wanted. Maybe he was the last Pole. But how would he change a light bulb?

For some reason, a series of bad Polack jokes popped into Konarski’s head. There was a time when he would have taken great offense at such jokes, jumped to his country’s defense, maybe even thrown a few obligatory punches. But not now, not after what Poland had become over the last decade, and especially not after how they had behaved toward the end. They could go fuck themselves. And now they had. Or somebody bigger and badder had fucked them, just like had happened through most of their history.

Still, he felt a certain pride. Maybe he would be the start of a new, prouder race of Poles. No, that was just the sort of talk that had made him sick of his country, the reason he was commanding ISS under a Russian flag. Besides, there probably still were plenty of Poles around the world. He wasn’t alone. Yet.

If Alex watched Rome’s demise closely, he couldn’t be accused of exultation or cruel delight. He had watched his home city of Warsaw perish just three days earlier. Of course, it was nearly empty by the time the Front reached it. But he had listened to the broadcasts, the chatter, and he was ashamed of the conduct of his countrymen. They had acted just like the self-absorbed Western pigs he detested.

Ying understood. She was Chinese. When *they* left their old and infirm behind it would be from calculated expedience, not blind selfish panic. The decision would be institutional, not individual. The throng would push and perish and each would look to their own interest, but none would bear the individual moral responsibility. *That* would be absorbed by the State. What else was the State for?

But it turned out that his compatriots no longer thought this way. They had become soft since the fall of communism, soft and scared. When the moment came, they didn’t stand proud and sink with the ship. They scrambled over one another like a bunch of terrified mice, making a horrid mess and spitting on the morals of their homeland and a thousand years of national dignity just to buy a few more precious moments of lives clearly not worth living. They disgusted him. He would die the last true Pole.

In the meantime, he would carry on — his duty now to the species. Part of him felt that if *his* world had perished, so too should all the others. He harbored a certain resentment when he imagined some American scientists discovering the answer just in time to save their own country. It would be *his* data that accomplished this. What right had they to save themselves using *his* data, when his own people had perished. Yet still he sent it. Data that perhaps would one day allow another world to grow from the ashes of his. Maybe this was a sign that there *had* been some small progress over the thousands of years, that he was first and foremost human.

Alex’s thoughts were interrupted by a soft voice.

“We’re almost over Rome,” Ying whispered, breathing gently into his ear.

“C’mon, I have to record this,” he protested in half-genuine exasperation.

“That’s ok, we’ll just catch the next pass,” she shot back from behind him.

Alex heard some shuffling and felt something strange on his shoulder. What was Ying doing now? He had to focus, dammit. She was the funnest, craziest woman he had known, but sometimes he just wished he could lock her outside the station for a few hours. Yeah, he’d probably ask her to marry him at some point. Maybe soon. After all, living with somebody on the ISS was ten times more difficult than being married. Alex shook his shoulder free of her grip. It would have to wait.

Then he noticed that she wasn’t touching him. She was on the other side of the room, pointing at him with her mouth open. Why was there no sound? Then he was screaming, then he couldn’t scream anymore. Before things grew dark, he saw Ying’s decaying flesh. She still was pointing, almost like a mannequin. His last thought was how disgusting Ying had become, and that he soon would be the same.

CCSearch State Space Algo

While toying with automated Fantasy Sports trading systems, I ended up designing a rapid state search algorithm that was suitable for a variety of constrained knapsack-like problems.

A reference implementation can be found on GitHub:

Below is a discussion of the algorithm itself. For more details, see the source code in the github repo. Also, please let me know if you come across any bugs! This is a quick and dirty implementation.

Here is a description of the algorithm:

— Constrained Collection Search Algorithm —

Here, we discuss a very efficient state-space search algorithm which originated with a Fantasy Sports project but is applicable to a broad range of applications. We dub it the Constrained Collection Search Algorithm for want of a better term. A C++ implementation, along with a Python front-end, is included as well.

In the Fantasy Sports context, our code solves the following problem: We’re given a tournament with a certain set of rules and requirements, a roster of players for that tournament (along with positions, salaries and other info supplied by the tournament’s host), and a user-provided performance measure for each player. We then search for those teams which satisfy all the constraints while maximizing team performance (based on the player performances provided). We allow a great deal of user customization and flexibility, and currently can accommodate (to our knowledge) all major tournaments on Draftkings and FanDuel. Through aggressive pruning, execution time is minimized.

As an example, on data gleaned from some past Fantasy Baseball tournaments, our relatively simple implementation managed to search a state space of size approximately {10^{21}} unconstrained fantasy teams, ultimately evaluating under {2} million plausible teams and executing in under {4} seconds on a relatively modest desktop computer.

Although originating in a Fantasy Sport context, the CCSearch algorithm and code is quite general.

— Motivating Example —

We’ll begin with the motivating example, and then consider the more abstract case. We’ll also discuss some benchmarks and address certain performance considerations.

In Fantasy Baseball, we are asked to construct a fantasy “team” from real players. While the details vary by platform and tournament, such games share certain common elements:

  • A “roster”. This is a set of distinct players for the tournament.
  • A means of scoring performance of a fantasy team in the tournament. This is based on actual performance by real players in the corresponding games. Typically, each player is scored based on certain actions (perhaps specific to their position), and these player scores then are added to get the team score.
  • For each player in the roster, the following information (all provided by the tournament host except for the predicted fantasy-points, which generally is based on the user’s own model of player performance):
      • A “salary”, representing the cost of inclusion in our team.
      • A “position”, representing their role in the game.
      • One or more categories they belong to (ex. pitcher vs non-pitcher, real team they play on).
      • A prediction of the fantasy-points the player is likely to score.
  • A number {N} of players which constitute a fantasy team. A fantasy team must have precisely this number of players.
  • A salary cap. This is the maximum sum of player salaries we may “spend” in forming a fantasy team. Most, but not all, tournaments have one.
  • A set of positions, and the number of players in each. The players on our team must adhere to these. For example, we may have {3} players from one position and {2} from another and {1} each from {4} other positions. Sometimes there are “flex” positions, and we’ll discuss how to accommodate those as well. The total players in all the positions must sum to {N}.
  • Various other constraints on team formation. These come in many forms and we’ll discuss them shortly. They keep us from having too many players from the same real-life team, etc.

    To give a clearer flavor, let’s consider a simple example: DraftKings Fantasy Baseball. At the time of writing, at least 7 tournament types are listed (the number and types change with time, so this list may be out of date). For each, there are rules for scoring the performance of players (depending on whether hitter or pitcher, and sometimes whether relief or starting pitcher — all of which info the tournament host provides):

    • Classic: {N=10} players on a team, with specified positions P,P,C,1B,2B,3B,SS,OF,OF,OF. Salary cap is $50K. Constraints: (1) {\le 5} hitters (non-P players) from a given real team, and (2) players from {\ge 2} different real games must be present.
    • Tiers: {N} may vary. A set of performance “tiers” is provided by the host, and we pick one player from each tier. There is no salary cap, and the constraint is that players from {\ge 2} different real games must be present.
    • Showdown: {N=6} players, with no position requirements. Salary cap is $50K. Constraints: (1) players from {\ge 2} different real teams, and (2) {\le 4} hitters from any one team.
    • Arcade: {N=6} players, with 1 pitcher, 5 hitters. Salary cap is $50K. Constraints are: (1) {\le 3} hitters (non-P players) from a given real team, and (2) players from {\ge 2} different real games must be present.
    • Playoff Arcade: {N=7} players, with 2 pitchers and 5 hitters. Salary cap is $50K. Constraints are: (1) {\le 3} hitters (non-P players) from a given real team, and (2) players from {\ge 2} different real games must be present.
    • Final Series (involves 2 games): {N=8} players, with 2 pitchers and 6 hitters. $50K salary cap. Constraints are: (1) {1} pitcher from each of the two games, (2) {3} hitters from each of the {2} games, (3) can’t have the same player twice (even if they appear in both games), and (4) must have hitters from both teams in each game.

    • Lowball: Same as Tiers, but the lowest score wins.

    Although the constraints above may seem quite varied, we will see they fall into two easily-codified classes.

    In the Classic tournament, we are handed a table prior to the competition. This contains a roster of available players. In theory there would be 270 (9 for each of the 30 teams), but not every team plays every day and there may be injuries, so it can be fewer in practice. For each player we are given a field position (P, C, 1B, 2B, 3B, SS, or OF), a Fantasy Salary, their real team, and which games they will play in that day. For our purposes, we’ll assume they play in a single game on a given day, though it’s easy to accommodate more than one.

    Let us suppose that we have a model for predicting player performance, and are thus also provided with a mean and standard deviation performance. This performance is in terms of “points”, which is Draftkings’ scoring mechanism for the player. I.e., we have a prediction for the score which Draftkings will assign the player using their (publicly available) formula for that tournament and position. We won’t discuss this aspect of the process, and simply take the predictive model as given.

    Our goal is to locate the fantasy teams which provide the highest combined predicted player scores while satisfying all the requirements (position, salary, constraints) for the tournament. We may wish to locate the top {L} such teams (for some {L}) or all those teams within some performance distance of the best.

    Note that we are not simply seeking a single, best solution. We may wish to bet on a set of 20 teams which diversify our risk as much as possible. Or we may wish to avoid certain teams in post-processing, for reasons unrelated to the constraints.

    It is easy to see that in many cases the state space is enormous. We could attempt to treat this as a knapsack problem, but the desire for multiple solutions and the variety of constraints make it difficult to do so. As we will see, an aggressively pruned direct search can be quite efficient.

    — The General Framework —

    There are several good reasons to abstract this problem. First, it is the sensible mathematical thing to do. It also offers a convenient separation from a coding standpoint. Languages such as Python are very good at munging data when efficiency isn’t a constraint. However, for a massive state space search they are the wrong choice. By providing a general wrapper, we can isolate the state-space search component, code it in C++, and call out to execute this as needed. That is precisely what we do.

    From the Fantasy Baseball example discussed (as well as the variety of alternate tournaments), we see that the following are the salient components of the problem:

    • A cost constraint (salary sum)
    • The number of players we must pick for each position
    • The selection of collections (teams) which maximize the sum of player performances
    • The adherence to certain constraints involving player features (hitter/pitcher, team, game)

    Our generalized tournament has the following components:

    • A number of items {N} we must choose. We will term a given choice of {N} items a “collection.”
    • A total cost cap for the {N} items.
    • A set of items, along with the following for each:
      • A cost
      • A mean value
      • Optionally, a standard deviation value
    • A set of features. Each feature has a set of values it may take, called “groups” here. For each feature, a table (or function) tells us which group(s), if any, each item is a member of. If every item is a member of one and only one group, then that feature is termed a “partition” for obvious reasons.
    • A choice of “primary” feature, whose role will be discussed shortly. The primary feature need not be a partition. Associated with the primary feature is a count for each group. This represents the number of items which must be selected for that group. The sum of these counts must be {N}. An item may be chosen for any primary group in which it is a member, but may not be chosen twice for a given collection.
    • A set of constraint functions. Each takes a collection and, based on the information above, accepts or rejects it. We will refer to these as “ancillary constraints”, as opposed to the overall cost constraint, the primary feature group allocation constraints, and the number of items per collection constraint. When we speak of “constraints” we almost always mean ancillary constraints.
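    As a concrete illustration, the generalized specification above can be sketched as a few small Python types. The names (Item, Feature, Tournament) are illustrative only; the reference implementation is in C++ and its API may differ.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set


@dataclass
class Item:
    cost: float
    mean_value: float
    std_value: Optional[float] = None   # the standard deviation is optional


@dataclass
class Feature:
    groups: List[Set[int]]   # groups[g] = set of item indices in group g

    def is_partition(self, n_items: int) -> bool:
        # a partition places every item in exactly one group
        counts = [0] * n_items
        for g in self.groups:
            for i in g:
                counts[i] += 1
        return all(c == 1 for c in counts)


@dataclass
class Tournament:
    n_choose: int                       # N, items per collection
    cost_cap: Optional[float]           # None if the tournament has no cap
    items: List[Item]
    features: List[Feature]
    primary: int                        # index of the primary feature
    primary_counts: List[int]           # required picks per primary group
    constraints: List[Callable] = field(default_factory=list)

    def validate(self) -> None:
        # the primary group counts must sum to N
        assert sum(self.primary_counts) == self.n_choose
```

    A feature whose groups overlap (a “flex” position, say) simply fails the partition test, which is all the search needs to know about it.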

    To clarify the connection to our example, the fantasy team is a collection, the players are items, the cost is the salary, the value is the performance prediction, the primary feature is “position” (and its groups are the various player positions), other features are “team” (whose groups are the 30 real teams), “game” (whose groups are the real games being played that day), and possibly one or two more which we’ll discuss below.

    Note that each item may appear only once in a given collection even if it theoretically can fill multiple positions (ex. they play in two games of a double-header or they are allowed for a “flex” position as well as their actual one in tournaments which have such things).

    Our goal at this point will be to produce the top {L} admissible collections by value (or a good approximation thereof). Bear in mind that an admissible collection is a set of items which satisfy all the criteria: cost cap, primary feature group counts, and constraint functions. The basic idea is that we will perform a tree search, iterating over groups in the primary feature. This is why that group plays a special role. However, its choice generally is a structural one dictated by the problem itself (as in Fantasy Baseball) rather than a control lever. We’ll aggressively prune where we can based on value and cost as we do so. We then use the other features to filter the unpruned teams via the constraint functions.

    It is important to note that features need not be partitions. This is true even of the primary feature. In some tournaments, for example, there are “utility” or “flex” positions. Players from any other position (or some subset of positions) are allowed for these. A given player thus could be a member of one or more position groups. Similarly, doubleheaders may be allowed, in which case a player may appear in either of 2 games. This can be accommodated via a redefinition of the features.

    In most cases, we’ll want the non-primary features to be partitions if possible. We may need some creativity in defining them, however. For example, consider the two constraints in the Classic tournament described above. Hitter vs pitcher isn’t a natural feature. Moreover, the constraint seems to rely on two distinct features. There is no rule against this, of course. But we can make it a more efficient single-feature constraint by defining a new feature with 31 groups: one containing all the pitchers from all teams, and the other 30 containing hitters from each of the 30 real teams. We then simply require that there be no more than 5 items in any group of this new feature. Because only 2 pitchers are picked anyway, the 31st group never would be affected.

    Our reference implementation allows for general user-defined constraints via a functionoid, but we also provide two concrete constraint classes. With a little cleverness, these two cover all the cases which arise in Fantasy Sports. Both concern themselves with a single feature, which must be a partition:

    • Require items from at least {n} groups. It is easy to see that the {\ge 2} games and {\ge 2} teams constraints fit this mold.
    • Allow at most {n} items from a given group. The {\le 3,4,5} hitter per team constraints fit this mold.
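    In Python pseudocode (illustrative; as noted, the reference implementation provides these as C++ classes), the two constraint types might look like the following, where the partition feature is given as a group label per item and a collection is a tuple of item indices:

```python
def at_least_n_groups(group_of, n):
    """Require that the collection touch at least n distinct groups."""
    def constraint(collection):
        return len({group_of[i] for i in collection}) >= n
    return constraint


def at_most_n_per_group(group_of, n):
    """Allow at most n items from any single group."""
    def constraint(collection):
        counts = {}
        for i in collection:
            counts[group_of[i]] = counts.get(group_of[i], 0) + 1
        return all(c <= n for c in counts.values())
    return constraint
```

    For example, with a “game” feature, at_least_n_groups(game, 2) encodes the “players from at least 2 games” rule.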

    When designing custom constraints, it is important to seek an efficient implementation. Every collection which passes the primary pruning will be tested against every constraint. Pre-computing a specialized feature is a good way to accomplish this.

    — Sample Setup for DraftKings Classic Fantasy Baseball Tournament —

    How would we configure our system for a real application? Consider the Classic Fantasy Baseball Tournament described above.

    The player information may be provided in many forms, but for purposes of exposition we will assume we are handed vectors, each of the correct length and with no null or bad values. We are given the following:

    • A roster of players available in the given tournament. This would include players from all teams playing that day. Each team would include hitters from the starting lineup, as well as the starting pitcher and one or more relief pitchers. We’ll say there are {M} players, listed in some fixed order for our purposes. {R_i} denotes player {i} in our listing.
    • A set {G} of games represented in the given tournament. This would be all the games played on a given day. Almost every team plays each day of the season, so this is around 15 games. We’ll ignore the 2nd game of doubleheaders for our purposes (so a given team and player plays at most once on a given day).
    • A set {T} of teams represented in the given tournament. This would be all 30 teams.
    • A vector {p} of length {M}, identifying the allowed positions of each player. These are P (pitcher), C (catcher), 1B (1st base), 2B (2nd base), 3B (3rd base), SS (shortstop), OF (outfield).
    • A vector {t} of length {M}, identifying the team of each player. This takes values in {T}.
    • A vector {g} of length {M}, identifying the game each player participates in that day. This takes value in {G}.
    • A vector {s} of length {M}, providing the fantasy salary assigned by DraftKings to each player (always positive).
    • A vector {v} of length {M}, providing our model’s predictions of player performance. Each such value is the mean predicted fantasy score for the player under DraftKing’s scoring system for that tournament and player position. As an aside, DK never scores pitchers as hitters even if they bat.

    Note that DraftKings provides all this info (though it may have to be munged into some usable form), except the model prediction.

    We now define a new vector {h} of length {M} as follows: {h_i=t_i} if player {i} is a hitter (i.e. not a pitcher), and {h_i=P} if a pitcher, where {P} designates some new value not in {T}.

    Next, we map the values of {G}, {T}, and the positions into nonnegative consecutive integers (i.e. we number them). So the games run from {1\dots |G|}, the teams from {1\dots |T|}, and the positions from {1\dots 7}. We’ll assign {0} to the pitcher category in the {h} vector. The players already run from {1\dots M}. The vectors {t}, {g}, {p}, and {h} now take nonnegative integer values, while {s} and {v} take real ones (actually {s} is an integer too, but we don’t care here).
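    A minimal sketch of this preprocessing step, with made-up sample data (the positions and team names are illustrative only):

```python
positions = ["P", "C", "P", "OF"]          # p, by player
teams     = ["NYY", "BOS", "BOS", "NYY"]   # t, by player

# h_i = team of player i if a hitter, else a special pitcher group
h = ["PITCH" if pos == "P" else tm for pos, tm in zip(positions, teams)]

# number the teams 1..|T|, and assign 0 to the pitcher group of h
team_code = {tm: k + 1 for k, tm in enumerate(sorted(set(teams)))}
h_code = [0 if v == "PITCH" else team_code[v] for v in h]
```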

    From this, we pass the following to our algorithm:

    • Number of items: {M}
    • Size of a collection: {10}
    • Feature 1: {7} groups (the positions), and marked as a partition.
    • Feature 2: {|T|} groups (the teams), and marked as a partition.
    • Feature 3: {|G|} groups (the games), and marked as a partition.
    • Feature 4: {|T|+1} groups (the teams for hitters plus a single group of all pitchers), and marked as a partition.
    • Primary Feature: Feature 1
    • Primary Feature Group Counts: {(2,1,1,1,1,1,3)} (i.e. P,P,C,1B,2B,3B,SS,OF,OF,OF)
    • Item costs: {s}
    • Item values: {v}
    • Item Feature 1 Map: {f(i,j)= \delta_{p_i,j}} (i.e. {1} if player {i} is in position {j})
    • Item Feature 2 Map: {f(i,j)= \delta_{t_i,j}} (i.e. {1} if player {i} is on team {j})
    • Item Feature 3 Map: {f(i,j)= \delta_{g_i,j}} (i.e. {1} if player {i} is in game {j})
    • Item Feature 4 Map: {f(i,j)= \delta_{h_i,j}} (i.e. {1} if player {i} is a hitter on team {j} or a pitcher and {j=0})
    • Cost Cap: {50,000}
    • Constraint 1: No more than {5} items in any one group of Feature 4. (i.e. {\le 5} hitters from a given team)
    • Constraint 2: Items from at least {2} groups of Feature 3. (i.e. items from {\ge 2} games)

    Strictly speaking, we could have dispensed with Feature 2 in this case (we really only need the team through Feature 4), but we left it in for clarity.
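    The Kronecker-delta item-feature maps are straightforward to realize. A sketch, with illustrative integer code vectors for position and team:

```python
p = [1, 2, 1, 7]   # position codes, 1..7, by player
t = [3, 5, 5, 3]   # team codes, 1..|T|, by player


def delta_map(vec):
    """Item-feature map f(i, j) = 1 iff item i belongs to group j."""
    def f(i, j):
        return 1 if vec[i] == j else 0
    return f


feature1 = delta_map(p)   # position membership
feature2 = delta_map(t)   # team membership
```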

    Note that we also would pass certain tolerance parameters to the algorithm. These tune its aggressiveness as well as the number of teams potentially returned.

    — Algorithm —

    — Culling of Individual Items —

    First, we consider each group of the primary feature and eliminate strictly inferior items. These are items we never would consider picking because there always are better choices. For this purpose we use a tolerance parameter, {\epsilon}. For a given group, we do this as follows. Assume that we are required to select {n} items from this group:

    • Restrict ourselves only to items which are unique to that group. I.e., if an item appears in multiple groups it won’t be culled.
    • Scan the remaining items in descending order of value. For item {i} with cost {c_i} and value {v_i},
      • Scan over all items {j} with {v_j>v_i(1+\epsilon)}
      • If there are {n} such items that have {c_j\le c_i} then we cull item {i}.

    So basically, it’s simple comparison shopping. We check if there are enough better items at the same or lower cost. If so, we never would want to select the item. Note that we don’t require the alternatives to be merely “strictly” better; we allow a buffer. The other items must be sufficiently better. There is a rationale behind this which will be explained shortly. It has to do with the fact that the cull stage has no foreknowledge of the delicate balance between ancillary constraints and player choice. It is a coarse dismissal of certain players from consideration, and the tolerance allows us to be more or less conservative in this as circumstance dictates.
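    A sketch of this cull for a single group (Python for illustration; the reference implementation is C++). The {ntol} buffer discussed below is included as a parameter, defaulting to zero:

```python
def cull_group(costs, values, n, eps, ntol=0):
    """Return indices of items that survive the individual cull.

    An item i is culled when at least n + ntol other items beat its value
    by a factor of (1 + eps) at equal or lower cost.
    """
    survivors = []
    for i in range(len(costs)):
        # count items sufficiently better than item i
        better = sum(
            1
            for j in range(len(costs))
            if values[j] > values[i] * (1 + eps) and costs[j] <= costs[i]
        )
        if better < n + ntol:
            survivors.append(i)
    return survivors
```

    For instance, with equal costs, values (10, 9, 1), and one selection, a tolerance of 0.1 culls the 9 (since 10 > 9.9) while a tolerance of 0.2 spares it.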

    If a large number of items appear in multiple groups, we also can perform a merged pass — in which those groups are combined and we perform a constrained cull. Because we generally only have to do this with pairs of groups (ex. a “flex” group and each regular one), the combinatorial complexity remains low. Our reference implementation doesn’t include an option for this.

    To see the importance of the initial cull, consider our baseball example but with an extra 2 players per team assigned to a “flex” position (which can take any player from any position). We have {8} groups with ({60},{30},{30},{30},{30},{30},{90},{270}) allowed items. We need to select {(2,1,1,1,1,1,3,2)} items from amongst these. In reality, fantasy baseball tournaments with flex groups have fewer other groups — so the size isn’t quite this big. But for other Fantasy Sports it can be.

    The size of the overall state space is around {5\times 10^{21}}. Suppose we can prune just 1/3 of the players (evenly, so 30 becomes 20, 60 becomes 40, and 90 becomes 60). This reduces the state space by {130\times} to around {4\times 10^{19}}. If we can prune 1/2 the players, we reduce it by {4096\times} to around {10^{18}}. And if we can prune it by 2/3 (which actually is not as uncommon as one would imagine, especially if many items have {0} or very low values), we reduce it by {531441\times} to a somewhat less unmanageable starting point of {O(10^{16})}.
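    These factors follow directly from the 12 selected slots: shrinking every group by a factor {f} shrinks the pool of candidates for each slot by roughly {f}, so the state space shrinks by roughly {f^{12}}. A quick check of the arithmetic:

```python
# 12 items are selected in total across the 8 groups
slots = sum((2, 1, 1, 1, 1, 1, 3, 2))

print(round((3 / 2) ** slots))   # prune 1/3 of players -> ~130x
print(2 ** slots)                # prune 1/2 -> 4096x
print(3 ** slots)                # prune 2/3 -> 531441x
```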

    Thus we see the importance of this initial cull. Even if we had to perform a pairwise analysis for a flex group, and each paired cull cost {n^2m^2} operations (in practice it costs far less), where {m} is the non-flex group size and {n} is the flex-group size, we’d at worst get {(\sum_i m_i)^2\sum_i m_i^2} which is {O(10^9)} operations. In reality it would be far lower. So a careful cull is well worth it!

    One important word about this cull, however. It is performed at the level of individual primary-feature groups. While it accommodates the overall cost cap and the primary feature group allocations, it has no knowledge of the ancillary constraints. It is perfectly possible that we cull an item which could be used to form the highest value admissible collection once the ancillary constraints are taken into account. This is part of why we use the tolerance {\epsilon}. If it is set too high, we will cull too few items and waste time down the road. If it is too low, we may run into problems meeting the ancillary constraints.

    We note that in fantasy sports, the ancillary constraints are weak in the sense that they affect a small set of collections and these collections are randomly distributed. I.e., the data would have to conspire for them to have a meaningful statistical effect. We also note that there tend to be many collections within the same tiny range of overall value. Since the item value model itself inherently is statistical, the net effect is small. We may miss a few collections but they won’t matter. We’ll have plenty of others which are just as good and are as statistically diverse as if we included the omitted ones.

    In general use, we may need to be more careful. If the ancillary constraints are strong or statistically impactful, the initial cull may need to be conducted with care. Its effect must be measured and, in the worst case, it may need to be restricted or omitted altogether. In most cases, a well-chosen {\epsilon} will achieve the right compromise.

    In practice, {\epsilon} serves two purposes: (1) it lets us tune the cull so that the danger of an impactful omission due to the ancillary constraints is minimized while we still gain some benefit from this step, and (2) it allows us to accommodate “flex” groups or other non-partition primary features without a more complicated pairwise cull. This is not perfect, but often achieves the desired effect with far less effort.

    Another approach to accommodating flex groups or avoiding suboptimal results due to the constraints is to require more than the selection count when culling in a given group. Suppose we need to select {2} items from a given group. Ordinarily, we would require that there be at least {2} items with value {>(1+\epsilon)v} and cost {\le c} in order to cull an item with value {v} and cost {c}. We could buffer this by requiring {3} or even {4} such better items. This would reduce the probability of discarding useful items, but at the cost of culling far fewer. In our code, we use a parameter {ntol} to reflect this. If {n_i} is the number of selected items for group {i} (and the number we ordinarily would require to be strictly better in order to cull others), we now require {n_i+ntol} strictly better items. Note that {ntol} solely is used for the individual cull stage.

    One final note. If a purely lossless search is required then the individual cull must be omitted altogether. In the code this is accomplished by choosing either {ntol} or {\epsilon} very high. If we truly require only the single top collection (as opposed to collections within a thick band near the top), we have the standard knapsack problem and there are far better algorithms than CCSearch.

    — Prepare for Search —

    We can think of our collection as a selection of {n_i} items from each primary-feature group {i} (we’ll just refer to it as “group” for short). Let’s say that {m_i} is the total number of items in the {i^{th}} group. Some of the same items may be available to multiple groups, but our collection must consist of distinct items. So there are {K} bins, where {K} is the number of primary feature groups. For the {i^{th}} such group, we select {n_i} items from amongst the available {m_i} post-cull items.

    For the search itself we iterate by group, then within each group. Conceptually, this could be thought of as a bunch of nested loops from left group to right group. In practice, it is best implemented recursively.

    We can precompute certain important information:

    • Each group has {C_i= {m_i\choose n_i}} possible selections. We can precompute this easily enough.
    • We also can compute {RC_i= \prod_{j\ge i} C_j}. I.e. the product of the total combinations of this group and those that come after.
    • {BV_i} is the sum of the top {n_i} values in the group. This is the best we can do for that group, if cost is no concern.
    • {RBV_i} is {\sum_{j>i} BV_j}. I.e., the best total value we can get from all subsequent groups.
    • {LC_i} is the sum of the bottom {n_i} costs in the group. This is the cheapest we can do for that group, if value is no concern.
    • {RLC_i} is {\sum_{j>i} LC_j}. I.e., the cheapest we can do for all subsequent groups, if value is no concern.
    • Sorted lists of the items by value and by cost.
    • Sorted lists of {n_i}-tuples of distinct items by overall value and by overall cost. I.e., for each group, sorted lists of all combos of {n_i} choices. These generally are few enough to keep in memory.
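    This precomputation pass can be sketched as follows (Python for illustration; the reference implementation is C++). Each group is given as a (costs, values) pair and picks[i] is {n_i}:

```python
from itertools import combinations
from math import comb


def precompute(groups, picks):
    K = len(groups)
    # C_i: number of possible selections for group i
    C = [comb(len(g[0]), n) for g, n in zip(groups, picks)]
    # RC_i: product of combination counts for group i and all later groups
    RC = [1] * K
    for i in range(K - 1, -1, -1):
        RC[i] = C[i] * (RC[i + 1] if i + 1 < K else 1)
    # BV_i / LC_i: best value / least cost achievable within group i
    BV = [sum(sorted(g[1], reverse=True)[:n]) for g, n in zip(groups, picks)]
    LC = [sum(sorted(g[0])[:n]) for g, n in zip(groups, picks)]
    # RBV_i / RLC_i: best value / least cost over all groups after i
    RBV = [sum(BV[i + 1:]) for i in range(K)]
    RLC = [sum(LC[i + 1:]) for i in range(K)]
    # n_i-tuples of item indices per group, sorted by total value (descending)
    tuples_by_value = [
        sorted(combinations(range(len(g[0])), n),
               key=lambda tup, g=g: -sum(g[1][k] for k in tup))
        for g, n in zip(groups, picks)
    ]
    return C, RC, BV, LC, RBV, RLC, tuples_by_value
```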

    The search itself depends on two key iteration decisions. We discuss their effects on efficiency below.

    • Overall, do we scan the groups from fewest to most combinations (low to high {C_i}) or from most to fewest (high to low {C_i})?
    • Within each group, do we scan the item combinations from lowest to highest cost or from highest to lowest value? Note that of the four possible orderings (by cost or by value, ascending or descending), the other two make no sense for pruning. It must be one of these.

    Based on our choice, we sort our groups, initialize our counters, and begin.

    — Search —

    We’ll describe the search recursively.

    Suppose we find ourselves in group {i}, and are given the cost {c} and value {v} so far (from the selections for groups {1\dots i-1}). We denote the overall cost cap by {S}. We also are given {vmin}, the lowest collection value we will consider. We’ll discuss how this is obtained shortly.

    We need to cycle over all {C_i} choices for group {i}. We use the pre-sorted list of {n_i}-tuples sorted by value or by cost depending on our 2nd choice above. I.e., we are iterating over the possible selections of {n_i} items in decreasing order of overall value or increasing order of overall cost.

    We now discuss the individual iteration. For each step we compute the following:

    • {mc} is the minimum cost of all remaining groups ({i+1} onward). This is the lowest cost we possibly could achieve for subsequent groups. It is the pre-computed {RLC_i} from above.
    • {mv} is the maximum value of all remaining groups ({i+1} onward). This is the highest value we possibly could achieve for subsequent groups. It is the pre-computed {RBV_i} from above.
    • {c_i} is the cost of our current selection for group {i}
    • {v_i} is the value of our current selection for group {i}

    Next we prune if necessary. There are 2 prunings, the details of which depend on the type of iteration.

    If we’re looping in increasing order of cost:

    • If {c+c_i+mc>S} then there is no way to select from the remaining groups and meet the cost cap. Worse, all remaining iterations within group {i} will be of equal or higher cost and face the same issue. So we prune both the current selection and all remaining ones. Practically, this means we terminate the iteration over combinations in group {i} (for this combo of prior groups).
    • If {v+v_i+mv<vmin} then there is no way to select a high enough value collection from the remaining groups. However, it is possible that other iterations may do so (since we’re iterating by cost, not value). We prune just the current selection, and move on to the next combo in group {i} by cost.

    If on the other hand we’re looping in decreasing order of value, we do the opposite:

    • If {v+v_i+mv<vmin} then there is no way to select a high enough value collection from the remaining groups. Worse, all remaining iterations within group {i} will be of equal or lower value and face the same issue. So we prune both the current selection and all remaining ones. Practically, this means we terminate the iteration over combinations in group {i} (for this combo of prior groups).
    • If {c+c_i+mc>S} then there is no way to select from the remaining groups and meet the cost cap. However, it is possible that other iterations may do so (since we’re iterating by value, not cost). We prune just the current selection, and move on to the next combo in group {i} by value.

    If we get past this, our combo has survived pruning. If {i} isn’t the last group, we recursively call ourselves, but now with cost {c+c_i} and value {v+v_i} and group {i+1}.

    If on the other hand, we are the last group, then we have a completed collection. Now we must test it.

    If we haven’t put any protections against the same item appearing in different slots (when it is a member of multiple groups), we must test for this and discard the collection if a duplicate appears. Finally, we must test it against our ancillary constraints. If it violates any, it must be discarded. What do we do with collections that pass muster? Well, that depends. Generally, we want to limit the number of collections returned to some number {NC}. We need to maintain a value-sorted list of our top collections in a queue-like structure.

    If our new collection exceeds all others in value, we update {vmax}, the best value realized. This also resets {vmin= vmax (1-\delta)} for some user-defined tolerance {\delta}. We then must drop any already-accumulated collections which fall below the new {vmin}.

    I.e., we keep at most {NC} collections, and each must have value within a fraction {\delta} of the best.
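    Putting the pieces together, the search can be sketched as follows (Python for illustration; the reference implementation is C++). This version iterates within each group in decreasing order of value, and omits the ancillary-constraint and duplicate-item tests for brevity:

```python
from itertools import combinations


def ccsearch(groups, picks, S, delta, NC):
    """Return up to NC collections within a fraction delta of the best.

    groups is a list of (costs, values) pairs; picks[i] is n_i; S is the
    cost cap. Each result is (total value, list of per-group index tuples).
    """
    K = len(groups)
    # per-group n_i-tuples, sorted by total value, high to low
    tuples = [
        sorted(combinations(range(len(g[0])), n),
               key=lambda tup, g=g: -sum(g[1][k] for k in tup))
        for g, n in zip(groups, picks)
    ]
    BV = [sum(sorted(g[1], reverse=True)[:n]) for g, n in zip(groups, picks)]
    LC = [sum(sorted(g[0])[:n]) for g, n in zip(groups, picks)]
    RBV = [sum(BV[i + 1:]) for i in range(K)]   # best value after group i
    RLC = [sum(LC[i + 1:]) for i in range(K)]   # least cost after group i
    best = []                  # value-sorted list of surviving collections
    vmin = float("-inf")

    def recurse(i, c, v, chosen):
        nonlocal vmin
        for tup in tuples[i]:
            ci = sum(groups[i][0][k] for k in tup)
            vi = sum(groups[i][1][k] for k in tup)
            if v + vi + RBV[i] < vmin:
                return              # all later tuples have lower value: stop
            if c + ci + RLC[i] > S:
                continue            # cost prune: skip this tuple only
            if i + 1 < K:
                recurse(i + 1, c + ci, v + vi, chosen + [tup])
            else:
                # completed collection: record it and tighten vmin
                best.append((v + vi, chosen + [tup]))
                best.sort(key=lambda x: -x[0])
                vmin = best[0][0] * (1 - delta)
                best[:] = [b for b in best if b[0] >= vmin][:NC]

    recurse(0, 0.0, 0.0, [])
    return best
```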

    And that’s it.

    — Tuning —

    Let’s list all the user-defined tunable parameters and choices in our algorithm:

    • What is the individual cull tolerance {\epsilon\in [0,\infty]}?
    • What is {ntol}, the number of extra strictly-better items we require in a group during the individual cull?
    • Do we scan the groups from fewest to most combinations or the other way?
    • Within each group, do we scan the items from lowest to highest cost or from highest to lowest value?
    • What is the maximum number of collections {NC>0} we report back (or do we keep them all)?
    • What is the collection value tolerance {\delta\in [0,1]}?

    Clearly, {NC} and {\delta} guide how many results are kept and returned. High {NC} and high {\delta} are burdensome in terms of storage. If we want just the best result, either {NC=1} or {\delta=0} will do. As mentioned, {\epsilon} and {ntol} have specific uses related to the behavior of the individual cull. What about the sort orders?

    The details of post-cull search performance will depend heavily on the primary partition structure and cost distribution, as well as our 2 search order choices. The following is a simple test comparison benchmark (using the same data and the {10}-player collection Classic Fantasy Baseball tournament structure mentioned above).

    Within-Group Order    Group Order (by combos)    Time     Analyzed
    Value high-to-low     high-to-low                12.1s    7.9MM
    Value high-to-low     low-to-high                3.4s     1.5MM
    Cost low-to-high      high-to-low                69.7s    47.5MM
    Cost low-to-high      low-to-high                45.7s    18.5MM

    Here, “Analyzed” refers to the number of collections which survived pruning and were tested against the ancillary constraints. The total number of combinations pruned was far greater.

    Of course, these numbers mean nothing in an absolute sense. They were run with particular test data on a particular computer. But the relative values are telling. For these particular conditions, the difference between the best and worst choice of search directions was over {20\times}. There is good reason to believe that, for any given tournament structure, the relative performance of the four choices would be consistent once established, and likely would resemble these results. Why? The fastest option allows the most aggressive pruning early in the process. That’s why so few collections needed to be analyzed.

  • Two-Envelope Problems

    Let’s visit a couple of fun and extremely counterintuitive problems which sit in the same family. The first appears to be a “paradox,” and illustrates a subtle fallacy. The second is an absolutely astonishing (and legitimate) algorithm for achieving better than 50-50 odds of picking the higher of two unknown envelopes. Plenty of articles have discussed who discovered what ad nauseam, so we’ll just dive into the problems.

    — The Two Envelope Paradox: Optimizing Expected Return —

    First, consider the following scenario. Suppose you are shown two identical envelopes, each containing some amount of money unknown to you. You are told that one contains double the money in the other (but not which is which or what the amounts are) and are instructed to choose one. The one you select is placed in front of you and its contents are revealed. You then are given a second choice: keep it or switch envelopes. You will receive the amount in the envelope you choose. Your goal is to maximize your expected payment.

    Our intuition tells us that no information has been provided by opening the envelope. After all, we didn’t know the two values beforehand so learning one of them tells us nothing. The probability of picking the higher envelope should be {1/2} regardless of whether we switch or not. But you weren’t asked to improve on the probability, just to maximize your expected payment. Consider the following 3 arguments:

    • Let the amount in the envelope you initially chose be {z}. If it is wrong to switch then the other envelope contains {z/2}, but if it is right to switch it contains {2z}. There are even odds of either, so your expectation if you switch is {1.25z}. This is better than the {z} you get by sticking with the initial envelope, so it always is better to switch!
    • Since we don’t know anything about the numbers involved, opening the first envelope gives us no information — so ignore that value. Call the amount in the other envelope {z'}. If it is wrong to switch then the envelope you chose contains {2z'}, and if right to switch it contains {0.5z'}. If you switch, you get {z'} but if you don’t your expectation is {1.25z'}. So it always is better NOT to switch!
    • Call the amounts in the two envelopes {x} and {2x} (though you don’t know which envelope contains which). You pick one, but there is equal probability of it being either {x} or {2x}. The expected reward thus is {1.5x}. If you switch, the same holds true for the other envelope. So you still have an expected reward of {1.5x}. It doesn’t matter what you do.
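
    The third argument can be checked numerically. Below is a minimal Python sketch (the pair value {x=100} and the trial count are arbitrary illustrative choices) that fixes the pair and randomizes only the envelope assignment; always-keep and always-switch both converge to {1.5x}:

```python
import random

def expected_payout(switch, x=100, trials=200_000, seed=0):
    """Simulate the game for a fixed pair (x, 2x): only the 50-50
    assignment of the two values to envelopes is random."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        envelopes = [x, 2 * x]
        rng.shuffle(envelopes)       # equal odds of either ordering
        chosen, other = envelopes
        total += other if switch else chosen
    return total / trials

# Both strategies converge to 1.5x; switching gains nothing.
print(expected_payout(switch=False))
print(expected_payout(switch=True))
```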

    Obviously, something is wrong with our logic. One thing that is clear is that we’re mixing apples and oranges with these arguments. Let’s be a bit more consistent with our terminology. Let’s call the value that is in the opened envelope {z} and the values in the two envelopes {x} and {2x}. We don’t know which envelope contains each, though. When we choose the first envelope, we observe a value {z}. This value may be {x} or {2x}.

    In the 3rd argument, {P(z=x)= P(z=2x)= 0.5}. If we switch, then {\langle V \rangle= P(z=x)2x+P(z=2x)x = 1.5x}. If we keep the initial envelope then {\langle V \rangle= P(z=x)x+P(z=2x)2x = 1.5x}. Whether we switch or not, the expected value is {1.5x} though we do not know what this actually is. It could correspond to {1.5z} or {0.75z}. We must now draw an important distinction. It is correct that {P(z=x)= P(z=2x)= 0.5} for the known {z} and given our definition of {x} as the minimum of the two envelopes. However, we cannot claim that {1.5x} is {1.5z} or {0.75z} with equal probability! That would be tantamount to claiming that the envelopes contain the pairs {(z/2,z)} or {(z,2z)} with equal probability. We defined {x} to be the minimum value so the first equality holds, but we would need to impose a constraint on the distribution over that minimum value itself in order for the second one to hold. This is a subtle point and we will return to it shortly. Suffice it to say that if we assume such a thing we are led right to the same fallacy the first two arguments are guilty of.

    Obviously, the first two arguments can’t both be correct. Their logic is the same and therefore they must both be wrong. But how? Before describing the problems, let’s consider a slight variant in which you are NOT shown the contents of the first envelope before being asked to switch. It may seem strange that right after you’ve chosen, you are given the option to switch when no additional information has been presented. Well, this really is the same problem. With no apriori knowledge of the distribution over {x}, it is immaterial whether the first envelope is opened or not before the 2nd choice is made. This gives us a hint as to what is wrong with the first two arguments.

    There actually are two probability distributions at work here, and we are confounding them. The first is the underlying distribution on ordered pairs or, equivalently, the distribution of the lower element {x}. Let us call it {P(x)}. It determines which two numbers {(x,2x)} we are dealing with. We do not know {P(x)}.

    The second relevant distribution is over how two given numbers (in our case {(x,2x)}) are deposited in the envelopes (or equivalently, how the player orders the envelopes by choosing one first). This distribution unambiguously is 50-50.

    The problem arises when we implicitly assume a form for {P(x)} or attempt to infer information about it from the revealed value {z}. Without apriori knowledge of {P(x)}, being shown {z} makes no difference at all. Arguments which rely solely on the even-odds of the second distribution are fine, but arguments which implicitly involve {P(x)} run into trouble.

    The first two arguments make precisely this sort of claim. They implicitly assume that the pairs {(z/2,z)} or {(z,2z)} can occur with equal probability. Suppose they couldn’t. For simplicity (and without reducing the generality of the problem), let’s assume that the possible values in the envelopes are constrained to {2^n} with {n\in Z}. The envelopes thus contain {(2^n,2^{n+1})} for some integer {n} (though we don’t know which envelope contains which value). For convenience, let’s work in terms of {log_2} of the values involved (taking care to use {2^n} when computing expectations).

    In these terms, the two envelopes contain {(n,n+1)} for some {n=\log_2(x)} (defined to be the lesser of the two). We open one, and see {m=\log_2(z)}. If it is the upper then the pair is {(m-1,m)}, otherwise the pair is {(m,m+1)}. To claim that these have equal probabilities means that {n=m-1} and {n=m} are equally probable. We made this assumption independent of the value of {m}, so it would require that all pairs {(n,n+1)} be equally probable.

    So what? Why not just assume a uniform distribution? Well, for one thing, we should be suspicious that we require an assumption about {P(x)}. The 3rd argument requires no such assumption. Even if we were to assume a form for {P(x)}, we can’t assume it is uniform. Not just can’t as in “shouldn’t”, but can’t as in “mathematically impossible.” It is not possible to construct a uniform distribution on {Z}.

    Suppose we sought to circumvent this issue by constraining ourselves to some finite range {[M,N]}, which we supposedly know or assume apriori. We certainly can impose a uniform distribution on it. Each pair {(n,n+1)} has probability {1/(N-M)} with {n\in [M,N-1]}. But now we’ve introduced additional information (in the form of {N} and {M}), and it no longer is surprising that we can do better than even-odds! We always would switch unless the first envelope contained {N}. There is no contradiction between the first two arguments because we have apriori knowledge and are acting on it. We no longer are true to the original game.

    Rather than dwell on this particular case, let’s solve the more general case of a given {P(x)} (or in terms of {log_2}, {P(n)}). For any {n} drawn according to {P(n)}, the envelopes contain {(n,n+1)} in some order and it is equally likely that {m=n} and {m=n+1}. If we know {P} we can bet accordingly since it contains information. In that case, knowing {m} (i.e. {z}) helps us. Let’s suppose we don’t know {P}. Then it still does not matter whether we observe the value {z}, because we don’t know the underlying distribution!

    There only are two deterministic strategies: always keep, always switch. Why? Suppose that the drawn value is {n} (unknown to us) and the observed value is {m}. Note that this requires no actual knowledge of {m}; it merely has been fixed by the process of opening the envelope. Since we don’t know the underlying distribution, our strategy must be independent of the actual value observed. Given that the value doesn’t matter, we have nothing to do but always keep or always switch.

    First consider the expected value with the always-keep strategy:

    \displaystyle \langle V_K \rangle= \sum_{n=-\infty}^\infty P(n) [P(m=n|n) 2^n + P(m=n+1|n) 2^{n+1}]

    I.e. we sum over all possible ordered pairs {(n,n+1)} and then allow equal probability {P(m=n+1|n)=P(m=n|n)=0.5} for either of the two envelope orders. So we have {\langle V_K \rangle= \sum P(n) (2^n+2^{n+1})/2 = 3 \langle 2^{n-1} \rangle}. We immediately see that for this to be defined the probability distribution must drop faster than {2^{-n}} as {n} gets large! We already have a constraint on the possible forms for {P}.

    Next consider the always-switch strategy. It’s easy to see that we get the same result:

    \displaystyle \langle V_S \rangle= \sum_{n=-\infty}^\infty P(n) [P(m=n|n) 2^{n+1} + P(m=n+1|n) 2^{n}]

    and since {P(m=n|n)= P(m=n+1|n)} we get the same answer.
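
    To make this concrete, we can estimate both expectations for one admissible choice of {P(n)}. The sketch below uses the geometric distribution {P(n)=(7/8)(1/8)^n} on {n\ge 0} (an arbitrary illustrative choice that drops faster than {2^{-n}}); both strategies converge to {3\langle 2^{n-1}\rangle = 7/4}:

```python
import random

def payout(switch, trials=400_000, seed=1):
    """Expected payout when the lower exponent n is geometric:
    P(n) = (7/8)(1/8)^n for n >= 0, which decays fast enough
    that the expectation 3<2^(n-1)> is finite."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        n = 0
        while rng.random() < 1 / 8:  # geometric draw of the exponent
            n += 1
        pair = [2 ** n, 2 ** (n + 1)]
        rng.shuffle(pair)            # 50-50 envelope ordering
        z, other = pair
        total += other if switch else z
    return total / trials

# Both estimates approach 3<2^(n-1)> = (3/2)(7/8) sum (1/4)^n = 7/4.
print(payout(switch=False))
print(payout(switch=True))
```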

    But let’s be extra pedantic, and connect this to the original formulation of the first two arguments. I.e., we should do it in terms of {m}, the observed value.

    \displaystyle \langle V_S \rangle= \sum_m P(m) [P(n=m|m) 2^{m+1} + P(n=m-1|m) 2^{m-1}]

    We observe that {P(n=m|m)= P(m|n=m)P(n=m)/P(m)} and {P(n=m-1|m)= P(m|n=m-1)P(n=m-1)/P(m)}. We know that {P(m|n=m)= P(m|n=m-1)= 0.5}. Plugging these in, we get

    \displaystyle \langle V_S \rangle= \sum_m [0.5 P(n=m) 2^{m+1} + 0.5 P(n=m-1) 2^{m-1}]

    The first term gives us {\sum_n P(n) 2^n}. We can rewrite the index on the 2nd sum to get {\sum_n P(n) 2^{n-1}}, which gives us {\langle V_S \rangle= \sum_n P(n) (2^n + 2^{n-1})}, the exact same expression as before!

    How does this apply to the {[M,N]} ranged example we gave before? When we discussed it, we considered the case where the underlying distribution was known. In that and all other cases, a better than even-odds strategy based on such knowledge can be computed. In our actual formulation of the game, we don’t know {P(n)} and there’s no reason it couldn’t be uniform on some unknown interval {[M,N]}. Suppose it was. It still seems from our earlier discussion as if we’d do better by always switching. We don’t. The average amount thrown away by incorrectly switching when {m=N} exactly offsets the average gain from switching in all other cases. We do no better by switching than by keeping.
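
    The offset claim can be verified exactly for a small range. Below is a sketch (the range and values are arbitrary illustrative choices) comparing blind always-switching against the informed rule that keeps the opened envelope when it shows the known maximum {2^N}; the former exactly matches always-keep, while the latter does better:

```python
from fractions import Fraction

def strategy_value(M, N, informed):
    """Exact expected payout when the lower exponent n is uniform on
    [M, N-1], i.e. the envelopes hold (2^n, 2^(n+1)) with values
    confined to [2^M, 2^N].
    informed=False: always switch blindly.
    informed=True : switch unless the opened envelope shows 2^N."""
    p_pair = Fraction(1, N - M)          # uniform over the N-M pairs
    total = Fraction(0)
    for n in range(M, N):
        for opened, other in [(2 ** n, 2 ** (n + 1)),
                              (2 ** (n + 1), 2 ** n)]:
            if informed and opened == 2 ** N:
                payout = opened          # known maximum: keep it
            else:
                payout = other           # switch
            total += p_pair * Fraction(1, 2) * payout
    return total

keep = sum(Fraction(3 * 2 ** n, 2) for n in range(0, 5)) / 5  # always-keep
print(strategy_value(0, 5, informed=False))  # equals always-keep
print(strategy_value(0, 5, informed=True))   # strictly larger
```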

    We thus see that without knowing the underlying distribution {P(x)}, the switching and keeping strategies have the same expected reward. Of the three arguments we originally proposed, the first 2 were flawed in that they assume a particular, and impossible, underlying distribution for {x}.

    At the beginning of our discussion, we mentioned that our intuition says you cannot do better than 50-50 probability-wise. Let us set aside expected rewards and focus solely on probabilities. We now see how you actually can do better than 50-50, contrary to all intuition!

    — Achieving better than 50-50 Odds with Two Envelopes —

    Next let’s consider a broader class of two-envelope problems, but purely from the standpoint of probabilities. Now the two envelopes can contain any numbers; one need not be double the other. As before, we may choose an envelope, it is opened, and we are offered the opportunity to keep it or switch. Unlike before, our goal now is to maximize the probability of picking the larger envelope.

    Since we are dealing with probabilities rather than expectation values, we don’t care what two numbers the envelopes contain. In fact, they need not be numbers at all — as long as they are distinct and comparable (i.e. {a<b} or {b<a} but not both). To meaningfully analyze the problem we require a slightly stronger assumption, though: specifically that the set from which they are drawn (without repetition) possesses a strict linear ordering. However, it need not even possess any algebraic structure or a metric. Since we are not concerned with expectation values, no such additional structure is necessary.

    Our intuition immediately tells us that nothing can be gained by switching. In fact, nothing we do should have any impact on the outcome. After all, the probability of initially picking correctly is {1/2}. Switching adds no information and lands us with an identical {1/2} probability. And that is that, right? It turns out that, contrary to our very strong intuition about the problem, there is in fact a way to improve those odds. To accomplish this, we’ll need to introduce a source of randomness. For convenience of exposition we’ll assume the envelopes contain real numbers, and revisit the degree to which we can generalize the approach later.

    The procedure is as follows:

    • Pick any continuous probability distribution {P} which has support on all of {R} (i.e. {p(x)>0} for all real {x}). Any distribution with full support works (normal, Cauchy, logistic, etc).
    • Choose an envelope and open it. We’ll denote its value {z}.
    • Sample some value {d} from our distribution {P}. If {z>d} stick with the initial choice, otherwise switch. We’ll refer only to {z>d} or {z<d} because the event {z=d} has probability {0} and safely can be ignored.
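
    The three steps above can be simulated directly. A minimal sketch, assuming a standard normal for {P} and arbitrary illustrative envelope values:

```python
import random

def win_rate(x, y, trials=100_000, seed=2):
    """Play the keep/switch game with the randomized rule: sample
    d ~ N(0,1); keep the opened value z if z > d, otherwise switch.
    Returns the fraction of rounds ending with the larger value."""
    rng = random.Random(seed)
    wins = 0
    larger = max(x, y)
    for _ in range(trials):
        z, other = (x, y) if rng.random() < 0.5 else (y, x)
        d = rng.gauss(0.0, 1.0)         # sample from our contrived P
        pick = z if z > d else other
        wins += pick == larger
    return wins / trials

# The edge is largest when the pair straddles the bulk of P...
print(win_rate(-0.5, 0.5))
# ...and nearly vanishes when the pair sits far out in a tail.
print(win_rate(100.0, 101.0))
```

    The gain depends entirely on how much of {P}’s mass falls between the two values; any full-support choice guarantees a strictly positive edge, however small.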

    At first, second, and {n^{th}} glance, this seems pointless. It feels like all we’ve done is introduce a lot of cruft which will have no effect. We can go stand in a corner flipping a coin, play Baccarat at the local casino, cast the bones, or anything else we want, and none of that can change the probability that we’re equally likely to pick the lower envelope as the higher one initially — and thus equally likely to lose as to gain by switching. With no new information, there can be no improvement. Well, let’s hold that thought and do the calculation anyway. Just for fun.

    First some terminology. We’ll call the value in the opened envelope {z}, and the value in the other envelope {z'}. The decision we must make is whether to keep {z} or switch to the unknown {z'}. We’ll denote by {x} and {y} the values in the two envelopes in order. I.e., {x<y} by definition. In terms of {z} and {z'} we have {x= \min(z,z')} and {y= \max(z,z')}. We’ll denote our contrived distribution {P} in the abstract, with pdf {p(v)} and cdf {F(v)=\int_{-\infty}^v p(v') dv'}.

    Let’s examine the problem from a Bayesian perspective. There is a 50-50 chance that {(z,z')=(x,y)} or {(z,z')=(y,x)}. So {p(z=x)=p(z=y)=0.5}. There are no subtleties lurking here. We’ve assumed nothing about the underlying distribution over {(x,y)}. Whatever {(x,y)} the envelopes contain, we are equally likely to initially pick the one with {x} or the one with {y}.

    Once the initial envelope has been opened, and the value {z} revealed, we sample {d} from our selected distribution {P} and clearly have {p(d<x)=F(x)} and {p(d<y)=F(y)} and {p(d<z)=F(z)}. The latter forms the criterion by which we will keep {z} or switch to {z'}. Please note that in what follows, {d} is not a free variable, but rather a mere notational convenience. Something like {p(x<d)} is just notation for “the probability the sampled value is greater than {x}.” We can apply Bayes’ law to get (with all probabilities conditional on some unknown choice of {(x,y)}):

    \displaystyle p(z=x|d<z)= \frac{p(d<z|z=x)p(z=x)}{p(d<z)}

    What we really care about is the ratio:

    \displaystyle \frac{p(z=x | d<z)}{p(z=y | d<z)}= \frac{p(d<z|z=x)p(z=x)}{p(d<z|z=y)p(z=y)}= \frac{F(x)}{F(y)}<1

    Here, we’ve observed that {p(d<z|z=x)= p(d<x)= F(x)} and {F(x)<F(y)} since by assumption {x<y} and {F} is monotonically increasing (we assumed its support is all of {R}). I.e., if {d<z} there is a greater probability that {z=y} than {z=x}. We shouldn’t switch. A similar argument shows we should switch if {d>z}.

    So what the heck has happened, and where did the new information come from? What happened is that we actually know one piece of information we had not used: that the interval {(x,y)} has nonzero probability measure. I.e. there is some “space” between {x} and {y}. We don’t know the underlying distribution but we can pretend we do. Our strategy will be worse than if we did know the underlying {p(x)}, of course. We’ll return to this shortly, but first let’s revisit the assumptions which make this work. We don’t need the envelopes to contain real numbers, but we do require the following of the values in the envelopes:

    • The set of possible values forms a measurable set with a strict linear ordering.
    • Between any two elements there is a volume with nonzero probability. Actually, this only is necessary if we require a nonzero improvement for any {(x,y)}. If we only require an improvement on average we don’t need it. But in that scenario, the host can contrive to use a distribution which neutralizes our strategy and returns us to 50-50 odds.

    What difference does {P} itself make? We don’t have any way to choose an “optimal” distribution because that would require placing the bulk of probability where we think {x} and {y} are likely to lie. I.e. we would require prior knowledge. All we can guarantee is that we can improve things by some (perhaps tiny) amount. We’ll compute how much (for a given true underlying distribution) shortly.

    Let’s assume that {Q(x,y)} is the true underlying distribution over {(x,y)}. We won’t delve into what it means to “know” {Q} since we are handed the envelopes to begin with. Perhaps the game is played many times with values drawn according to {Q} or maybe it is a one-time affair with {(x,y)} fixed (i.e. {Q} a {\delta}-distribution). Ultimately, such considerations just would divert us to the standard core philosophical questions of probability theory. Suffice to say that there exists some {Q(x,y)}. By definition {Q(x,y)=0} unless {x<y}. For convenience, we’ll define a symmetrized version as well: {q(a,b)\equiv Q(a,b)+Q(b,a)}. We don’t employ a factor of {1/2} since the two terms are nonzero on disjoint domains.

    Given {Q}, what gain do we get from a particular choice of {P}?

    \displaystyle  \begin{array}{rcl}  P(win)= \int_{x<y} dx dy Q(x,y)[p(z=x|(x,y))p(x<d) \\ + p(z=y|(x,y))p(d<y)] \end{array}

    I.e., the probability we keep {z} when it is {y} and switch when it is {x}. Clearly, {p(z=x|(x,y))= p(z=y|(x,y))= 0.5} since those are the immutable 50-50 envelope ordering probabilities. After a little rearrangement, we get:

    \displaystyle P(win)= \frac{1}{2} + \frac{1}{2}\langle F(y) - F(x) \rangle_Q

    Our gain is half the mean value of {F(y)-F(x)} over the joint distribution {Q(x,y)}. The more probability {P} jams between {x} and {y}, the more we gain should that {(x,y)} arise. But without knowledge of the underlying joint distribution {Q(x,y)}, we have no idea how best to pick {P}. All we can do is guarantee some improvement.
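
    For a {Q} concentrated on a single pair {(x,y)}, the win probability reduces to {\frac{1}{2} + \frac{1}{2}[F(y)-F(x)]} (the factor of {1/2} arises because each envelope is opened first with probability one half), which we can check against simulation. A sketch with arbitrary values, again assuming a standard normal for {P}:

```python
import math, random

def F(v):
    """CDF of our contrived P, here a standard normal."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

def simulated_win(x, y, trials=200_000, seed=3):
    """Monte Carlo estimate of P(win) for fixed envelope values x < y."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        z, other = (x, y) if rng.random() < 0.5 else (y, x)
        pick = z if z > rng.gauss(0.0, 1.0) else other
        wins += pick == y
    return wins / trials

x, y = -1.0, 0.7
predicted = 0.5 + (F(y) - F(x)) / 2   # Q concentrated on this one pair
print(predicted)
print(simulated_win(x, y))            # agrees within Monte Carlo noise
```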

    How well can we do if we actually know {Q}? Well, there are two ways to use such information. We could stick to our strategy and try to pick an optimal {P}, or we could seek to use knowledge of {Q} directly. In order to do the former, we need to exercise a little care. {Q} is a two-dimensional distribution while {P} is one-dimensional. How would we use {Q} to pick {P}? Well, this is where we make use of the observed {z}.

    In our previous discussion of the {(x,2x)} envelope switching fallacy, the value of {z} turned out to be a red herring. Here it is not. Observing {z} is essential here, but only for computation of probabilities. As mentioned, we assume no algebraic properties and are computing no expectations. We already know that the observation of {z} is critical, since our algorithm pivots on a comparison between {z} and our randomly sampled value {d}. Considering our ultimate goal (keep or switch), it is clear what we need from {Q}: a conditional probability that {z'>z}. However, we cannot directly use {Q(y|x)} because we defined {x<y}. We want {p(z'|z)} and we don’t know whether {z<z'} or {z'<z}. Let’s start by computing the probability of {z} (being the observed value) and of {z,z'} (being the observed and unobserved values).

    The probability of observing {z} and the other envelope having {z'} is the probability that the relevant ordered pair was chosen for the two envelopes multiplied by the {1/2} probability that we initially opened the envelope containing the value corresponding to our observed {z} rather than the other one.

    \displaystyle p(z,z')= Q(min(z,z'),max(z,z'))/2= q(z,z')/2

    To get {p(z)} we integrate this: {p(z)= \frac{1}{2}\int Q(z,y)dy + \frac{1}{2}\int Q(x,z)dx}. This is a good point to introduce two quantities which will be quite useful going forward.

    \displaystyle I_1(z)\equiv \int_{-\infty}^z Q(x,z) dx

    \displaystyle I_2(z)\equiv \int_z^\infty Q(z,y) dy

    In terms of these,

    \displaystyle p(z)= \frac{1}{2}[I_1(z)+I_2(z)]

    There’s nothing special about calling the variables {x} or {y} in the integrals and it is easy to see (since each only covers half the domain) that we get what we would expect:

    \displaystyle p(z)= \frac{1}{2}\int q(w,z)dw

    What we want is the distribution {p(z'|z)= p(z,z'|z)= p(z,z')/p(z)= q(z,z')/p(z)}. This gives us:

    \displaystyle p(z'|z)= \frac{q(z,z')}{\int q(w,z)dw}= \frac{q(z,z')}{I_1(z)+I_2(z)}

    Finally, this gives us the desired quantity {p(z'>z)= \int_{z'>z} dz' p(z'|z)}. It is easy to see that:

    \displaystyle p(z'<z)= \frac{I_1(z)}{I_1(z)+I_2(z)}

    \displaystyle p(z'>z)= \frac{I_2(z)}{I_1(z)+I_2(z)}
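
    These expressions can be sanity-checked on a concrete {Q} (an illustrative choice, not anything special): let {(x,y)} be two independent Uniform(0,1) draws in sorted order, so {Q(x,y)=2} for {0<x<y<1}. Then {I_1(z)=2z} and {I_2(z)=2(1-z)}, and the formulas predict {p(z'<z|z)=z}:

```python
import random

def posterior_lower(z0, trials=400_000, seed=4, band=0.01):
    """Estimate p(z' < z | z ~ z0) when (x, y) are two independent
    Uniform(0,1) draws sorted into order. For this Q, I_1(z) = 2z and
    I_2(z) = 2(1-z), so the prediction is p(z' < z | z) = z."""
    rng = random.Random(seed)
    hits = seen = 0
    for _ in range(trials):
        a, b = rng.random(), rng.random()
        z, other = (a, b) if rng.random() < 0.5 else (b, a)
        if abs(z - z0) < band:          # condition on observing ~z0
            seen += 1
            hits += other < z
    return hits / seen

print(posterior_lower(0.8))   # close to the predicted value 0.8
```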

    As an example, consider the previous {(x,2x)} case — where one envelope holds twice what the other does. We observe {z}, and {z'} must be either {2z} or {z/2}, though we don’t know with what probabilities. If we are given the underlying distribution on {x}, say {P_2(x)}, we can figure that out. {Q(x,y)= P_2(x)\delta(y-2x)} and {q} is the symmetrized version. {\int q(w,z)dw= \int dw [Q(w,z)+Q(z,w)]= (P_2(z/2)+P_2(2z))}. So {p(z)= \frac{1}{2}(P_2(z/2)+P_2(2z))}. This is just what we’d expect — though we’re really dealing with discrete values and are being sloppy (which leaves us with a ratio of infinities from the {\delta} function when computing probability ratios, but we’ll ignore that here). The relevant probability ratio clearly is {P_2(z/2)/P_2(2z)}. From a purely probability standpoint, we should switch if {P_2(2z)>P_2(z/2)}. If we reimpose the algebraic structure and try to compute expectations (as in the previous problem) we would get an expected value of {z} from keeping and an expected value of {z[P_2(z/2)/2 + 2P_2(2z)]/[P_2(z/2)+P_2(2z)]} from switching. Whether this is less than or greater than {z} depends on the distribution {P_2}.

    Returning to our analysis, let’s see how often we are right about switching if we know the actual distribution {Q} and use that knowledge directly. The strategy is obvious. Using our above formulae, we can compute {p(z'<z)} directly. To optimize our probability of winning, we observe {z} then we switch iff {I_1(z)<I_2(z)}. If there is additional algebraic structure and expectations can be defined, then an analogous calculation gives whatever switching criterion maximizes the relevant expectation value.

    In terms of probabilities, full knowledge of {Q} is the best we can do. The probability we act correctly is:

    \displaystyle  \begin{array}{rcl}  P'(win)= \int dz\, p(z) \frac{\theta(I_1(z)-I_2(z)) I_1(z) + \theta(I_2(z)-I_1(z))I_2(z)}{I_1(z)+I_2(z)} \\ = \frac{1}{2}\int dz \max(I_1(z),I_2(z)) \end{array}

    \displaystyle P'(win|z)= \frac{\max(I_1(z),I_2(z))}{I_1(z)+I_2(z)}

    Whenever {I_1(z)-I_2(z)} changes sign only once (as it does for many well-behaved {Q}), there is a cutoff value {\hat z} (defined by {I_1({\hat z})= I_2({\hat z})}) below which we should switch and above which we should not.
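
    For instance (an illustrative sketch): if {(x,y)} are two independent Uniform(0,1) draws in sorted order, then {I_1(z)=2z} and {I_2(z)=2(1-z)}, the cutoff is {\hat z = 1/2}, and the full-knowledge win probability works out to {\frac{1}{2}\int_0^1 \max(2z,2(1-z))\,dz = 3/4}:

```python
import random

def optimal_play(trials=400_000, seed=5):
    """Full-knowledge strategy for (x, y) = two sorted Uniform(0,1)
    draws: I_1(z) = 2z, I_2(z) = 2(1-z), so the cutoff is z_hat = 1/2.
    Switch below it, keep above it."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        a, b = rng.random(), rng.random()
        z, other = (a, b) if rng.random() < 0.5 else (b, a)
        pick = z if z > 0.5 else other
        wins += pick == max(a, b)
    return wins / trials

print(optimal_play())   # approaches the predicted 3/4
```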

    How do we do with our invented {P} instead? We could recast our earlier formula for {P(win)} into our current notation, but it’s easier to compute directly. For given {z}, the actual probability of needing to switch is {I_2(z)/(I_1(z)+I_2(z))}. Based on our algorithm, we will do so with probability {P(z<d)= 1-F(z)}. The probability of not needing to switch is {I_1(z)/(I_1(z)+I_2(z))} and we keep with probability {P(z>d)= F(z)}. I.e., our probability of success for given {z} is:

    \displaystyle P(win|z)= \frac{I_1(z)F(z) + I_2(z)(1-F(z))}{I_1(z)+I_2(z)}

    For any given {z}, this is of the form {\alpha r + (1-\alpha)(1-r)} where {r= F(z)} and {\alpha= I_1(z)/(I_1(z)+I_2(z))}. The optimal solutions lie at one end or the other. So it obviously is best to have {F(z)=0} when {z<{\hat z}} and {F(z)=1} when {z>{\hat z}}. This would be discontinuous, but we could come up with a smoothed step function (e.g. a logistic function) which is differentiable but arbitrarily sharp. The gist is that we want all the probability in {F} concentrated around {\hat z}. Unfortunately, we have no idea where {\hat z} is!

    Out of curiosity, what if we instead pick {P} to be the conditional distribution {p(z'|z)} itself once we’ve observed {z}? We’ll necessarily do worse than by direct comparison using {Q} (the max formula above), but how much worse? Well, {p(z'|z)= q(z,z')/(I_1(z)+I_2(z))}. Integrating over {z'<z} we have {F(z)= \int_{-\infty}^z p(z'|z) dz'= I_1(z)/(I_1(z)+I_2(z))}. We end up with {(I_1^2(z)+I_2^2(z))/(I_1(z)+I_2(z))^2} as our probability of success. If we had used {1-p(z'|z)} for our {P} instead, we would get {2I_1(z)I_2(z)/(I_1(z)+I_2(z))^2}. Neither is optimal in general.

    Next, let’s look at the problem from an information theory standpoint. As mentioned, there are two sources of entropy: (1) the choice of the underlying pair {(x,y)} (with {x<y} by definition) and (2) the selection {(z,z')=(x,y)} or {(z,z')=(y,x)} determined by our initial choice of an envelope. The latter is a fair coin toss with no information and maximum entropy. The information content of the former depends on the (true) underlying distribution.

    Suppose we have perfect knowledge of the underlying distribution. Then any given {z} arises with probability {p(z)=\frac{1}{2}[I_1(z)+I_2(z)]}. Given that {z}, we have a Bernoulli random variable {p(z'>z)} given by {I_2(z)/(I_1(z)+I_2(z))}. The entropy of that specific coin toss (i.e. the conditional entropy of the Bernoulli distribution {p(z'> z|z)}) is

    \displaystyle H(z'>z|z)= \frac{-I_1(z)\ln I_1(z) - I_2(z)\ln I_2(z) + (I_1(z)+I_2(z))\ln [I_1(z)+I_2(z)]}{I_1(z)+I_2(z)}

    With our contrived distribution {P}, we implicitly are operating as if {p(z'>z)= 1-F(z)}. This yields a conditional entropy:

    \displaystyle H'(z'>z|z)= -(1-F(z))\ln (1-F(z)) - F(z)\ln F(z)

    There is a natural measure of the information cost of assuming an incorrect distribution: the Kullback-Leibler divergence (also known as the relative entropy). While it wouldn’t make sense to compute it between {Q} and {P} (which are, among other things, of different dimension), we certainly can compare the cost for given {z} of the difference in our Bernoulli random variables for switching — and then integrate over {z} to get an average cost in bits. Let’s denote by {q(z'>z)} the probability based on the true distribution and keep {p(z'>z)} for the contrived one. I.e. {q(z'>z)= I_2(z)/(I_1(z)+I_2(z))} and {p(z'>z)= 1-F(z)}. For given {z}, the K-L divergence is:

    \displaystyle D(Q || P, z)= \frac{-I_2(z)\ln [(I_1(z)+I_2(z))(1-F(z))/I_2(z)] - I_1(z)\ln [(I_1(z)+I_2(z))F(z)/I_1(z)]}{I_1(z)+I_2(z)}

    Integrating this, we get the mean cost in bits of being wrong.

    \displaystyle  \begin{array}{rcl}  \langle D(Q || P) \rangle= \frac{1}{2}\int dz [-(I_1(z)+I_2(z))\ln [I_1(z)+I_2(z)] - I_2(z)\ln (1-F(z)) \\ -I_1(z)\ln F(z) + I_1(z)\ln I_1(z) + I_2(z)\ln I_2(z)] \end{array}

    The first term is (up to an additive constant) {H(z)}, the entropy of our actual distribution over {z}. In fact, the first term and last 2 terms together we recognize as {-\langle H(z'>z|z) \rangle}, the negative of the mean Bernoulli entropy of the actual distribution. In these terms, we have:

    \displaystyle \langle D(Q || P) \rangle= \langle \frac{ -I_2(z)\ln(1-F(z)) - I_1(z)\ln F(z)}{I_1(z)+I_2(z)} \rangle - \langle H(z'>z|z) \rangle

    where the expectations are over the unconditional actual distribution {p(z)}. The first expectation on the right is the cross-entropy cost of acting as if the switching probability were {1-F(z)}. It never falls below {\langle H(z'>z|z) \rangle}, and the divergence vanishes precisely when {F(z)} matches the true conditional probability {I_1(z)/(I_1(z)+I_2(z))}. Any other choice of {P} carries a positive information cost.
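
    The sign of this cost can be checked on a concrete case (an illustrative sketch): for two sorted Uniform(0,1) values, {I_1(z)=2z} and {I_2(z)=2(1-z)}, so the true switching probability is {1-z}. The per-{z} divergence is positive for a mismatched {F} and vanishes exactly when {F(z)} reproduces the true conditional:

```python
import math

def kl_per_z(z, F):
    """K-L divergence at a given z for sorted Uniform(0,1) envelopes,
    where I_1(z) = 2z and I_2(z) = 2(1-z): the true switch probability
    is q = 1 - z, while our contrived P implies p = 1 - F(z)."""
    q, p = 1.0 - z, 1.0 - F(z)
    return q * math.log(q / p) + (1.0 - q) * math.log((1.0 - q) / (1.0 - p))

steep = lambda z: 1.0 / (1.0 + math.exp(-8.0 * (z - 0.5)))  # a logistic F

print(kl_per_z(0.3, steep))          # positive: a cost (here in nats)
print(kl_per_z(0.3, lambda z: z))    # zero: F matches the true conditional
```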

    As an aside, this sort of probabilistic strategy should not be confused with the mixed strategies of game theory. In our case, a mixed strategy would be an apriori choice {aK+(1-a)S} where {K} is the always-keep strategy, {S} is the always-switch strategy, and {0\le a\le 1} is the probability of employing the always-keep strategy. A player would flip a biased coin with Bernoulli probability {a} and choose one of the two strategies based on it. That has nothing to do with the measure-theory approach we’re taking here. In particular, a mixed strategy makes no use of the observed value {z} or its relation to the randomly sampled value. Any mixed strategy gives even-odds because the two underlying deterministic strategies both have even-odds.

    Ken Writes a Film

    Scroll down for the link to the movie, and to read my original script.

    A few months ago, I participated in a 72 hour film contest with some friends. It was a lot of fun, and we actually filmed in my condo — which was quite a blast. Aside from ducking out of the way whenever necessary, my role was to write the script.

    The basic premise was that we had to write a horror film in 72 hours with a certain prop, action, and theme. We were given these at 10 PM on the first night, which meant that I had to slam something out relatively quickly. One interesting aspect was that we didn’t actually know who would be available to act, or even how many. So the screenplay had to be easily adaptable. I drafted two ideas by 11:30ish and discussed them with Brian (the director, and a very talented author in his own right). We picked the more promising one, and honed the general idea. About 30 min later, I delivered to Brian the revised script and we decided to go with that.

    Below is a link to the film itself, now publicly available. This definitely was a learning experience, and I have to say the actors (David and Elena) were fantastic to work with. Given that they had so little time (filming had to be finished over a mere 30 hour period, from when they first were handed the script), what they accomplished was incredible. One interesting thing I learned was that phrases which read well on paper are not necessarily ones actors find easy to work with. Unusual turns of phrase are enjoyable in literature, but can be difficult to memorize — especially on short order. I imagine an experienced scriptwriter works closely with actors and has a strong sense of what will be executable and what won’t fly.

    The thing which surprised me most was post-production. We had a very talented post-production crew, but I had no idea what to expect. Again, there is a vast difference between what is plausible on paper (or seems easily filmed) and what is workable in post-production. As you can see, the final cut is quite different from the script.

    This gave me a more forgiving disposition toward Hollywood writers, and a clear understanding that the words (and scenes) set on paper may differ significantly from what audiences ultimately experience. From now on, I’ll be a bit more hesitant to blame screenwriters for the seemingly inane writing which plagues most Hollywood movies. It very well could be due to a confluence of factors which made it difficult or expensive to adhere to the script. Or maybe some idiot executive meddled, or they polled audience sentiment or some such nonsense. We didn’t have any of that, of course — just lots of talented people performing their roles. So I think such divergences are inevitable. Sadly, no such excuse exists for novel writers.

    I still think having a single screenwriter is the best course, however. Having briefly participated in design by committee (or design by pseudo-autocratic democracy in this case), I think the alternative is far worse. Lots of post-its, a chaos of ideas, and most creativity lost in a homogenization driven by sheer exhaustion and a few strong personalities. Writing is best done by a single writer, with feedback at certain key points from the director. In the 2 hours spent “brainstorming,” a good writer could have pumped out 4 draft ideas, the director could have decided on one or two, and the writer could have finalized them. Too many chefs and all that. Then again, what do I know? If I knew what people actually wanted, I’d be rich.

    Without further ado, here is the final cut. Presumably it’s available somewhere on Amazon Prime but I couldn’t find the link, so I’m including the unofficial one a friend provided.

    Final cut of “A Teachable Moment”

    And here’s my original script (with Brian’s formatting reproduced as best I can given the blog limitations):


    The whole thing is dialog, interspersed with small cuts to other scenes (no voiceovers). The cuts should be smooth and for a few seconds each. No sudden flashy stuff.


    “I’ve been following your work for some time. The unique impact it has.”


    [Smiles ingratiatingly]

    “I like to think so. Do you know what makes teaching so special? It’s a distillation of the noblest human activity: sharing.”



    “Some would take a more cynical view.”


    [quietly regards her for a moment]

    “I’ll be honest. I’ve had lots of advantages.”

    [he laughs light-heartedly]

    “Not everybody has those advantages. Sure, I could feel guilty. But isn’t it better to use my strength for others?

    When you share…”

    [he tenses in poignancy]

    “…you can change a life.”



    “I don’t think anyone would dispute this, but *how* you share matters too. Not everyone is ready to believe in pure motives.”


    [wry expression]

    “To most people sharing involves a trade: part of themselves for virtue, for the right to imagine themselves a better person. That’s foolish. Sharing is not a transaction. It can ennoble both giver and receiver. A teacher can give without losing.”



    “A lot of people don’t understand what teachers really do. I mean day in and day out, over and over.”


    “I expect it can be quite difficult. Do you ever get tired?”


    [pauses, and gives a cautious laugh]

    “I don’t have that luxury. That would be letting down the world in a sense.”



    “That sounds a bit grandiose.”



    “Yes, I suppose it would to someone not conversant with such matters.”



    “You definitely sound like a teacher.”

    [looks at him slyly]

    “So teach me something.”



    [wags his finger and smiles]

    “I’ll have to charge you. My wisdom doesn’t come free.”


    [grins and suggestively slides her chair right up to him. She’s now close to his face and her body quite close to his]

    “I’ll have to find some way to repay you.”



    [clears his throat, clearly a bit flustered]

    “Very well. I’ll teach you something about teaching. The lessons conveyed through sounds we make are the tiniest fraction of how we teach. It is through subtler manipulations that we imprint our thoughts on the mechanism of this world.”



    [whispering, sultry]:

    “Well, that’s quite a mouthful. I guess I owe you payment.”


    [adjusts collar]:

    “N-no need.”


    “But I insist. I’ll teach you a lesson as well.”

    [she lifts her jacket and flashes a badge.]

    M hesitates and seems like he’s about to lunge at her, but she puts her hand to her hip and shakes her head, smiling in satisfaction.

    M slumps back, and W spreads photos of the various cut-scenes.


    “You’re here for me, then?”


    “In a sense.”

    [she smiles and puts her hand on his]

    “I’ve been looking for a good teacher.”

    Do’s and Don’ts for Modern Authors

    Every author has to post about the secrets to authorial success. Well, I’ve got a different take, a special take, a unique take. I HAVE no authorial success. Which means I’m more intimately familiar with what NOT to do. Who wants advice about how to succeed from somebody who HAS succeeded? That’s silly. Obviously they knew somebody, and they’re NOT going to give you that person’s phone number. But I have no such qualms. In fact, here are a few phone numbers which may belong to movers and shakers:

    • 555-1212
    • 000-0000
    • 90210
    • 314159265358979323846
    • 1

    The point is that when none of these are willing to give you the time of day, I will. 7:33 PM.

    So, without further ado, here is a list of helpful do’s and don’ts for aspiring authors:

    • Don’t … use big words or complex sentences. That makes you posh, elite, pretentious, and altogether hateful. Who reads big words and complex sentences these days? That’s old fashioned, like you know like last decade. Who wants to be OLD? Besides, why would you want your book to be inaccessible? Big words and complex sentences mean you will target a tiny number of people who mostly read things they’re told to read by the N.Y. Times and won’t like your stuff anyway unless you know somebody AT the N.Y. Times.

    • Don’t … employ subtle ideas or twists or anything complicated to grasp. Such books are for privileged old people, those educated in the dark era before people realized that the purpose of school was fashionable political activism. Just remember: ideas are bad. Most people don’t have any, and it’s rude to flaunt what you have and others don’t.

    • Don’t … proofread, spell-check, or worry about style or grammar. These are wasteful. Proofreading and editing take time. Lots of time. Nobody appreciates them, and they’ll just slow you down. All the best books were written on a phone using two thumbs and very few brain cells. How many artisanal craftsmen do you know? Exactly. If you’re not producing beer, it’s not a craft — it’s a waste of time. Just write as many words as you can as fast as you can. To borrow from the bible (Bumperstickers 3:21, 4): write them all and let god sort it out.

    • Don’t … use characters, plot, or dialog. Creativity is bad. You’ll only increase the chance of offending people. The best way to avoid doing that is by writing solely about yourself, but only if you’re not the type of person inherently offensive to others. There are some handy websites which list acceptable types of people and unacceptable ones.

    • Don’t … worry about pesky things like factual accuracy or consistency. A famous director said that when it’s a choice between drama and consistency, drama wins every time. He’s an idiot, but a rich one. What do you want to be: right or rich? Incidentally, it’s ALWAYS a choice between drama and consistency. If you have time to be consistent, spend it writing more drama instead. Your time is finite — which is a plothole that conveniently can be plugged by reversing the polarity of the Quantum Tachyonic Blockchain.

    • Don’t … advertise or pay anybody for anything. Why pay for nobody to buy your book, when you can get that for free?

    • Don’t … ask friends or family to review your book. Not because it’s against the rules, but because they won’t. Then you’ll have fewer friends and family. Only ask people you don’t like and who don’t like you.

    • Don’t … issue a press release. Nobody will read it, nobody will care. Yet another book tossed on the dung heap of human blather. Yawn. “News” must be something which matters to other people. Like journalists. As everyone knows, modern journalism involves complaining about something which happened to the reporter’s BFF, making it sound like a ubiquitous problem, and quoting lots of tweets. Serious journalists won’t have time for you because they always have a BFF in trouble, and curating tweets is a fulltime job.

    • Don’t … submit to agents, magazines, or contests. If you were the type of person who could get accepted, you would know because you would be published, famous, or well-connected. Since you’re not published, famous, or well-connected, you obviously won’t be accepted. Sure, every now and then somebody new accidentally slips in. It’s an accident resulting from their being related to somebody published, famous, or well-connected.

    • Do … copy whatever is popular at the moment. Book, movie, video-game, comic, or meme — it doesn’t matter. People only read what’s popular, otherwise something else would be popular. As a rich person once said: if you want to be rich do what rich people do. Which is giving bad advice to poor people. See? I’m going to be rich. Well, he actually never said you would be rich, just want to be. Look, people want to reread the same book over and over. It’s easier because they already know the words and nothing scary and unexpected can happen. So why not rewrite those very words and partake of the riches?

    • Do … focus on fanfiction. Being original is time-consuming, hard, and terribly unprofitable. Who wants to engage in some new unknown adventure when they can dwell in the comfortable world they’ve come to know. Not the real one; that’s terribly uncomfortable. But one inhabited by loveable characters they somehow feel a personal connection to, and who can’t get a restraining order against them.

    • Do … pretend to be somebody else. Nobody likes your sort. Whatever you are is offensive in all ways imaginable. Choose a name which represents the group favored by the publishing industry at this moment. Just look at who gets published and who doesn’t. Not established authors, but debut novelists. Nobody’s going to dump Stephen King just because the name Stephen is anathema according to the politics of that week. But they probably won’t publish debut novelist Stephen Timingsucks (unless Timingsucks is Native American and Native Americans are in that week).

    • Do … know somebody. It’s the only way to get an agent or publisher. If you don’t know anybody, then the best way to meet them is a cold approach. Go to buildings inhabited by agents and publishers, and ride the elevators. That’s why it’s called an “elevator pitch”. When somebody important-looking gets in, stand next to them, sideways, and stare at the side of their head. Remember: it doesn’t matter how the conversation gets started, just where it goes. Which isn’t always jail. All you need is one yes, and it really doesn’t matter how you get it.

    • Do … make it political. Your book should bravely embrace the prevailing political sentiments of the publishing industry. Only then will you be recognized for the courage of conformity. The publishing industry regularly offers awards for just that sort of thing.

    • Do … write about you, you, and you. Far more appealing to readers than plot, style, or substance is your commonplace personal struggle and how you specifically overcame it. Nothing is as compelling as minor adversity subjectively related by the one who experienced it. Be sure to make clear that the reason you prevailed was your unique grit, determination, and moral superiority. Like the dictators of old, you thrice refused the world’s entreaties to tell your story. Only when sufficiently importuned by the earnest pleas of the masses did you relent and accept the mantle of greatness.

    • Do … blog, tweet, instagram, post, and youtube. Who wants to read a book by and about somebody they don’t feel a personal connection with? Have you ever heard the names Tolstoy, Dickens, or Proust? Of course not. They didn’t understand the importance of selling the author, not the work. You need to sell yourself. Literally. While actual Roman-style slavery is illegal in most States, a variety of financial instruments can achieve the same effect.

    • Do … spend the vast majority of your time inhabiting an ecosystem of writers. Your time is far better spent blogging, connecting, and advising other writers rather than writing for the lay person. Sure, outreach is fashionable these days, and it does have a few benefits. But one should not spend too much time demonstrating the writing process through novels, stories, or poetry. Best to focus on publishing for one’s peers.

    • Do … workshop, workshop, and workshop. No writer of note ever succeeded without writing courses, workshops, several professional editors, and an emotional support network. How else could they learn to express themselves in precisely the right manner as discovered by modern researchers and taught only through MFA programs? This is why there’s nothing worth reading from before the 1990s. Fortunately we live in enlightened and egalitarian times, and the advantages of an MFA are available to everybody. Which explains why everybody has one.

    • Do … be chatty, shmoozy, and a massive extrovert who attends conferences, sucks up to agents, and shamelessly promotes yourself. If you’re not that way, make yourself that way. There are plenty of blogs and books by chatty, shmoozy, massive extroverts on how to. These explain in clear and practical terms how you should have been born chatty, shmoozy, and a massive extrovert. If that doesn’t work, there is a simple surgical procedure which can help. It’s called a lobotomy, and also will help you blog, tweet, post, and youtube more effectively. Be your audience.

    • Do … consider tried and true techniques when ordinary submission and marketing methods don’t work. These business methodologies have been refined and proven in many domains over many years. Whole enterprises are dedicated to their successful application, and they can be surprisingly inexpensive. Extortion, kidnapping, blackmail, torture, and politics all can work wonders for your book’s advancement. Pick your poison. Literally. I have an excellent book coming out, filled with recommendations and in which I describe my own struggle to find the right poison and the absolutely brilliant way I overcame this adversity. It’s a very compelling read.

    • Do … show, don’t tell. When somebody talks about the aforementioned tactics you’ve used, make a gruesome example of them. This is showing, so that people don’t tell. Most writing coaches emphasize the importance of “show don’t tell,” and you can find some excellent examples in the work of various drug cartels and the Heads of State of certain current allies and trading partners.

    • Do … kill your babies. This is another mainstay of writing wisdom, and a constant refrain in almost any workshop. It can be difficult, especially the first few times. But if that initial instinct can be overcome, it definitely is something worth trying. While it won’t always help, such sacrifices have been known to curry favor with XchXlotbltyl, the dark god of publication (and a major shareholder in most large publishing houses). Details on the appropriate ceremonies for different genres can be found on popular writing blogs. And don’t worry, you always can produce more babies… and thus more success.

    • Do … remember there’s no need to write the ending first — or ever. There has yet to be born a human with a different ending. But entropy and the inevitable degradation wrought by time rarely appeal to modern audiences. Best to throw in a sappy romantic hookup or hint at an improbable revival of the seemingly dead protagonist. Which brings us to…

    • Don’t … hint. Nobody likes ambiguity. That is why TV is so popular. Books are a very primitive technology, and they require a lot of unnecessary work by the reader. Faces, scenes, even actions need be imagined anew by every reader. This is inefficient. Remember, you’re catering to people who don’t have cable or can’t afford it or are allergic. It’s your job to make their entertainment as painless as possible despite their unfortunate circumstance. Anything else would be ableist. So don’t leave anything ambiguous. Make sure you spell out what just happened, over and over, just in case the first few explanations didn’t work. Remember the first rule of teaching: Keep the kids’ Chromebook software up to date. Well, the 2nd rule: repeat everything 3 times for the people with no attention span, or too stupid or too distracted to have caught it the first 2 times. And don’t forget to give them an achievement award for getting it. So repeat every plot point 3 times, and congratulate the reader on finally getting it.

    • Do … make the reader feel smarter than fictional characters. This is the point of revealing things to the reader that characters don’t know. A well written book will have the reader shouting advice to the characters. Because if your readers aren’t better than a nonexistent and contrived character, who are they better than?

    • Do … publish each sentence as you write it. In the old days, writers had to wait a long time. Agents vetted writers’ works, publishers vetted agents’ submissions, editors vetted accepted works, and copy-editors, proofreaders, and countless others meticulously checked things at every stage. That book of cat jokes replete with typos would take several years to see print, not counting the time required to hand-deliver manuscripts by stagecoach or the frequent loss of an editor or writer from dropsy. Thank goodness we live in modern times! These days there’s no need to wait years for feedback or abide by the traditional publication timeline. Your brilliance need not be thwarted by the need for reflection or editing. Each sentence you write should be tweeted, posted on Wattpad, and blogged the moment it appears. When you get feedback, incorporate it all. Otherwise somebody might be sad, and we don’t want anybody to be sad while reading your book. That’s for somebody else’s book, somebody poor and unsuccessful who uses big words and doesn’t know the rules. Besides, as Hollywood has shown, design by committee is the best way to create a quality creative product. Call it the democratization of writing. As recent polls showed, nothing’s better than democracy. In an ideal world, every word would be voted on and accepted or rejected accordingly. One day this utopia may be real, but for now you’ll have to settle for releasing on a sentence-by-sentence basis. At least you’ll have the satisfaction of knowing that your final product was vetted by countless strangers with wildly varying aptitudes, motives, and tastes, rather than a few so-called “professionals” who’ve been doing the same boring thing for years. Do you really want the same old boring people reading your work, let alone editing it?

    • Do … set up a botnet to counter the millions of bad ratings your book will get on social media sites. In uncivilized times, negative reviews only came from critics who actually read your book but didn’t understand it or found it differed in some small way from what they thought you should have written but never would bother to write themselves because they’re too busy writing negative reviews. That was a slow process. We all know how long it took for Mozart to get meaningful feedback like “too many notes,” and how much his craft improved as a result. Imagine what he could have composed if he learned this earlier! These days we’re much more fortunate. One needn’t wait months or years for a hostile stranger with adverse incentives to read your book and pan it. There are millions of hostile strangers with adverse incentives willing to do so without troubling to read it. This is much more efficient, and we have modern social media to thank for rewarding such behavior with improved social standing. Otherwise, you’d have to wait for some “reputable” critic to actually read your literary novel and comment on it. Instead you’ll generously receive feedback from somebody far more credible who only reads young adult coming-of-age novels about pandas but is willing to step out of their comfort zone and negatively rate your book without having read it. You’re welcome.

    • Don’t … have any faith in humanity. If you did, you won’t for long. But you didn’t or you wouldn’t be a writer in the first place. Who but from malice would wish to imprint their thoughts on the world? Or ask of another that they occupy the liminal time between nonexistence and nonexistence with a less poetic, less subtle, and less profound rehash of the same tired ideas? You are, after all, asking people to share your delusion of eloquence. That’s almost like founding a cult. Which, incidentally, is an excellent way to promote your book.

    • Do … buy my book. It won’t make you happy, but you can’t buy happiness so you may as well spend your money on this.


    Why NOT to use Amazon Ads for your book

    In today’s article, I ask a simple question: does it pay to advertise on Amazon for your book? As can be guessed by the exceedingly astute from the title of the post, the answer is no. In addition to explaining how I came to this conclusion, I also will offer a brief review of the basics of Amazon’s online advertising.

    I’ll examine the matter purely in terms of tangible cost/benefit, and ignore considerations involving ease of use, time wasted dealing with Amazon’s bureaucracy, and the myriad other intangible costs involved.

    First let’s review some of the aspects of Indie publishing which relate to Amazon’s author ad campaigns, as well as how those campaigns work.

    Quick Review of Some Relevant Aspects of Indie Publishing

    Printer vs Distributor

    In general, to sell something on Amazon you need to be designated a “seller” and sign up for a seller account. This can be nontrivial, and at various times Amazon has made it well-nigh impossible to do so. Authors have a special in, however, but only if they publish a version of their book via Amazon. This means producing a Kindle edition and/or (more recently) printing through KDP (formerly Createspace).

    When an author publishes a print edition via some other service, one of two things happens, depending on the type of service. Either that service also offers distribution (Ingram Spark/Lightning Source) or it does not (everybody else). There are two major distributors: Ingram Spark/Lightning Source and Baker & Taylor. Of these, only Ingram offers print-on-demand (POD) services to authors. All other POD services (with the exception of Amazon’s own Createspace, now part of KDP) only offer POD.

    An ordinary POD company sells books to the author/publisher who then may sell them to bookstores, individuals, etc. The author/publisher is responsible for storage, mailing, returns, invoice management, etc. Ingram, on the other hand, has a catalog that is available to all bookstores and automatically is pushed to them regularly. When you POD through Ingram, your book appears in their catalog — and thus quickly is available for order through almost every bookstore. This doesn’t mean it will appear on their shelves, but an individual who wishes to purchase a copy need only walk into a bookstore and ask to order one. In theory, the author/publisher need never handle a physical copy of the book!

    Why does this matter? It affects how you are treated by Amazon.

    Author vs Seller

    As far as Ingram is concerned, Amazon is just another bookstore. It too automatically slurps in your entry from Ingram’s catalog. Unlike a physical bookstore, however, it offers the book for sale just as it does any other. A “book page” is created for it (and an “author page” may be created for the author), based on a cover image, blurb, etc, obtained from Ingram. Your book will appear and be treated like any other and show as being fulfilled by Amazon itself. Let’s refer to this as “Amazon proper”. Incidentally, Barnes and Noble will do exactly the same thing online.

    Amazon also hosts a seller marketplace (AMS), which includes, among many other vendors, lots of 3rd-party online bookstores. These each slurp in that same info and may offer your book for sale as well, often at a slight discount which they make up for through inflated shipping costs. It’s not uncommon for a new author to see their book appear for sale through myriad sellers immediately after launch and assume those are illicit review copies being resold. They’re not. These just are from mini-bookstore fronts which regurgitate the Ingram catalog. When someone orders from them, the order is relayed to Ingram which then fulfills it. Ditto for an order through Amazon Proper. It’s worth noting that Ingram has special shipping arrangements with Amazon, B&N, etc, and orders from these stores will be prioritized. While it may take 2-4 weeks for an order by the author/publisher themselves to be fulfilled, orders from Amazon or B&N are quickly dispatched.

    The information which appears on the Amazon page for a print book is obtained from the Ingram info. They do allow you to declare yourself an author and “claim” books, setup an author page, etc. Almost all authors put out a Kindle version of their book through KDP. In fact, most only do this. Amazon generally attaches this to any existing print page within a week or two. A few emails may be needed to make sure they associate the same author with both, etc, but generally it’s pretty smooth (as far as Amazon processes go).

    Independent of whether you are an author or publisher, you may set up a store-front on Amazon. Some publishers do this. In this case, you must register as a seller, set up tax info, etc. In theory, you could sell anything, not just your book. The seller can control the descriptions of products they sell, etc. But authors generally need not go to such lengths — as long as they are using KDP for at least one of their versions.

    Why all this rigamarole? There is one area where it makes a big difference. Only sellers can run Amazon ad campaigns. If you only have a print edition which has been slurped in, you cannot run an ad campaign. You would have to create a seller store-front, sell the book through that, and then run a campaign as that seller and only for the things sold on that store-front. You couldn’t draw generic traffic to your book on Amazon proper.

    There is a trick, however. As mentioned, authors are viewed as an automatic type of seller — but only if they have a version of their book published through Amazon. If you’ve published a Kindle version of the book, then you qualify. In principle, the ads only would be for that version. But since Amazon links all versions of the book on a single page, de facto it is for all of them. No seller account is needed. This is how most author ad campaigns are run.

    On a practical note, Amazon used to distinguish author ad campaigns from others, offering tools which were more useful. Recently, they lumped them in with all other sellers, making practical management of ad campaigns much more challenging. Most sellers of any size use the API or 3rd party firms to manage their ad campaigns, but as a single author you will be forced to use Amazon’s own Really Awful Web Interface. Hmm… they should trademark that. Because it describes SO many of their web interfaces. But, that’s not what this article is about. Let’s assume it was the easiest to use interface in the world, a pleasure on par with the greatest of epicurean delights. Is it worth doing?

    Before answering that (well, we already answered it in the title, but before explaining that answer), let’s summarize the levels of complexity in managing sales/ads through Amazon:

    1. Easiest. Fulfillment via Amazon and can run ad campaign via Amazon as is:

    • Kindle edition, no POD
    • Kindle edition, POD via Amazon KDP
    • No Kindle edition, POD via Amazon KDP
    • Kindle edition, POD via Ingram

    2. Some effort. Fulfillment via Amazon but need a seller account to run an ad campaign

    • Kindle edition, POD via somebody other than Amazon or Ingram
    • No Kindle edition, POD via Ingram

    3. Messiest. Seller account needed to sell at all

    • No Kindle edition, POD via someone other than Amazon or Ingram
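    The three-level taxonomy above condenses into a small lookup. A sketch of it as code — the provider labels ("kdp", "ingram", "other", "none") are my own shorthand, not official terms:

```python
def amazon_ad_complexity(kindle_edition: bool, pod_provider: str) -> int:
    """Classify the effort needed to run Amazon ads, per the taxonomy above.

    pod_provider: "none", "kdp", "ingram", or "other" (non-Amazon, non-Ingram).
    Returns 1 (run ad campaigns as-is), 2 (need a seller account to run ads),
    or 3 (need a seller account just to sell at all).
    """
    # Level 1: any KDP presence, or a Kindle edition with no/Ingram POD.
    if pod_provider == "kdp" or (kindle_edition and pod_provider in ("none", "ingram")):
        return 1
    # Level 2: Kindle + other POD, or no Kindle + Ingram POD.
    if kindle_edition or pod_provider == "ingram":
        return 2
    # Level 3: no Kindle, POD via someone other than Amazon or Ingram.
    # (No Kindle and no POD means nothing on Amazon at all; lumped here.)
    return 3

assert amazon_ad_complexity(True, "ingram") == 1   # easiest case
assert amazon_ad_complexity(False, "ingram") == 2  # seller account for ads
assert amazon_ad_complexity(False, "other") == 3   # messiest
```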

    Types of Ad Campaigns

    Next, let’s review the types of Amazon ad campaigns. There currently are three types. A given author may run many separate ad campaigns for the same book — but each will be of one and only one type.

    1. Sponsored Product Targeting: These ads are in the row of “sponsored products” which appears when you view the relevant product’s Amazon page. In principle you give Amazon a list of specific books, similar in theme or style or subject matter or whose readers are likely to be interested in your own. In practice, you have to be even more specific. You give Amazon a list of “products”, defined by ASINs. There may be many editions or versions of the same book. You’ve got to include ’em all. By hand. Without any helpful “select all” tool. And remember them. Because all you’ll see once your ad campaign is running is a breakdown by ASIN.

    2. Keyword Targeting: These ads appear in searches. There are 3 locations they may be placed: the top 2 spots, the middle 2 spots, or the last 2. Each page of results has ads in one or more of these locations, and they’re designated “sponsored”. Try a few searches, and you’ll see the placement. You give Amazon a list of keywords, generally two or more word phrases, and select how specific a match is required for each (exact, containing it, or broadly related). Then your ad will appear in the results when someone searches for those phrases on Amazon. Keyword targeting allows negative keywords as well. For example, it may be a good idea to negate words such as “dvd”, “video”, “audio”, etc, especially if the most popular entries are in those categories. Search for your keyword, see what comes up, and negate any undesirable groups that appear toward the top (using -foo in your search). When you’ve negated the relevant keywords, the top entries should be precisely what you’d like to target.

    3. Category Targeting: You pick the Amazon categories that best suit your book — and presumably the book appears when somebody clicks the category. My experience is that category targeting is well-nigh useless for authors, and generates very few impressions or clicks. So we’ll ignore it.
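    The match types and negative keywords described under Keyword Targeting amount to a filter over search queries. Here is a minimal sketch of how exact, phrase, and broad matching plus negation might combine — a simplified model of my own, not Amazon’s actual (and far more elaborate) matching logic:

```python
def ad_eligible(query, keyword, match_type, negatives=()):
    """Decide whether an ad bidding on `keyword` may appear for `query`.

    match_type: "exact" (query is exactly the phrase), "phrase" (query
    contains the phrase in order), or "broad" (all words appear, any order).
    Negative keywords veto the ad regardless of a positive match.
    """
    q_words = query.lower().split()
    k_words = keyword.lower().split()

    # Negatives first: a single negated word kills the placement.
    if any(neg.lower() in q_words for neg in negatives):
        return False

    if match_type == "exact":
        return q_words == k_words
    if match_type == "phrase":
        n = len(k_words)
        return any(q_words[i:i + n] == k_words
                   for i in range(len(q_words) - n + 1))
    if match_type == "broad":
        return all(w in q_words for w in k_words)
    raise ValueError(f"unknown match type: {match_type}")

# Negating "dvd" keeps video editions from triggering the ad:
assert ad_eligible("space opera dvd", "space opera", "phrase", negatives=["dvd"]) is False
assert ad_eligible("best space opera novels", "space opera", "phrase") is True
```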

    Ok, one more piece of review and then we’ll get to the analysis.

    How Amazon Ads Work

    Although their locations and types may differ, all ads are placed via the same process: an auction. In fact, pretty much any ad you see anywhere online has been chosen via an almost identical process.

    Every time a web page is served to a user (ex. you browse to a particular product), there are designated slots for ads. This is true of almost any webpage you view anywhere — all that differs is who is selling the ad space. Those slots are termed “impressions” (or more precisely, the placement of an ad in one is called an “impression”). Think of them as very short-lived billboards. To determine which ad is shown, an auction is conducted for each. This all is done very quickly behind the scenes. Well, not *so* quickly. Guess why webpages are so slow to load…


    Because of its ubiquity, the auction process is fairly standard by this point. What I describe here holds for most major sites which sell advertising. The auction used by almost everybody is called a “second price auction”. In such an auction, the highest bid wins but only pays the 2nd highest bid. Mathematically, this can be shown to lead to certain desirable behaviors. Specifically, it is optimal for each participant to simply bid their maximum instead of trying to game things. This is important because Amazon will be given a maximum bid by you, and can only act as your proxy if it has a well-defined strategy for using it. Since it’s also acting as everyone else’s proxy, such a strategy must be a truthful one.

    [As an aside, what I described technically is called a Vickrey auction. Online services use a generalized version of this in which multiple slots are auctioned at once in order of quality. I.e., all the impressions on a page are auctioned simultaneously to the same bidders. The highest bidder gets the best impression, but pays the 2nd highest bid. The 2nd highest bidder gets the 2nd best impression but pays the 3rd highest bid, etc.]

    If you bid $1.00 and the 2nd highest bid is $0.10, you win and only pay $0.10. So, if you’re a lone risk-taker in a sea of timidity, it pays to bid high. You’ll always win, but you won’t pay much. However, if there’s even one other participant with a similar strategy, you may end up paying quite a bit. If both of you bid high, one of you will win, and will pay a lot. For example, if you bid $1.00 and the other guy bids $0.99, you’ll pay $0.99.
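    The auction mechanics above are easy to simulate. A toy sketch of the generalized second-price allocation — slots awarded in bid order, each winner paying the next bid down — checked against the single-slot numbers in the example (amounts in dollars):

```python
def generalized_second_price(bids, num_slots=1):
    """Allocate ad slots among sealed bids, best slot to highest bid.

    Returns a list of (winning_bid, price_paid), one per slot filled.
    Each winner pays the next-highest bid below their own
    (or 0 if nobody bid below them).
    """
    ranked = sorted(bids, reverse=True)
    winners = []
    for i in range(min(num_slots, len(ranked))):
        price = ranked[i + 1] if i + 1 < len(ranked) else 0.0
        winners.append((ranked[i], price))
    return winners

# The single-slot scenarios from the text:
assert generalized_second_price([1.00, 0.10]) == [(1.00, 0.10)]  # timid field: pay 0.10
assert generalized_second_price([1.00, 0.99]) == [(1.00, 0.99)]  # rival high bidder: pay 0.99

# Three bidders, two slots, per the Vickrey-style aside:
assert generalized_second_price([1.00, 0.75, 0.40], num_slots=2) == [(1.00, 0.75), (0.75, 0.40)]
```

    Note the truthfulness property mentioned above: a bidder’s own bid determines only whether they win, never the price they pay, so shading your bid below your true maximum can only lose you auctions you’d have profitably won.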

    So far, we’ve discussed the second price auction in the abstract. It’s straightforward enough, even if the optimal strategy may require a little thought. The more interesting issue is what precisely you’re bidding on.

    In an ad auction, you are *not* bidding on the impression per se. Rather, you effectively are bidding on an option on the impression. Let me explain.

    Once every impression on the given web page has been auctioned, the winning ads are displayed. However, the winner of an impression only pays if the user clicks on their ad, regardless of what happens afterwards. To summarize:

    • Win impression, no click: Cost= 0
    • Win impression, click, sale: Cost= 2nd highest bid
    • Win impression, click, no sale: Cost= 2nd highest bid

    Amazon gets paid only if your ad is clicked on. If you win a million impression auctions and nobody clicks on your ad, you pay nothing. If every impression you win gets clicked on but nobody buys anything, you pay for all those impressions. In terms of what you pay Amazon, sales mean nothing, impressions mean nothing, only clicks count. But impressions are what you bid on. Financially, this tracks more closely the behavior of an option than a commodity.
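    This payoff structure fits in a one-liner (a sketch; the function name is mine):

```python
def advertiser_cost(won_impression: bool, clicked: bool, second_bid: float) -> float:
    """What you owe for one impression: the 2nd-highest bid if you won the
    auction AND the user clicked; otherwise nothing. Whether a sale
    resulted never enters into it."""
    if won_impression and clicked:
        return second_bid
    return 0.0
```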


    Obviously, the bid placement process is automated, so you’re not in direct control of the bidding in each auction. In essence, Amazon acts as your proxy in this regard. We’ll get to how your bids are placed shortly, but first let’s review some terminology.

    • Impression: We already encountered this. It is placement of an ad in a particular slot on a particular web page that is served. It is important to note that this refers to placement one time for one user. If the user refreshes the same page or another user visits it, a fresh auction is conducted.
    • Click-through-rate (CTR): The average fraction of impressions that get clicked on. The context determines precisely which CTR we’re talking about.
    • Conversion Rate: A “conversion” is an instance of the end goal being accomplished. In this case, that end goal is a sale (or order). The “Conversion Rate” is the average fraction of clicks that result in sales.
    • Conversions per Impression (CPI): The average fraction of impressions that result in sales. This is just the CTR * Conversion-Rate.
    • Order vs Sale: For most purposes these are the same. For products which may be bought in bulk, the two may differ (ex. 100 boxes of soap could be 1 order but 100 sales). But this rarely applies to books since customers generally buy only one.
    • Cost Per Click (CPC): The average cost of each click. Basically, the average 2nd highest bid in all auctions won by you and for which a click resulted.
    • Average Cost of Sales (ACOS): Each click may cost a different amount, so this measures the average actual cost of each sale, usually stated as a % of sale price. A 200% ACOS for a $10 book means that it costs $20 of advertising on average to make one sale. The dollar cost per sale is the CPC/Conversion-Rate; dividing by the sale price gives the ACOS as a percentage.

    Bid Placement

    I mentioned that an auction is conducted for each impression, and that it is done very quickly (in theory). If that’s the case, who are the bidders and how are the bids placed?

    The pool of potential bidders includes every active ad campaign which hasn’t run out of money that day. This pool is narrowed by the specified ad campaign criteria (product targets, keywords, negative keywords, category, etc). The result is a pool of bidders for the specific auction. In our case, these generally would be authors or publishers — but in principle could be anyone.

    Amazon acts as the proxy for all the participants. It determines which ad campaigns should participate in a given auction and it bids based on their instructions. Other than this, it has no discretion.

    As a bidder, you have control of the following (for a given ad campaign):

    • Campaign type: product, keyword, or category.
    • A list of products, keywords, negative keywords, and/or categories as appropriate for the campaign type.
    • For each keyword, product, or category, a maximum bid.
    • A “daily” budget. I’ll explain why this is in quotes shortly.
    • Ad text. You can’t control the image (it’s your book cover), but some text can be provided.

    Putting aside the campaign type and ad text itself, the salient point is that there is a list of “items” (keywords, products, or categories) which each have a maximum bid specified. There also is an overall daily budget.

    It turns out that the “daily” budget isn’t really “daily.” Amazon operates on a monthly cycle, and assigns a monthly budget based on the number of days and the daily budget. On any given day, the daily budget can be exceeded, though generally not by a huge amount. If Amazon does exceed your monthly budget (which can happen), it will refund the difference. I’ve had this happen. The point is that you’re not really setting a daily budget but a rough guideline. It’s the associated monthly budget which is used.

    Once you exceed the budget constraint, that campaign is inactive until the next day (or month, depending on which budget has been exceeded). Obviously, that makes bidding relatively simple — there is none. So let’s assume the budget hasn’t been breached.

    For each auction, Amazon must determine whether any of the items in the campaign are a match. It then applies the specified maximum bid for that item. In principle. But nothing’s ever that simple, is it?

    Bid Adjustment

    By this point you may have noticed a major problem with the auction system as described. Let’s look at it from a transactional standpoint.

    You earn revenue through sales, but pay for clicks. The resource you have is money (your budget for advertising) and you need to trade it for sales revenue.

    Amazon earns revenue through clicks, but pays in impressions. What do I mean by this? The resource Amazon has is impressions, and they need to trade it for click revenue.

    Any scenario that results in lots of clicks per sale (or more precisely, a high ACOS) is detrimental to you. You wish to minimize ACOS. Otherwise, it will cost a lot of ad-money per sale, and that money presumably would have been better spent on other approaches.

    Similarly, any scenario which results in lots of impressions per click is detrimental to Amazon. If those impressions had been won by more effective sellers, then people would have clicked on them and Amazon would have been paid.

    As an extreme example, suppose Bob’s Offensive Overpriced Craporrium wins every auction on Amazon. Then Amazon will make no money from its ad business. On the other hand, if Sue’s Supertrendy Awesomorrium won, then through hypnosis, telepathy, and blackmail every single user would be compelled to click. This is great for Amazon.

    The problem is that you have control over your ad and, in broad strokes, the types of impressions you bid on. But what control does Amazon have? Other than heavy-handed tactics like throwing Bob off the platform, it would seem to have little means of preventing such losses. Obviously, this isn’t the case. Otherwise, how could Jeff Bezos afford a $35 Billion divorce? Amazon actually has 2 powerful tools. It is important to know about these, since you’ll probably perform like Bob when you first start advertising.

    First, Amazon has an algorithm which selects which impressions are a good match for you. Sometimes they can tune this based on performance. Amazon has no control when it comes to product-targeting. If you said: sign me up for auctions involving ASIN X, Amazon dutifully will do so. However, for other approaches such as keyword or category targeting, they have discretion and can play games. Bob quickly may find that he somehow isn’t a good fit for anything but books on bankruptcy.

    Second, Amazon can reduce your effective bid. In theory, they will bid your stated maximum for the item in question. However, they may throttle this based on performance. Even if your maximum is $3, you may end up bidding $2. It’s unclear whether this affects the amount you (or the other winner) pays upon winning (if a click results), but it probably does. Conducting an auction under other auspices would be very difficult. So, you may end up losing even if the 2nd highest bid isn’t as high as your maximum.

    Ok, now that we have the background material out of the way, let’s get down to brass tacks. Or iron tacks. Brass is expensive.

    Why it doesn’t pay to advertise

    Now that we’ve reviewed the practice of advertising, let’s look at whether any of this is worth it. Specifically, what would it take to be profitable?

    Let us suppose that our book sells for $G, of which we keep $P. For example, a $10 book may yield $2.50 in net revenue for an author (where “net” means net of print costs, Amazon’s cut, etc, not net of advertising costs). In practice, things are a bit more complicated because there may be different P’s and G’s for the print and Kindle editions. For simplicity, let’s assume a single one for now.

    Before getting to the formal calculation, let’s look at a real example. Here are some numbers from an ad campaign I ran for my first book, “The Man Who Stands in Line.” I didn’t take it too seriously, because the book is not in a genre most people read. But I viewed the process as a good trial run before my novel (now out), “PACE.”

    Here are the stats. The campaign ran for a little over a year.

    • Impressions: 1,716,984
    • Total sales (gross revenue): $423.99
    • Total ad costs: $931.56
    • CPC: $0.48
    • CTR: 0.11%
    • Total sales (units): 77
    • ACOS: 219.7%

    While my book didn’t break any records, it did furnish some useful data. Let’s look at these numbers more closely.

    On its surface, the ACOS doesn’t look too terrible. After all, I paid a little over twice the amount I made — right? Not quite. I paid a little over twice my gross revenue. The problem is that I only care about net revenue.

    As an extreme example, suppose I have two books A and B. Both yield me $2 net revenue per sale as the author, but A costs $1000 and B costs $4. Now suppose I have a pretty darned good ACOS of 50% on $1000 worth of sales. In both scenarios I’ve paid $500 in advertising costs. But in scenario A, I’ve made $2 net revenue. I.e., I have a net loss of $498. In scenario B, I’ve made $500 net revenue and have broken even.
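    The two scenarios work out as follows (a quick sketch; the function is mine):

```python
def campaign_net(price, net_per_sale, gross_sales, acos):
    """Net result of a campaign: net revenue on units sold minus ad spend.
    acos is the ad cost expressed as a fraction of gross sales."""
    units = gross_sales / price
    ad_spend = acos * gross_sales
    return units * net_per_sale - ad_spend

# Both books net $2/sale; a 50% ACOS on $1000 of gross sales.
a = campaign_net(price=1000, net_per_sale=2, gross_sales=1000, acos=0.5)  # -498.0
b = campaign_net(price=4, net_per_sale=2, gross_sales=1000, acos=0.5)     # 0.0
```

    Same ACOS, same ad spend, wildly different outcomes — which is exactly the point.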

    We immediately see 2 things:

    • The same ACOS can correspond to vastly different net revenues depending on the retail price of the book.
    • It’s really hard to advertise at a profit.

    Returning to my own book, the first problem in analyzing the numbers is that we can’t easily determine net revenue. There were two book formats. The book was available for $9.99 as a paperback (resulting in net revenue of around $2.50) and a $2.99 Kindle edition (net revenue about $2). Fortunately, the two net revenues per book are close. From the total sales, we can guess a net revenue between $150-200. That paints a much more dismal picture than the ACOS implies.

    Let’s next consider a more typical book, and figure out the numbers needed to make advertising profitable. Because the ratio of net to gross revenue per sale will be highest for the Kindle edition, let’s focus solely on that. Any print editions will have even worse ad costs.

    A CPC of $0.50 for books is fairly typical from what I’ve seen. Suppose you have a Kindle book priced at $4.99. With the 70% royalty rate (and no large file fees to speak of), you’d make a little under $3.50 per sale. But let’s be liberal. Let’s say your net profit is $4 per book.

    As mentioned, ACOS is deceptive. If you have an ACOS of 1 (i.e. 100%), it looks like you’re breaking even. You’re not. It means your gross sales are breaking even. Your net revenue is negative. But it’s much, much worse than that if you have a print book. Your net profit may be the same across formats but the gross revenue isn’t. The higher the price of the book and the lower the ratio of net to gross revenue per sale, the more unrepresentative the ACOS becomes.

    With the numbers we proposed, we must average 8 or fewer clicks per sale to remain in the black. Otherwise our net revenue for the sale is less than the advertising cost. That is a very optimistic number. Even the most precisely targeted advertising rarely sees such a rate. And that’s just to break even.

    Returning to my own book, what sort of ACOS would be required to break even? With the print edition, we would need a 25% ACOS. With the Kindle edition it would be closer to 66%. In my own case I would have required around a 5x lower ACOS than I achieved. But that’s just to break even! Presumably we want to do better. The point of advertising isn’t just to break even. In essence, I would need an unattainable ACOS and conversion rate for advertising to pay off.
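    The break-even arithmetic above fits in a few lines (a sketch using the numbers from this section; function names are mine):

```python
def breakeven_clicks_per_sale(net_per_sale, cpc):
    """The most clicks you can buy per sale before ads eat your net revenue."""
    return net_per_sale / cpc

def breakeven_acos(net_per_sale, price):
    """The ACOS (ad cost / gross sales) at which you exactly break even."""
    return net_per_sale / price

# The generous Kindle example: $4 net per sale at $0.50 CPC -> 8 clicks/sale.
clicks_budget = breakeven_clicks_per_sale(4.00, 0.50)
# My book: about 25% for the $9.99 paperback, about 67% for the $2.99 Kindle.
paperback = breakeven_acos(2.50, 9.99)
kindle = breakeven_acos(2.00, 2.99)
```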

    From these numbers, it’s clear that advertising on Amazon simply can’t pay off for Indie authors. From an economic standpoint it always will operate at a loss.

    But are there any other reasons to advertise?

    I’ve heard claims that the real purpose of such ads is exposure, that one nominal sale translates into many through word of mouth, etc. I’ve seen no evidence of this. It may happen, but the scale is very small.

    Another argument I’ve encountered is that impressions matter. Having lots of impressions may not translate into immediate sales but it raises awareness. The more times people see a book, the more “validated” it becomes in their mind. Presumably this translates into later sales which can’t be tracked as direct clicks. This is good for the author, since it means sales without any associated click-cost. Unfortunately, I’ve seen no evidence of this either. My real sales closely tracked the 77 listed; there weren’t all sorts of separate ones which didn’t appear as clicks or conversions. True, this wasn’t the world’s most marketable book. But if 1.7MM impressions make no difference then it’s too expensive to reach whatever number would.

    My sense is that one or both claims may be true for large, well-known publishers running huge campaigns, and where a friend’s recommendation of a recognizable title tips the scale. But that requires a critical mass and multi-faceted marketing strategy, and way more money than a typical indie author will care to invest.

    Like most services associated with indie publishing — agent readings at conferences, query review, marketing and publicity, books on marketing and publicity, etc — Amazon ads is just another piece of a machine designed to separate the naive from their money using the oldest of human failings: hope.

    So how should you sell books? If I knew, I’d spend my days basking in luxury and fending off rabid fans rather than writing snarky posts which nobody will read. But until that happens, I’ll keep you posted on the things I try. The simple answer may be the one you don’t want to hear: you don’t. You write if you have the inclination and means to do so, but you should have no expectation of being able to sell your book. If you wish to get people to read it, you may do so at a loss via Amazon ads. But there probably are much more effective ways to pay for readers.

    The Art of Writing Circa 2019 in 44 Easy Steps

    1. 1 minute: Come up with interesting observation or creative idea regarding a recent experience.

    2. 10 minutes: Compose concise, eloquent, and impactful written expression of said idea in 6 lines.

    3. 10 minutes: It’s too pompous. Remove 2 lines.

    4. 10 minutes: It’s too vertiginous. Remove 2 lines.

    5. 10 minutes: 2 lines is less pithy than one. Remove 1 line.

    6. 10 minutes: It isn’t accessible to a broad audience. Remove all words over 3 letters, adjectives, adverbs, and any verbs of latinate origin.

    7. 10 minutes: That one semicolon really should be a colon. People don’t like semicolons.

    8. 40 minutes: It could be misinterpreted by the far left, the far right, the Koala anti-defamation league, or Mothers Against Mothers. Reword it.

    9. 1 hour: Properly format the blog post. Italics? No, bold. No, italics. Maybe small-caps? That font really doesn’t look right.

    10. 4.8 hours: Research current trends on google. Add the same 15 long-tail keywords to the title, description, excerpt, post metadata, twitter metadata, facebook metadata, and google+ metadata. Realize google+ doesn’t exist anymore and feel sad, as if you put out an extra place setting for that one late cousin whose name nobody remembers.

    11. 6 hours: Locate a tangentially-related image with a suitable Creative Commons license. Realize the license doesn’t allow the modifications necessary to achieve an NC-17 rating. Find another image, this time with an open license on Wikimedia. Hope that nobody else had the brilliant idea to use a generic image of a college student with the word “Stock” overlaid on it.

    12. 2 hours: Remove face from image to avoid any potential liability.

    13. 2 hours: Thumbnail is different size than image on blog post is different size from instagram version is different size from flickr version. All involve different formats and much much smaller files than you have. Resize, reformat, and wish you weren’t using Windows.

    14. 1 hour: Pick an appropriate excerpt, hashtag, and alt-image text.

    15. 1 hour: Tweet, post, and instagram your idea as text, pseudo-text, image, and sentient pure-energy.

    16. 2 hours: Cross-post to all 14 of your other blogs, web-pages, and social-media accounts.

    17. 20 seconds: Realize that your long-tail keywords no longer are trending.

    18. 20 seconds: Receive 2000 angry tweets. Realize your hashtag already refers to a far-right hate group, a far-left hate group, a Beyonce Sci-Fi fanfiction group, the political campaign of the 237th least popular Democratic candidate for President, the Lower Mystic Valley Haskell, Knitting, and Dorodango group, or all of the above.

    19. 10.8 seconds: Beat Jack Dorsey’s own speed-record for deleting a tweet (which happened to be about Elon Musk tweeting about Donald Trump’s tweets).

    20. 6 hours: Update long-tail keywords to reflect current trends. Realize that Beyonce Sci-Fi fanfiction is trending, and leverage your newfound accidental affiliation to comment on the irony of your newfound accidental affiliation. Then tweet Beyonce to ask if she’ll retweet you.

    29. 5 seconds: Receive automated cease and desist order from Taylor Swift, who loans out her 2000 person legal team to Beyonce on the rare occasions it isn’t in use. Spot idling black limo full of tattooed lawyers outside window. One who looks suspiciously like Jennifer Pariser grins and gently drags her finger across her throat.

    30. 4.2 seconds: Beat own recent world record for deletion of a tweet.

    31. 28.6 minutes: Decide that social media is a waste of time. “Delete” all accounts.

    32. 28.6 minutes: Decide that you need a professional presence on social media after all, and won’t be intimidated by Taylor Swift or her 2000 lawyers. “Undelete” all your accounts.

    33. 1 minute: Decide original post is stupid, obsolete, and has several grammatical errors. Delete it.

    34. 2 hours: Delete all variants of post on blogs, web-pages, twitter, facebook, and instagram.

    35. 4 minutes: Just in case it’s really still brilliant, email idea to a friend.

    36. 4.8 hours: Worry whether [insert appropriate gender normative or non-normative pronoun] likes it.

    37. 1 minute: Try to interpret friend’s ambiguous single-emoticon reply.

    38. 30 minutes: Decide you’re not going to let the establishment dictate what’s art, and that the post’s stupidity, obsolescence, and several grammatical errors are intentional and signs of unappreciated genius.

    39. 12 minutes: Receive voicemail that you missed 2 consecutive shifts at Starbucks and are fired.

    40. 30 minutes: Decide you’re not going to be an indentured servant to the establishment and will go it alone like most great artists throughout history.

    41. 0.8 seconds: Realize you have no marketable skill, don’t know how to market a skill, and don’t even know what markets or skills are. Recall that most great artists throughout history had “Lord” before their name, got money from someone with “Lord” before their name, or died in penury. Consider writing a post about the injustice of this.

    42. 0.2 seconds: Have panic attack that you’ll end up homeless, penniless, and forced to use the public library for internet-access. Google whether euthanasia is legal, and how many Lattes it would take.

    43. 1 minute: Call manager at Starbucks, apologize profusely, and blame Taylor Swift for your absence. Hint that you have an “in” with her, and if the manager takes you back there may be sightings of Taylor Swift’s people idling in a black limo outside.

    44. 6.7 hours: A sadder and a wiser man, you rise the morrow morn. You decide to share your newfound sadness and wisdom with others. Go to step 1.

    Some Pet Peeves of a Grammar Snob

    Language evolves organically, and only a fool would expect the world to remain the same just to accommodate their own inability to move past the life knowledge they happened to acquire during their particular formative years.

    But I’m a fool and proud of it. Or more precisely, I’m selective in my folly. I choose to accept changes which arise organically in a sense which meets my arbitrary standards, but have nothing but disdain for those changes effected through the apparent illiteracy and incompetence of celebrities (also known as “influencers”). To me, it’s like corporate-speak but dumber. And that’s saying a lot.

    Put in simpler and less pompous terms for those of you who don’t understand big words: if some Hollywood moron screwed up and a bunch of jokers adopted the meme, that’s not “organic” growth of language — it’s a Hollywood moron screwing up and a bunch of jokers adopting the meme. None of these people should be allowed near the language, let alone given power to influence it. As far as I’m concerned, there should be a license required. And since you need a language license to take the written test in the first place, nobody could get one. But that’s ok. The language can’t change if nobody uses it.

    So, without further ado (well, there wasn’t really much ado so far, just a lot of whining), here are a few of my favorite things (sung to the dulcet strains of an NWA song):

    1. Same Difference: A difference requires two objects for comparison. To be the same, two differences involve at least 3 objects (and possibly 4) and two comparisons. For example: I’m pedantic and pompous. Same thing (well, not really, but we’ll allow it). I’m pedantic and pompous, and he’s pretentious and self-important. Same difference (well, not really, but a sight better than before). Same thing: 2 items, 1 comparison. Same difference: 3-4 items, 2 comparisons.

    2. Pay the consequences: You pay a penalty or a price. You suffer consequences. I hope that the idiot who birthed this does all three.

    3. Associated to: This one requires a delicate touch. It’s a mistake by my favorite people: mathematicians. And they have oh-so-fragile egos. Sadly, I can’t blame the arch-media-corporate hegemony which secretly controls our brains through alien ultra-quantum-fractal-catchwords. Not that I would anyway. I’m not sure where “associated to” started, but I have an irresistible urge to jump up and scream whenever somebody says it. And since most math articles, books, and even wikipedia articles seem to have adopted it, I basically spend all day standing up and screaming. Which is no different than before, but now I have a plausible explanation when cops, social workers, and concerned-looking parents inquire. I thought of writing an automatic script to change every occurrence in wikipedia, but decided I was too lazy. Besides, every article has a little gatekeeper associated to it who guards it and tends it and flames anybody who tries to change anything. I did read a possible explanation for the phenomenon, however (the “associated to”, not the little folk guarding wikipedia pages). In latinate languages such as Italian, “associare” takes “a” as its preposition, which naively translates to “to” in English. I suspect this is indeed the source, not because I have any knowledge beyond what I read but because of what it would mean if it weren’t true. The only other plausible explanation is that Gonklaxu the Dissatisfier has penetrated the barrier to our galaxy and is sowing discord amongst the mathematicians who pose the greatest threat to his 12-dimensional nonorientable being. Since mathematicians apparently don’t read anything but math books, that strategy would be singularly successful. The thought of Gonklaxu does keep me awake at night, I’ll admit. Because if he is invading, it means he didn’t stop emailing because he was banished to a nonmeasurable corner of the duoverse. Rejection hurts so much. I associate it to the pain of hearing associate to.

    I’m sure I’ll think of a few more soon, so stay tuned!

    Semidirect Products, Split Exact Sequences, and all that

    One of the things I’ve butted heads with in studying Lie Groups is the semidirect product and its relationship to split exact sequences. It quickly became apparent that this was a pretty sizeable hole in my basic knowledge, so I decided to clarify this stuff once and for all.

    — Normal Subgroups and Quotient Groups —

    First, a brief refresher on Normal subgroups and Quotient groups. We are given a group {G} and subgroup {H\subseteq G}.

    • Left cosets are written {gH} and right cosets are written {Hg}. Each is a set of elements in {G}. Not all left cosets are distinct, but any two are either equal or disjoint. Ditto for right cosets.
    • The left (right) cosets form a partition of {G}, but they do not in general form a group. We can try to imbue them with a suitable product, but there are obstructions to the group axioms. For example {g^{-1}H} is not a useful inverse since {(gh)^{-1}= h^{-1}g^{-1}}, so neither left cosets nor right cosets multiply as desired. More generally {(gg')H} does not consist of a product of an element of {gH} and an element of {g'H}.
    • We define the Quotient Set {G/H} to be the set of left cosets. As mentioned, it is not a group in general. There is an equivalent definition for right cosets, written {H\setminus{}G}, but it doesn’t appear often. In most cases we care about, the two are the same.
    • It is easy to see that the condition which removes the obstruction is that {gH=Hg} for all {g}. Equivalently, {gHg^{-1}=H} for all {g}. If this holds, the cosets form a group. Often the stated condition is that the sets of left and right cosets are the same. But {g\in gH,Hg} so this is the same exact condition.
    • {H} is a Normal Subgroup if it obeys the conditions which make the cosets into a group.
    • Usually a Normal Subgroup is denoted {N}, and we write {N\triangleleft G} (or {N\trianglelefteq G}).
    • For a Normal subgroup {N}, the Quotient Set {Q=G/N} has (by definition) the natural structure of a group. It is called the Quotient Group.
    • We have two natural maps associated with a Normal Subgroup:
      • {N\xrightarrow{i} G} is an inclusion (i.e. injective), defined by {h\rightarrow h} (where the righthand {h} is viewed in {G}). This is a homomorphism defined for any subgroup, not just normal ones.
      • {G\xrightarrow{q} Q} is the quotient map (surjective), defined by {g\rightarrow gN} (with the righthand viewed as a coset, i.e. an element of {G/N}). This map is defined for any subgroup, with {Q} the Quotient Set. For Normal Subgroups, it is a group homomorphism.
    • We know there is a copy of {N} in {G}. Though {Q} is derived from {G} and {N}, and possesses no new info, there may or may not be a copy of it in {G}. Two natural questions are when that is the case, and how {G}, {N}, and {Q} are related in general.

    Let’s also recall the First Isomorphism Theorem for groups. Given any two groups {G} and {H} and a homomorphism {\phi:G\rightarrow H}, the following hold:

    • {\ker \phi} is a Normal Subgroup of {G}
    • {\mathop{\text{im}} \phi} is a subgroup of {H}
    • {\mathop{\text{im}} \phi} is isomorphic to the Quotient Group {G/\ker\phi}.
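    A toy instance of the theorem (my example, not anything deep): let {\phi:Z_6\rightarrow Z_3} be reduction mod 3. The kernel {\{0,3\}} is a normal subgroup, and the quotient by it has exactly as many cosets as the image has elements:

```python
G = range(6)             # Z_6 under addition mod 6
phi = lambda x: x % 3    # a surjective homomorphism Z_6 -> Z_3

kernel = {g for g in G if phi(g) == 0}   # {0, 3}, normal in Z_6
image = {phi(g) for g in G}              # {0, 1, 2}, i.e. Z_3
# The cosets of the kernel: each is a set g + kernel (mod 6).
cosets = {frozenset((g + k) % 6 for k in kernel) for g in G}
```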

    Again, we have to ask: since {\ker\phi} is a Normal Subgroup of {G}, and {\mathop{\text{im}}\phi} is isomorphic to the Quotient Group {G/\ker\phi} which “sort of” may have an image in {G}, is it meaningful to write something like {G\stackrel{?}{=} \ker\phi \oplus \mathop{\text{im}} \phi} (playing fast and loose with notation)? The answer is no; it’s more complicated.

    — Exact Sequences —

    Next, a very brief review of exact sequences. We’ll use {1} for the trivial group. The usual convention is to use {1} for general groups and {0} for Abelian groups. An exact sequence is a sequence of homomorphisms between groups {\cdots \rightarrow G_n \xrightarrow{f_n} G_{n-1}\xrightarrow{f_{n-1}} \cdots} where {\mathop{\text{im}} f_n= \ker f_{n-1}} for every pair. Here are some basic properties:

    • {1\rightarrow A \xrightarrow{f} B\cdots} means that {f} is injective.
    • {\cdots A\xrightarrow{f} B\rightarrow 1} means that {f} is surjective.
    • {1\rightarrow A\rightarrow B\rightarrow 1} means {A\cong B} (the middle map is an isomorphism).
    • Short Exact Sequence (SES): This is defined as an exact sequence of the form: {1\rightarrow A\xrightarrow{f} B\xrightarrow{g} C\rightarrow 1}.
    • For an SES, {f} is injective, {g} is surjective, and {C=B/\mathop{\text{im}} f}.
    • SES’s arise all the time when dealing with groups, and the critical question is whether they “split”.

    We’re now ready to define Split SES’s.

    • Right Split SES: There exists a homomorphism {h:C\rightarrow B} such that {g\circ h=Id_C}. Basically, we can move to {B} and back from {C} without losing info — which means {C} is in some sense a subgroup of {B}.
    • Left Split SES: There exists a homomorphism {h:B\rightarrow A} such that {h\circ f=Id_A}. Basically, we can move to {B} and back from {A} without losing info — which means {A} is in some sense a subgroup of {B}.
    • These two conditions are not in general equal, or even equivalently restrictive. The Left Split condition is far more constraining than the Right Split one in general. The direction of the homomorphisms in the SES introduces an asymmetry. [My note: it seems likely that the two are dual in some sense.]

    — External vs Internal View —

    We’re going to describe 3 types of group operations: the direct product, semi-direct product, and group extension. Each has a particular relationship to Normality and SES’s. There are two equivalent ways to approach this, depending on whether we prefer to define a binary operation between two distinct groups or to consider the relationship amongst subgroups of a given group.

    • External view: We define a binary operation on two distinct, unrelated groups. Two groups go in, and another group comes out.
    • Internal view: We define a relationship between a group and various groups derived from it (ex. Normal or Quotient).
    • These approaches are equivalent. The Internal view describes the relationship amongst the two groups involved in the External view and their issue. Conversely, the derived groups in the Internal view may be recombined via the External view operation.

    We must be a little careful with notation and terminology. When we use the symbol {HK}, it can mean one of two things.

    • Case 1: {H} and {K} are distinct groups. {HK} is just the set of all pairs of elements {(h,k)}. I.e. it is the direct product set (but not group).
    • Case 2: {H} and {K} are subgroups of a common group {G} (or have some natural implicit isomorphisms to such subgroups). In this case, {HK} is the set of all elements in {G} obtained as a product of an element of {H} and an element of {K} under the group multiplication.
    • Note that we may prefer cases where two subgroups cover {G}, but there are plenty of other possibilities. For example, consider {Z_{30}} (the integers mod 30). This has several obvious subgroups ({Z_2}, {Z_3}, {Z_5}, {Z_6}, {Z_{10}}, {Z_{15}}). {Z_2} and {Z_3} only intersect on {0} (the additive identity). However, the two do not cover (or even generate) the group! Similarly, {Z_2} and {Z_{10}} do not cover the group (or even generate it) but intersect on a nontrivial subset!
    • Going the other way, we’ll say that {G=HK} if {H} and {K} are subgroups and every element {g} can be written as {hk} for some {h\in H} and {k\in K}. Note that {H} and {K} need not be disjoint (or even cover {G} set-wise).

    Another potentially confusing point should be touched on. When we speak of “disjoint” subgroups {H} and {K} we mean that {H\cap K=\{e\}}, NOT that it is the null set. I.e., {H\cap K= 1}, the trivial group.
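
    The {Z_{30}} claims above are easy to verify by brute force. Here is a small Python sketch (the helper functions and their names are my own, purely illustrative):

```python
from math import gcd

def cyclic_subgroup(g, n=30):
    """Subgroup of Z_n generated by g under addition mod n."""
    s, x = {0}, g % n
    while x not in s:
        s.add(x)
        x = (x + g) % n
    return s

def generated(a, b, n=30):
    """Subgroup of Z_n generated by two elements (= the one generated by their gcd)."""
    return cyclic_subgroup(gcd(a, b), n)

Z2  = cyclic_subgroup(15)   # {0, 15}, the copy of Z_2 inside Z_30
Z3  = cyclic_subgroup(10)   # {0, 10, 20}, the copy of Z_3
Z10 = cyclic_subgroup(3)    # the multiples of 3, the copy of Z_10

assert Z2 & Z3 == {0}                  # disjoint (trivial intersection)...
assert len(generated(15, 10)) == 6     # ...yet together they generate only a Z_6
assert Z2 & Z10 == {0, 15}             # nontrivial intersection...
assert len(generated(15, 3)) == 10     # ...and they generate only Z_10 itself
```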

    — Semidirect Product —

    The semidirect product may seem a bit arbitrary at first but, as we will see, it is a natural part of a progression which begins with the Direct Product. Here are the two ways of defining it.

    • External view (aka Outer Semidirect Product): Given two groups {H} and {K} and a map {\phi:K\rightarrow Aut(H)}, we define a new group {H\rtimes K}. We’ll denote by {\phi_k(h)} the effect of the automorphism {\phi(k)} on {h} (and thus an element of {H}). Set-wise, {H\rtimes K} is just {H\times K} (i.e. all pairs {(h,k)}). The identity is {(e,e)}. Multiplication on {H\rtimes K} is defined as {(h,k)(h',k')= (h\phi_k(h'),kk')}. The inverse is {(h,k)^{-1}= (\phi_{k^{-1}}(h^{-1}),k^{-1})}.
    • Internal view (aka Inner Semidirect Product): Given a group {G} and two disjoint subgroups {N} and {K}, such that {G=NK} and {N} is a Normal Subgroup, {G} is called the Semidirect product {N\rtimes K}. The normality of {N} constrains {K} to be isomorphic to the Quotient Group {G/N}.
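
    The External multiplication rule can be checked concretely. Below is a minimal Python sketch of {Z_3\rtimes Z_2}, with {\phi} sending the generator of {Z_2} to the inversion automorphism of {Z_3}; this yields the dihedral group of order 6. (The encoding and function names are mine.)

```python
def phi(k, h):
    """Automorphism of Z_3 selected by k in Z_2: identity, or inversion."""
    return h % 3 if k == 0 else (-h) % 3

def mul(p, q):
    """(h,k)(h',k') = (h + phi_k(h'), k + k'), per the External definition."""
    (h, k), (h2, k2) = p, q
    return ((h + phi(k, h2)) % 3, (k + k2) % 2)

elements = [(h, k) for h in range(3) for k in range(2)]

# The operation is associative with identity (0,0), and non-abelian:
assert all(mul(mul(a, b), c) == mul(a, mul(b, c))
           for a in elements for b in elements for c in elements)
assert all(mul((0, 0), x) == x for x in elements)
assert mul((1, 0), (0, 1)) != mul((0, 1), (1, 0))
```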

    There are a few important things to note about this.

    • There are (potentially) many Semidirect products of two given groups, obtained via different choices of {\phi}. The notation is deceptive because it hides our choice of {\phi}. Given any {H,K,\phi} there exists a Semidirect product {H\rtimes K}. The various Semidirect products may be isomorphic to one another, but in general need not be. I.e., a given {H} and {K} may have multiple distinct semidirect products. This actually happens. Wikipedia mentions that there are 4 non-isomorphic semidirect products of {C_8} and {C_2} (the former being the Normal Subgroup in each case). One is a Direct Product, and the other 3 are not.
    • It also is possible for a given group {G} to arise from several distinct Semidirect products (of different pairs of groups). Again from Wikipedia, there is a group of order 24 which can be written as 4 distinct semidirect products of groups.
    • Yet another oddity is that a seemingly nontrivial {H\rtimes K} can be isomorphic to {H\oplus K}.
    • If {\phi= Id} (i.e. every {k} maps to the identity map on {H}), then {G=H\oplus K}.
    • To go from the External view to the Internal one, we note that, by construction, {H} is a Normal Subgroup of {H\rtimes K} and {K} is the Quotient Group {G/H}. To be precise, the Normal Subgroup is {(H,e)}, which is isomorphic to {H}, and the Quotient Group {G/(H,e)} is isomorphic to {K}.
    • To go from the Internal view to the External one, we choose {\phi_k(h)= khk^{-1}} as our function. I.e., {\phi} is just conjugation by the relevant element.
    • It may seem like there is an imbalance here. For a specific choice of Normal Subgroup {N}, the External view offers complete freedom of {\phi}, while the Internal view has a fixed {\phi}. Surely the latter is a special case of the former. The fallacy in this is that we must consider the pair {(G,N)}. We very well could have non-isomorphic {G,G'} with Normal Subgroups {N,N'} where {N\approx N'}. I.e. they are the same Normal Subgroup, but with different parent groups. We then would have different {\phi}‘s via our Internal view procedure. The correspondence is between {(H,K,\phi)} and {(G,N,K)} choices. Put differently, the freedom in {\phi} loosely corresponds to a freedom in {G}.
    • Note that, given {G} and a Normal Subgroup {N} — with the automatic Quotient Group {G/N} — we do NOT necessarily have a Semidirect product relationship. The condition of the Semidirect product is stricter than this. As we will see, it requires not merely that some group be isomorphic to {G/N}, but that {G/N} be realized as an actual subgroup {K} of {G} with {G=NK}. Equivalently, it requires a Right-Split SES (as we will discuss).
    • The multiplication defined in the External view may seem very strange and unintuitive. In essence, here is what’s happening. For a direct product, {H} and {K} are independent of one another. Each half of the pair acts only on its own elements. For a semidirect product, the non-normal half {K} can twist the normal half {H}. Each element of {K} can alter {H} in some prescribed fashion, embodied in {\phi(k)}. So {K} is unaffected by {H} but {H} can be twisted by {K}.
    • It is interesting to compare the basic idea to that of a Fiber bundle. There, the fiber can twist (via a group of homeomorphisms) as we move around the base space. Here, the normal subgroup can twist as we move around the non-normal part. Each generalizes a direct product and measures our need to depart from it.
    • The semidirect product of two groups is Abelian iff it’s just a direct product of abelian groups.
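
    The {C_8} and {C_2} count quoted from Wikipedia can at least be made plausible by enumeration: a choice of {\phi} is a homomorphism {C_2\rightarrow Aut(C_8)}, i.e. an element of {Aut(C_8)} of order dividing 2. A quick Python sketch, assuming the standard identification of {Aut(C_8)} with the units mod 8 (the naming of which unit yields which group is my annotation, and checking that the four resulting groups are pairwise non-isomorphic is a separate exercise):

```python
# Aut(C_8) consists of the maps x -> u*x for u an odd residue mod 8.
units = [u for u in range(1, 8) if u % 2 == 1]

# phi must send the generator of C_2 to an automorphism squaring to the identity:
involutions = [u for u in units if (u * u) % 8 == 1]

# All four units qualify, giving four choices of phi: u = 1 yields the direct
# product, u = 7 (i.e. -1) the dihedral group of order 16, and u = 3, u = 5
# the semidihedral and modular groups respectively.
assert involutions == [1, 3, 5, 7]
```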

    — Group Extensions —

    As with Semidirect products, there are 2 ways to view these. To make matters confusing, the notation speaks to an Internal view, while the term “extension” speaks to an External view.

    • External view: Given groups {A} and {C}, we say that {B} is an extension of {C} by {A} if there is a SES {1\rightarrow A\rightarrow B\rightarrow C\rightarrow 1}.
    • Internal view: Given a group {G} and Normal Subgroup {N\triangleleft G}, we say that {G} is an extension of {Q} by {N}, where {Q=G/N} is the Quotient Group.
    • Note that the two are equivalent. If {B} is an extension of {C} by {A}, then {A} is Normal in {B} and {C} is isomorphic to the Quotient Group {B/A}.
    • Put simply, the most general form of the Group, Normal Subgroup, induced Quotient Group trio is the Group Extension.

    — Direct Products, Semidirect Products, and Group Extensions —

    In the External view, we’ve mentioned three means of getting a group {B} from two groups {A} and {C}:

    • Direct Product: {B=A\oplus C}. This is unique.
    • Semidirect Product: {B=A\rtimes C}. There may be multiple of these, corresponding to different {\phi}‘s.
    • Group Extension: A group {B} for which there are 2 homomorphisms forming a SES {1\rightarrow A\rightarrow B\rightarrow C\rightarrow 1}. There may be many of these, corresponding to different choices of the two homomorphisms.

    Equivalently, we have several ways of describing the relationship between two subgroups {H,K\subseteq G} which are disjoint (i.e. {H\cap K=\{e\}}).

    • Direct Product: {G=H\oplus K} requires that both be Normal Subgroups.
    • Semidirect Product: {G=H\rtimes K} requires that {H} be normal (in which case {K\approx G/H}, and {\phi} is determined by conjugation in {G}). For a given {H} there may be multiple, corresponding to different {G}‘s.
    • Group Extension: {H} must be Normal, with {K\approx G/H}. {K} sits in {G} only in this weaker sense; for a general extension the Quotient Group need not be a subgroup of {G} at all.

    Note that not every possible relationship amongst groups is captured by these. For example, we could have two non-normal subgroups or two homomorphisms which don’t form an SES, or no relationship at all.

    An excellent hierarchy of conditions was provided by Arturo Magidin in answer to someone’s question on Stackoverflow. I roughly replicate it here. Unlike him, I’ll be sloppy and not distinguish between subgroups and groups isomorphic to subgroups.

    • Direct Product ({G=H\oplus K}): {H,K} both Normal Subgroups. {H,K} disjoint. {G=HK}
    • Semidirect Products ({G=H\rtimes K}): {H} Normal Subgroup, {K} Subgroup. {H,K} disjoint. {G=HK}. I.e., we lose Normality of {K}.
    • Group Extension ({G} is extension of {K} by {H}): {H} Normal Subgroup, {G/H\approx K}. I.e. {K} remains the Quotient Group (as before), but the Quotient Group may no longer be a subgroup of {G} at all!

    Now is a good time to mention the relationship between the various SES Splitting conditions:

    • For all groups: Left Split is equivalent to {B=A\oplus C}, and they imply Right Split. (LS=DP) => RS always.
    • For abelian groups, the converse holds and Right split implies Left Split and Direct Sum. I.e. the conditions are equivalent. LS=DP=RS for Abelian.
    • For nonabelian groups: Right Split implies {B=A\rtimes C} (with {\phi} depending on the SES map). We’ll discuss this shortly.

    Back to the hierarchy, now from a SES standpoint:

    • Most general case: There is no SES at all. Given groups {A,B,C}, there may be no homomorphisms between them. If there are homomorphisms, there may be none which form an SES. Consider a general pair of homomorphisms {f:A\rightarrow B} and {g:B\rightarrow C}, with no assumptions. We may turn to the first isomorphism theorem for help, but that does us no good. The first isomorphism theorem says that {\ker f \triangleleft A} and {\mathop{\text{im}} f\approx A/\ker f}, and {\ker g \triangleleft B} and {\mathop{\text{im}} g\approx B/\ker g}. This places no constraints relating {A}, {B}, and {C}.
    • Group Extension: Any SES defines a group extension. They are the same thing.
    • Semidirect Product: Any SES which right-splits corresponds to a Semidirect Product (with the right-split map determining {\phi})
    • Direct Product: Any SES which left-splits (and thus right-splits too) corresponds to a direct product.

    So, when we see the standard SES: {1\rightarrow N\rightarrow G\rightarrow G/N\rightarrow 1}, this is a group extension. Only if it right splits can we write {G= N\rtimes G/N}, and only if it left splits can we write {G= N\oplus G/N}.
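
    A concrete instance: the sign SES {1\rightarrow A_3\rightarrow S_3\rightarrow C_2\rightarrow 1} right-splits (send the generator of {C_2} to any transposition), so {S_3 = A_3\rtimes C_2}. It cannot left-split, since a left split would give {S_3 = A_3\oplus C_2}, an abelian group. A tiny Python check of the non-abelian part (the permutation encoding is my own):

```python
from itertools import permutations

def compose(p, q):
    """Composition (p after q) of permutations of {0,1,2}, stored as tuples."""
    return tuple(p[q[i]] for i in range(3))

S3 = list(permutations(range(3)))
assert len(S3) == 6

# S_3 is non-abelian, so it cannot be A_3 + C_2 (a direct sum of abelians):
assert any(compose(p, q) != compose(q, p) for p in S3 for q in S3)
```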

    — Some Notes —

    • Group Extensions are said to be equivalent if their {B}‘s are isomorphic and there exists an isomorphism between them which makes a diamond diagram commute. It is perfectly possible for the {B}‘s to be isomorphic but for two SES’s not to be equivalent extensions.
    • Subtlety referred to above. A quotient group need not be isomorphic to a subgroup of {G}. It only is defined when {N} is normal, and there automatically is a surjective homomorphism {G\rightarrow Q}. But we don’t have an injective homomorphism {Q\rightarrow G}, which is what would be needed for it to be isomorphic to a subgroup of {G}. This is precisely what the right-split furnishes. In that case, it is indeed a subgroup of {G}. The semidirect product may be thought of as the statement that {Q} is a subgroup of {G}.
    • In the definition of right split and left split, the crucial aspect of the “inverse” maps is that they be homomorphisms. A simple injective (for right-split, or surjective for left-split) map is not enough!
    • It is sometimes said that the concept of subgroup is dual to the concept of quotient group. This is intuitive in the following sense. A subgroup can be thought of as an injective homomorphism. By the SES for normal/quotient groups, we can think of a quotient group as a surjective homomorphism. Since injections and surjections are categorically dual, it makes sense to think of quotient groups and subgroups as similarly dual. Whether the more useful duality is subgroup / quotient group or normal subgroup / quotient group is unclear to me.

    180 Women and Sun Tzu

    It is related that Sun Tzu (the elder) of Ch’i was granted an audience with Ho Lu, the King of Wu, after writing for him a modest monograph which later came to be known as The Art of War. A mere scholar until then (or as much of a theorist as one could be in those volatile times), Sun Tzu clearly aspired to military command.

    During the interview, Ho Lu asked whether he could put into practice the military principles he expounded — but using women. Sun Tzu agreed to the test, and 180 palace ladies were summoned. These were divided by him into two companies, with one of the King’s favorite concubines given the command of each.

    Sun Tzu ordered his new army to perform a right turn in unison, but was met with a chorus of giggles. He then explained that, “If words of command are not clear and distinct, if orders are not thoroughly understood, then the general is to blame.” He repeated the order, now with a left turn, and the result was the same. He now announced that, “If words of command are not clear and distinct, if orders are not thoroughly understood, then the general is to blame. But if his orders are clear, and the soldiers nevertheless disobey, then it is the fault of their officers,” and promptly ordered the two concubines beheaded.

    At this point, Ho Lu intervened and sent down an order to spare the concubines for he would be bereft by their deaths. Sun Tzu replied that, “Having once received His Majesty’s commission to be the general of his forces, there are certain commands of His Majesty which, acting in that capacity, I am unable to accept.” He went ahead and beheaded the two women, promoting others to fill their commands. Subsequent orders were obeyed instantly and silently by the army of women.

    Ho Lu was despondent and showed no further interest in the proceedings, for which Sun Tzu rebuked him as a man of words and not deeds. Later he was commissioned a real general by Ho Lu, proceeded to embark on a brilliant campaign of conquest, and is described as eventually “sharing in the might of the king.”

    This is a particularly bewildering, if unpleasant, episode. Putting aside any impression the story may make on modern sensibilities, there are some glaring incongruities. What makes it more indecipherable still is that this is the only reputable tale of Sun Tzu the elder. Apart from this and the words in his book, we know nothing of the man, and therefore cannot place the event in any meaningful context. Let us suppose the specifics of the story are true, and leave speculation on that account to historians. The episode itself raises some very interesting questions about both Sun Tzu and Ho Lu.

    It is clear that Sun Tzu knew he would have to execute the King’s two favorite concubines. The only question is whether he knew this before he set out for the interview or only when he acceded to the King’s request. Though according to the tale it was Ho Lu who proposed a drill with the palace women, Sun Tzu must have understood he would have to kill not just two women but these specific women.

    Let’s address the broader case first. It was not only natural but inevitable that court ladies would respond to such a summons in precisely the manner they did. Even if we ignore the security they certainly felt in their rank and the affections of the King, the culture demanded it. Earnest participation in such a drill would be deemed unladylike. It would be unfair to think the court ladies silly or foolish. It is reasonable to assume that in their own domain of activity they exhibited the same range of competence and expertise as men did in martial affairs. But their lives were governed by ceremony, and many behaviours were proscribed. There could be no doubt they would view the proceedings as a game and nothing more. Even if they wished to, they could not engage in a serious military drill and behave like men without inviting quiet censure. The penetrating Sun Tzu could not have been unaware of this.

    Thus he knew that the commanders would be executed. He may not have entered the King’s presence expecting to kill innocent women, but he clearly was prepared to do so once Ho Lu made his proposal. In fact, Sun Tzu had little choice at that point. Even if the King’s proposal was intended in jest, he still would be judged by the result. Any appearance of frivolity belied the critical proof demanded of him. Sun Tzu’s own fate was in the balance. He would not have been killed, but he likely would have been dismissed, disgraced, and his ambitions irredeemably undermined.

    Though the story makes the proposal sound like the whimsical fancy of a King, it very well could have been a considered attempt to dismiss a noisome applicant. Simply refusing an audience could have been impolitic. The man’s connections or family rank may have demanded suitable consideration, or perhaps the king wished to maintain the appearance of munificence. Either way, it is plausible that he deliberately set Sun Tzu an impossible task to be rid of him without the drawbacks of a refusal. The King may not have known what manner of man he dealt with, simply assuming he would be deterred once he encountered the palace ladies.

    Or he may have intended it as a true test. One of the central themes of Chinese literature is that the monarch’s will is inviolable. Injustice or folly arises not from a failing in the King but from venal advisers who hide the truth and misguide him. A dutiful subject seeks not to censure or overthrow, but rather remove the putrescence which clouds the King’s judgment with falsehood, and install wise and virtuous advisers. Put simply, the nature of royalty is virtuous but it is bound by the veil of mortality, and thus can be deceived. One consequence of this is that disobedience is a sin, even in service of justice. Any command must be obeyed, however impossible. This is no different from Greek mythology and its treatment of the gods. There, the impossible tasks often only could be accomplished with magical assistance. In Sun Tzu’s case, no magic was needed. Only the will to murder two great ladies.

    As for the choice of women to execute, it does not matter whether the King or Sun Tzu chose the disposition of troops and commands. The moment Sun Tzu agreed to the proposal, he knew not only that he would have to execute women but which ones. Since he chose, this decision was made directly. But even if it had been left to the king, there could be no question who would be placed in command and thus executed.

    The palace hierarchy was very strict. While the ladies probably weren’t the violent rivals oft depicted in fiction, proximity to the King — or, more precisely, place in his affections, particularly as secured by production of a potential heir — lent rank. No doubt there also was a system of seniority among the women based on age and family; many of them probably were neither concubines nor courtesans, but noblewomen whose husbands served the King. It was common for ladies to advance their husbands’ (and their own) fortunes through friendship with the King’s concubines. Whatever the precise composition of the group, a strict pecking order existed. At the top of this order were the King’s favorites. There could be no other choice consistent with domestic accord and the rules of precedence. Those two favorite concubines were the only possible commanders of the companies.

    To make matters worse, those concubines may already have produced heirs. Possibly they were with child at that very moment. This too must have been clear to Sun Tzu. Thus he knew that he must kill the two most beloved of the King’s concubines, among the most respected and noblest ladies in the land, and possibly the mothers of his children. Sun Tzu even knew he may be aborting potential heirs to the throne. All this is clear as day, and it is impossible to imagine that the man who wrote the Art of War would not immediately discern it.

    But there is something even more perplexing in the story. The King did not stop the executions. Though the entire affair took place in his own palace, he did not order his men to intervene, or even belay Sun Tzu’s order. He did not have Sun Tzu arrested, expelled, or executed. Nor did he after the fact. Ho Lu simply lamented his loss, and later hired the man who had effected it.

    There are several explanations that come to mind. The simplest is that he indeed was a man of words and not deeds, cowed by the sheer impetuosity of the man before him. However, subsequent events do not support this. Such a man would not engage in aggressive wars of conquest against his neighbors, nor hire the very general who had humiliated and aggrieved him so. Perhaps he feared that Sun Tzu would serve another, turning that prodigious talent against Wu. It would be an understandable concern for a weak ruler who dreaded meeting such a man on the battlefield. But it also was a concern which easily could have been addressed by executing him on the spot. The temperamental Kings of fable certainly would have. Nor did Ho Lu appear to merely dissemble, only to visit some terrible vengeance on the man at a later date. Sun Tzu eventually became his most trusted adviser, described as nearly coequal in power.

    It is possible that Ho Lu lacked the power oft conflated with regality, and less commonly attendant upon it. The title of King at the time meant something very different from modern popular imaginings. The event in question took place around 500 BC, well before Qin Shi Huang unified China — briefly — with his final conquest of Qi in 221 BC. In Ho Lu’s time, kingdoms were akin to city-states, and the Kings little more than feudal barons. As in most historical treatises, troop numbers were vastly exaggerated, and 100,000 troops probably translated to a real army of mere thousands.

    This said, it seems exceedingly improbable that Ho Lu lacked even the semblance of authority in his own palace. Surely he could execute or countermand Sun Tzu. Nor would there be loss of face in doing so, as the entire exercise could be cast as farcical. Who would object to a King stopping a madman who wanted to murder palace concubines? If Sun Tzu was from a prominent family or widely regarded in his own right (for which there is no evidence), harming him would not have been without consequence. But there is a large difference between executing the man and allowing him to have his way in such a matter. Ho Lu certainly could have dismissed Sun Tzu or proposed a more suitable test using peasants or real soldiers. To imagine that a king would allow his favorite concubines to be executed, contenting himself with a feeble protest, is ludicrous. Nor was Sun Tzu at that point a formidable military figure. A renowned strategist would not have troubled to write an entire treatise just to impress a single potential patron. That is not the action of a man who holds the balance of power.

    The conclusion we must draw is that the “favorite concubines” were quite dispensable, and the King’s protest simply the form demanded by propriety. He could hardly fail to protest the murder of two palace ladies. Most likely, he used Sun Tzu to rid himself of two problems. At the very least, he showed a marked lack of concern for the well-being of his concubines. We can safely assume that his meat and drink did not lose their savour, as he envisioned in his tepid missive before watching Sun Tzu behead the women.

    While it is quite possible that he believed Sun Tzu was just making a point and would stop short of the actual execution, this too seems unlikely. The man had just refused a direct order from the King, and unless the entire matter was a tremendous miscommunication there could be little doubt he would not be restrained.

    Ho Lu may genuinely have been curious to see the outcome. Even he probably could not command obedience from the palace ladies, and he may have wished to see what Sun Tzu could accomplish. But more than this, the King probably felt Sun Tzu was a valuable potential asset. The matter then takes on a very different aspect.

    From this viewpoint, Ho Lu was not the fool he seemed. The test was proposed not in jest, but in deadly earnest, and things went exactly as he had hoped but not expected. He may have had to play the indolent monarch, taking nothing seriously and bereaved by a horrid jest gone awry. It is likely he was engaging in precisely the sort of deception Sun Tzu advocated in his treatise. He appeared weak and foolish, but knew exactly what he wanted and how to obtain it.

    This probably was not lost on Sun Tzu, either. Despite his parting admonition, he did later agree to serve Ho Lu. It is quite possible that the king understood precisely the position he was placing Sun Tzu in, and anticipated the possible executions. Even so, he may have been uncertain of the man’s practical talent and the extent of his will. There is a great divide between those who write words and those who heed them. Some may bridge it, most do not. Only in the event did Sun Tzu prove himself.

    For this reason, Ho Lu could not be certain of the fate of the women. Nonetheless he placed them in peril. They were disposable, if not to be disposed of. It seems plausible that an apparently frivolous court game actually was a determined contest between two indomitable wills. The only ones who did not grasp this, who could not even recognize the battlefield on which they stepped solely to shed blood, were the concubines.

    By this hypothesis, they were regarded as little more than favorite dogs or horses, or perhaps ones which had grown old and tiresome. A King asks an archer to prove his skill by hitting a “best” hound, then sets the dog after a hare, as he has countless times before. The dog quickens to the chase, eagerly performing as always, confident that its master’s love is timeless and true. Of all present, only the dog does not know it is to be sacrificed, to take an arrow to prove something which may or may not be of use one day to its master. If the arrow falls short, it returns to its master’s side none the wiser and not one jot less sure of its place in the world or secure in the love of its master, until another day and another archer. This analogy may seem degrading and insulting to the memory of the two ladies, but that does not mean it is inaccurate. It would be foolhardy not to attribute such views to an ancient King and general simply because we do not share them or are horrified by them or wish they weren’t so. In that time and place, the concubines’ lives were nothing more than parchment. The means by which Ho Lu and Sun Tzu communicated, deadly but pure.

    The view that Ho Lu was neither a fool nor a bon vivant is lent credence by the manner of his rise to power. He usurped the throne from his uncle, employing an assassin to accomplish the task. This and his subsequent campaign of conquest are not the actions of a dissipated monarch. Nor was he absent from the action, wallowing in luxury back home. In fact, Ho Lu died from a battle wound during his attempted conquest of Yue.

    It is of course possible that the true person behind all these moves was Wu Zixu, the King’s main advisor. But by that token, it also is quite possible that the entire exercise was engineered by Wu Zixu — with precisely intended consequences, perhaps ridding himself of two noisome rivals with influence over the King. In that case, the affair would be nothing more than a routine palace assassination.

    Whatever the explanation, we should not regard the deaths of the two concubines as a pointless tragedy. The discipline instilled by two deaths could spare an entire army from annihilation on the field. Sun Tzu posited that discipline was one of the key determinants of victory, and in this he was not mistaken. That is no excuse, but history needs none. It simply is.

    This said, it certainly is tempting to regard the fate of these ladies as an unadorned loss. Who can read this story and feel anything but sadness for the victims? Who can think Sun Tzu anything but a callous murderer, Ho Lu anything but foolish or complicit? It is easy to imagine the two court concubines looking forward to an evening meal, to poetry with their friends, to time with their beloved husband. They had plans and thoughts, certainly dreams, and perhaps children they left behind. One moment they were invited to play an amusing game, the next a sharp metal blade cut away all they were, while the man they imagined loved them sat idly by though it lay well within his power to save them. Who would not feel commingled sorrow and anger at such a thing? But that is not all that happened.

    A great General was discovered that day, one who would take many lives and save many lives. Whether this was for good or ill is pointless to ask and impossible to know. All we can say is that greatness was achieved. 2500 years later and in a distant land we read both his tale and his treatise.

    Perhaps those two died not in service to the ambition of one small general in one small kingdom. Perhaps they died so centuries later Cao Cao would, using the principles in Sun Tzu’s book, create a foundation for the eventual unification of China. Or so that many more centuries later a man named Mao would claim spiritual kinship and murder a hundred million to effect a misguided economic policy. Would fewer or more have died if these two women had lived? Would one have given birth to a world-conquering general, or written a romance for the ages?

    None of these things. They died like everyone else — because they were born. The axe that felled them was wielded by one man, ordered by another, and sanctioned by a third. Another made it, and yet another dug the ore. Are they all to blame? The affair was one random happening in an infinitude of them, neither better nor worse. A rock rolls one way, but we do not condemn. It rolls another, but we do not praise.

    But we do like stories, and this makes a good one.

    [Source: The account itself is taken from The Art of War with Commentary, Canterbury Classics edition, which recounted it from a translation of the original in the Records of the Grand Historian. Any wild speculation, ridiculous hypotheses, or rampant mischaracterizations are my own.]

    How 22% of the Population can Rewrite the Constitution

    This is a scary piece in which I analyze precisely how many voters would be required to trigger a Constitutional Convention and ratify any amendments it proposes. Because the 2/3 and 3/4 requirements in the Constitution refer to the number of States involved, the smaller States have a disproportionate effect. In Congress, the House counterbalances this – but for a Constitutional Convention, there is no such check.

    Read the Paper (PDF)
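
    The arithmetic underlying the title is easy to sketch. With 50 States, 3/4 means the 38 least-populous States suffice to ratify, and within each only a bare majority of voters is needed (the 2/3 trigger works analogously). The following Python toy is my own illustration, not the paper’s code, and uses made-up population figures:

```python
import math

def min_share_to_ratify(populations, ratio=0.75):
    """Smallest share of the total population able to ratify: a bare
    majority of voters in each of the ceil(ratio * #states) smallest states."""
    pops = sorted(populations)
    k = math.ceil(ratio * len(pops))
    return sum(p / 2 for p in pops[:k]) / sum(pops)

# With equal populations the bound is about 38% (ceil(37.5) = 38 states);
# a skewed distribution drives it far lower:
assert abs(min_share_to_ratify([1] * 50) - 0.38) < 1e-9
assert min_share_to_ratify([1] * 40 + [10] * 10) < 0.15   # invented numbers
```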

    A Travel-Time Metric

    Especially in urban areas, two locations may be quite close geographically but difficult to travel between. I wondered whether one could create a map where, instead of physical distances, points are arranged according to some sort of travel-time between them. This would be useful for many purposes.

    Unfortunately, such a mapping is mathematically impossible in general (for topological reasons). But so is a true map of the Earth, hence the need for Mercator or other projections. The first step in constructing a useful visualization is to define an appropriate Travel-Time metric function. Navigation systems frequently compute point-to-point values, but they are not bound by the need to maintain a consistent set of Travel Times between all points. That is our challenge – to construct a Travel-Time metric.

    Read the Paper (PDF)
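
    One obstacle is that raw point-to-point travel times need not even form a metric: they can be asymmetric (one-way streets, traffic direction) and can violate the triangle inequality (a detour may beat the “direct” route). Here is a minimal Python sketch of one way to repair this, by symmetrizing and taking the shortest-path closure; this is my own illustration, not necessarily the construction in the paper:

```python
def metricize(t):
    """Coerce a square matrix of raw travel times into a metric:
    symmetrize, then run Floyd-Warshall so the triangle inequality holds."""
    n = len(t)
    d = [[0.0 if i == j else min(t[i][j], t[j][i]) for j in range(n)]
         for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                d[i][j] = min(d[i][j], d[i][k] + d[k][j])
    return d

raw = [[0, 10, 50],     # hypothetical minutes between three locations
       [12, 0, 15],
       [50, 18, 0]]
d = metricize(raw)
assert d[0][2] == d[2][0] == 25   # the 50-minute 'direct' leg is replaced
                                  # by the faster 10 + 15 route via point 1
```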

    Inflation, Up Close and Personal

    It often seems like the inflation figures touted by officials and economists have little connection with the real world. There are a number of reasons for this, some technical and some political. But there is a deeper problem than the means and motives for calculating any specific index. The issue is that any aggregate number is likely to deviate significantly from one’s personal experience. Each of us saves for different reasons and spends in different ways. Without taking these specific choices into account, we cannot accurately represent or protect against the inflation that we individually encounter. This paper elaborates on this idea and explains how each of us can identify the relevant components of inflation, and best hedge our savings.

    Read the Paper (PDF)
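
    The core calculation, as I read it, is an expenditure-weighted average of category price changes, with the weights taken from one’s own spending rather than a national basket. A Python sketch with hypothetical numbers:

```python
def personal_inflation(spending, price_changes):
    """spending: {category: dollars/yr}; price_changes: {category: yearly rate}.
    Returns the inflation rate experienced by this particular spender."""
    total = sum(spending.values())
    return sum(spending[c] / total * price_changes[c] for c in spending)

# A hypothetical renter whose budget is dominated by fast-rising rent:
me    = {"rent": 24000, "food": 6000, "transport": 3000}
rates = {"rent": 0.06,  "food": 0.02, "transport": -0.01}
assert abs(personal_inflation(me, rates) - 1530 / 33000) < 1e-12  # ~4.6%/yr
```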

    A Proposal for Tax Transparency

    Taxes necessarily are unpopular. They represent an economic burden and do not yield obvious benefits. Though some make a show of embracing their civic duty, few voluntarily would undertake to do so if given a choice. The criminal penalties attached to evasion and the substantial efforts at enforcement are evidence of this. Nonetheless, there is a tie between one’s sense of social responsibility and the palatability of taxes. A perception that our sacrifice benefits ourselves, our loved ones, and society as a whole can mitigate the pain it causes. Conversely, if our hard-earned money vanishes into an opaque hole of possible waste and corruption, resentment is engendered.

    The taxes paid by an individual represent a substantial sum to him, but a mere pittance to the government. If there is no accounting for this money, then it appears to have been squandered. This assumption is natural, as the government is known to be a notorious spendthrift. Nor does the publication of a voluminous, incomprehensible, and largely euphemistic budget lend transparency. Even if it were perfectly accurate, and every taxpayer troubled to read it, the human mind isn’t wired to accurately grasp the relationships between large numbers. Thirty thousand dollars in taxes is minuscule compared to a billion or ten billion or a hundred billion, and it makes little difference which of those quantities is involved. Therefore an effort to elicit confidence through a full disclosure of expenditures would be ill-fated even if well intentioned. However it would serve to enforce accountability, and should be required in addition to any other measures employed. If nothing else, this would allow watchdog organizations to analyze government behavior and identify waste.

    So how could we restore individual faith in the system of government expenditure? There is in fact a way to do so and encourage fiscal responsibility at the same time. Individuals like to know where their money went. A successful tactic of certain charities is to attach each donation to a specific child or benefit. A person feels more involved, is more likely to contribute, and is better satisfied with their contribution if it makes a tangible difference. We need to know that we aren’t wasting our money.

    The pain of an involuntary contribution may be assuaged through a similar approach. It may even transform into pride. There will be individuals who remain resentful, just as there are those who do not donate to charity. And some people simply don’t like being forced to do anything. However the majority of taxpayers likely will feel better if they know precisely where their money went.

    We propose that an exact disposition of each individual’s taxes be reported to him. At first glance, this may seem infeasible. Funds are drawn from pooled resources rather than attached to such specific revenue streams. However, what we suggest can be accomplished without any change in the way the government does business, and our reporting requirement would not prove onerous. The federal, state, and most local governments already meticulously account for expenses – even if they do not exhibit particular restraint in incurring them. They must do so for a variety of legal and regulatory reasons, and records generally exist even if not publicly available.

    Individual tax contributions need only be linked to expenditures at the time of reporting, but this must be done consistently. To that end, expenses could be randomly matched with the taxes that paid for them. This could be done each February or March for the prior year. We simply require that each dollar of taxes collected be assigned to one and only one dollar spent and vice versa. If there is a surplus, then some taxpayers would receive an assignment of “surplus” and if there is a deficit then certain expenses will be assigned a non-tax source – such as borrowed money or a prior year’s surplus. If a taxpayer’s contribution has been marked as surplus, then his true assignment is deferred until such time as the surplus is spent (again using a lottery system for matching). If it covers a prior year’s deficit then it is matched against that year’s excess expenses. The point is that every dollar of taxpayer money eventually is matched against a real expense.
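    Mechanically, the lottery described above is trivial. A sketch, assuming dollar-level granularity and equal totals (the taxpayer and expense records are invented for illustration):

```python
import random

# Sketch of the proposed lottery: each tax dollar is randomly matched to
# exactly one expense dollar and vice versa. Records are illustrative only.

def match_taxes_to_expenses(tax_dollars, expense_dollars, seed=0):
    """Return a one-to-one random assignment of tax dollars to expense
    dollars. Assumes equal totals; a surplus or deficit would instead
    leave the excess side matched to 'surplus' or a non-tax source."""
    assert len(tax_dollars) == len(expense_dollars)
    shuffled = expense_dollars[:]
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible draw
    return list(zip(tax_dollars, shuffled))

taxes = [("taxpayer A", 1)] * 3 + [("taxpayer B", 1)] * 2
expenses = [("ductwork at 121 example plaza", 1)] * 5
matching = match_taxes_to_expenses(taxes, expenses)
```

    Grouping each taxpayer's assigned dollars by expense then yields exactly the kind of report described below.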

    For example, one taxpayer’s report could read “10K toward the construction of 121 example plaza, New York,” or better still “3K used for the purchase of air conditioning units, 5K for ductwork, and 2K for electrical routing for work done at XXX and billed to YYY contracting on ZZZ date. Work completed on AAA date.” An individual receiving such a report would feel a sense of participation, accountability, and meaningful sacrifice.

    It may seem that few people would feel pride in defraying the cost of mundane items, but such an objection is misguided. These are real expenses and represent a more comprehensible and personal form of involvement than does a tiny fraction of an abstract budget. If an expense would appear wasteful, pointless, or excessive, then it is appropriate to question it.

    What of the pacifist whose money goes toward weapons or the religious individual whose taxes pay for programs that contravene his beliefs? It may seem unfair to potentially violate a taxpayer’s conscience by assigning him an unpalatable expense. But no exceptions should be made. Their money is being spent in the manner described. Whether their contribution is diluted or dedicated, they live in a society that violates their ideals and they should vote accordingly.

    It is our belief that a feeling of involvement in the operation of government, along with the requisite increase in transparency, would alleviate much of the general perception of irresponsibility, excess, and unaccountability. An individual may object to his relative contribution, but the means of its use would no longer be inscrutable. This could go a long way toward restoring faith in our government.

    Probabilistic Sentencing

    In most real situations, we must make decisions based on partial information. We should neither allow this uncertainty to prevent action nor pretend to perfect certainty in taking action. Yet in one area with a great impact on an individual’s freedom and well-being we do just that. Judges and juries are required to return an all-or-nothing verdict of guilt. They may not use their experience, intelligence, and judgment to render a level of confidence rather than a mere binary choice.

    I propose adopting a sentencing mechanism based on a probabilistic assessment of guilt or innocence. This allows jurists to better express their certainty or lack thereof than does our traditional all-or-nothing verdict. The natural place to reflect such an imputed degree of guilt is in the sentencing phase. I discuss the implications of such a system as well as certain issues with implementation.
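    As a flavor of how such a mechanism might operate (one possible rule of my own devising, not necessarily the paper's exact proposal): scale the nominal sentence by the expressed confidence, with a floor below which no conviction results:

```python
# Illustrative sketch only: one possible probabilistic sentencing rule,
# not necessarily the rule proposed in the paper.

def probabilistic_sentence(confidence, nominal_years, floor=0.5):
    """Scale the nominal sentence by the assessed probability of guilt.
    Below the floor, treat the defendant as not convicted at all."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be a probability")
    if confidence < floor:
        return 0.0
    return confidence * nominal_years

# A jury 90% sure of guilt on a 10-year charge yields 9 years;
# 40% confidence falls below the floor and yields no sentence.
```

    The interesting questions — where to place the floor, and whether expected punishment should be linear in confidence — are exactly the implementation issues the paper discusses.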

    Read the Paper (PDF)

    The Requirements of Effective Democracy

    The current popular notion of democracy is something to the effect of “the will of the people is effected through voting.” Though this is a far cry from the original meaning of the word or its various incarnations through history, let’s take it as our working definition. It certainly reflects the basic approach taken in the United States. Though often confounded in the public mind with a vague cultural notion of freedom, it only conforms to this when taken together with certain other principles – such as explicit protections of individual liberties.

    This aside, let us consider the components necessary for democracy. To do so, we must make some supposition regarding the ability of an individual voter to render a decision. We assume that every voting individual, regardless of aptitude, is capable of determining their purpose in voting. We say “purpose” rather than “criterion” because we refer to a moral choice, what they hope to achieve by voting. This is far more basic and reliable than any specific set of issues or criteria. A person knows their value system, even if they cannot or do not accurately express it. The desires to improve the country, foster religious tenets, create a certain type of society, support the weak, advance one’s own interest, protect a specific right, or promote cultural development cannot easily be manipulated or instilled. While it is possible to create a sense of urgency or attach specific issues or criteria to these values, one’s purpose itself is a reflection of that individual’s view of society and their relationship with it. To meaningfully participate in the democratic process, an individual must translate this purpose into particular votes in particular elections. Note that a purpose may embody a plurality of ideals rather than any specific one (such as in the examples above).

    It is the function of democracy to proportionately reflect in our governance and society the individual purposes of the citizenry. A number of components are involved, any of whose absence undermines its ability to do so. While the consequent process may retain all the trappings of a democracy, it would not truly function as one. Though it could be argued that such imperfection is natural and speaks to the shortcomings of the participants rather than a failing of the institution itself, such a claim is misguided. Regardless of cause, if the people’s will is not accurately reflected then the society does not conform to our popular notion of a democracy. Whether another system would perform better is beyond our present consideration. We simply list certain key requirements for a democracy to function as we believe it should, and allow the reader to decide the extent to which our present society satisfies them.

    Note that a particular government need not directly represent the interest of every citizen, but its formation and maintenance must meaningfully do so. In some loose sense this means that (1) the effect of a citizen is independent of who that citizen is, and (2) the opinion of a majority of citizens is reflected in the actions of the government. These are neither precise requirements nor ones satisfied in practice, particularly in representative democracies. However they reflect our vague cultural concept of democracy.

    The following are the major components necessary for a democracy to function as we believe it should.


    Once a voter has decided upon a set of positions that reflect their purpose, they must have a means of voting accordingly. There must be sufficient choice to allow an individual to embody those positions in their vote. Furthermore, the choice must be real. Marginal candidates with no chance of winning may be useful for registering an opinion, but they do not offer true participation in the election. If there are only two major candidates then the voter’s entire purpose must be reduced to a binary decision. Only if it happens to be reflected in one of the choices at hand would their view be expressible.

    If there are two major candidates and they differ only on a few issues that are of no consequence to a particular individual, then that person cannot express his purpose by voting. For example if a voter feels very strongly about issue X, and both major candidates have the same opposing position on that issue, then he cannot make his will known in that election. It may be argued that the presence of small candidates serves exactly this purpose and that if sentiment is strong enough one could prevail. This is not borne out by history. In a two party system, a voter is reduced to a binary choice between two bundled sets of positions. As a more extreme example, suppose there are several major issues and the candidates agree on one of them. Even if every single person in the country holds the opposite position on that issue, their will still cannot be effected through that election. If there were no other important issues, then one or the other candidate surely would take the popular position – or a third party candidate would do so and prevail. However in the presence of other issues, this need not be the case.

    Finally, there must be some reason to believe that the actions of a candidate once elected will reflect their proclaimed positions. Otherwise, it will be years before the voter can penalize them. Without such an assurance – and history certainly does not offer it – a nominal choice may not be a real one. The people then act the part of a general who cannot move his troops, however much he may threaten or cajole them.


    A well-intentioned individual must have a way of locating and obtaining information whose accuracy is not in question or, if uncertain in nature, is suitably qualified. Voters must have access to accurate and sufficient information. In order to translate their purpose into a vote, an individual must be able to determine the choices available and what they actually entail. Moreover, he must be able to determine the relative importance of different issues in effecting his purpose. Fear mongering, inaccurate statistics, and general misinformation could lead him to believe that a particular issue ‘X’ is of greater import than it truly is. Instead of focusing on other issues ‘Y’ and ‘Z’ which are more germane to his purpose, he may believe that dealing with issue ‘X’ is the most important step toward it. Similarly, if the views of candidates are obfuscated or misrepresented or the significance of events is disproportionately represented, a person may be denied an accurate translation of his purpose into a vote. Even a perfectly rational and capable voter cannot make a suitable decision in the absence of information or in the presence of inaccurate information. This said, not every vehicle should be expected to provide such information. If a person prefers to listen to a news station that reports with a particular bias, that is not the fault of the information provider – unless it does so subtly and pretends otherwise.


    A voter must have the intelligence, critical reasoning, motivation, and general wherewithal to seek out accurate information, detect propaganda or advertising, and make an informed decision. Their perceived interest must coincide with their true interest, and their purpose be accurately represented in the choice they make. It may seem that we are advocating the disenfranchisement of a segment of the population, individuals who – while failing to meet some high standard – have valid purposes of their own which they too have the right to express. This is not the case, nor is our standard artificial. We are merely identifying a necessary ingredient, not endorsing a particular path of action. Moreover, the argument that they would be deprived of a right is a specious one. Such individuals are disenfranchised, whether or not they physically vote. They lack the ability to accurately express their purpose, and easily are misled, confused, or manipulated. At best they introduce noise, at worst their votes may systematically be exploited. A blind person may have a valid destination, but they cannot drive there.


    Voters must be willing and able to participate. They cannot be blocked by bureaucratic, economic, legal, or practical obstacles – especially in a way that introduces a selection bias. Their votes must be accurately tallied and their decision implemented.


    Not only must the structure of the democratic process treat all voters equally, their de facto influence must be equal. Depending on the nature of the voting system, certain participants may have no real influence even if the system as a whole treats them symmetrically. A simple example would be a nation consisting of four states with blocks of 3, 3, 2, and 1 votes, where each block must vote as a unit. Regardless of the pattern of voting, citizens in the state with a single vote can never affect the outcome. If that vote is flipped, the majority always remains unchanged. This particular topic is addressed in another paper.
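    The claim about the four-block example is easy to verify by brute force. This sketch enumerates every voting pattern and counts how often flipping a given block changes the majority:

```python
from itertools import product

# Verify the example: blocks of 3, 3, 2, and 1 votes, each voting as a
# unit. The 1-vote block should never be able to change the outcome.

def pivotal_count(weights, index):
    """Count vote patterns in which flipping block `index` flips the
    majority winner (total weight is odd here, so no ties arise)."""
    total = sum(weights)
    count = 0
    for votes in product([0, 1], repeat=len(weights)):
        yes = sum(w for w, v in zip(weights, votes) if v)
        # Flip the chosen block and recompute the yes-total.
        yes_flipped = yes + (weights[index] if votes[index] == 0 else -weights[index])
        if (2 * yes > total) != (2 * yes_flipped > total):
            count += 1
    return count

weights = [3, 3, 2, 1]
# The 1-vote block is never pivotal; a 3-vote block often is.
```

    No subset of the other blocks sums to exactly 4 votes, so the single vote can never tip a 4-4 split — it has no influence at all despite the symmetric rules.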

    There certainly are many other technical and procedural requirements. However those listed above are critical components that directly determine a voter’s ability to express their will through the democratic process. In their absence, voters could be thwarted, manipulated, misled, or confused. The purpose of democracy isn’t to tally votes, but to register the will of the people. Without the choice and tools to express this will, the people can have nothing meaningful to register.

    A System for Fairness in Sentencing

    We often hear of cases that offend our sense of fairness – excessive sentences, minor crimes that are punished more severely than serious crimes, or two equivalent crimes that are punished very differently. Rather than attempt to solve a politically and legally intractable problem, we ask a more theoretical question: whether an individual can assign sentences in a way that seems reasonable and consistent to him.  Our system is a means of doing so.  We offer a simple algorithmic method that could be used by an individual or review board to ensure that sentences meet a common-sense standard of consistency and proportionality.
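    The paper's actual algorithm is in the PDF; purely as a flavor of what a consistency check might look like, here is a sketch (my own, not the paper's) that flags inversions — a crime judged less severe that received a harsher sentence:

```python
# Illustrative only, not the paper's algorithm: flag pairs in which a
# crime judged less severe received a harsher sentence. The cases and
# severity ranks are invented for the example.

def consistency_violations(cases):
    """cases: list of (label, severity_rank, sentence_years) tuples.
    Returns pairs whose sentence ordering contradicts their severity."""
    violations = []
    for i, (la, sa, ya) in enumerate(cases):
        for lb, sb, yb in cases[i + 1:]:
            if (sa < sb and ya > yb) or (sa > sb and ya < yb):
                violations.append((la, lb))
    return violations

cases = [("shoplifting", 1, 8.0),     # minor crime, heavy sentence
         ("burglary", 3, 4.0),
         ("armed robbery", 5, 12.0)]
bad = consistency_violations(cases)   # flags the shoplifting/burglary pair
```

    A review board could run such a check over a docket and investigate only the flagged pairs.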

    We intend to offer a less mathematical and more legally-oriented version of this article in the near future.

    Read the Paper (PDF)

    Why Voting Twice is a Good Thing

    We should require that every bill be ratified by a second vote, one year after its original passage. It goes into effect as normal, but automatically expires if not ratified at the appropriate time.

    Sometimes foolish legislation is passed in the heat of the moment or due to short term pressures. Perhaps there is an approaching election, or the media has fanned popular hysteria over some issue, or there is a demand for immediate action with no time for proper deliberation, or an important bill is held hostage to factional concerns, or legislators are falling all over one another to respond with a knee-jerk reaction to some event. There are many reasons why thoughtful consideration may succumb to the influences of the moment. The consequences of such legislation can be real and long-lasting. Law enforcement resources may be diverted or rights suppressed or onerous demands made on businesses. It is true that legislation may be repealed, but this requires an active effort. The same forces that induced the original legislation, though weakened by time, may threaten to damage anyone who takes the initiative to rectify it.

    Here is a simple proposal that could address this problem: Every piece of legislation should be voted on a second time, one year after its original passage. This vote would serve to ratify it. By making this mandatory, the burden of attempted repeal is not placed on any individual. Rather, legislators need simply change their vote. This is less likely to create a fresh political tempest, the issue’s emotional fury long spent. When an act is passed, it goes into effect as normal. However one year from that date, it must be ratified or it will expire. Obviously this should only apply to bills for which such ratification is meaningful; there would be no point in revoting on the prior year’s budget after the money has been spent. By requiring a ratification vote, legislators are given time to breathe, sit back, and consider the ramifications of a particular piece of legislation. The intervening year also may provide some flavor of its real effect. A similar approach could be used at all levels of government.

    The Optics of Camera Lens Stacks (Program)

    In another post, I discussed the mathematical calculation of optical parameters for a configuration of stacked lenses and camera components. As is evident from the example worked out there, the procedure is somewhat tedious. Instead, it is better to spend twice the time writing a program to do it. Fortunately I already did this and offer it to you, gentle reader, to use and criticize. I expect no less than one rabid rant about some aspect that doesn’t pedantically conform to the IEEE standard. This is working code (and has been checked over and tested to some extent). I use it. However, it is not commercial grade and was not designed with either efficiency or robustness in mind. It is quick and dirty – but graciously so.

    Think of this as a mine-shaft. You enter at your own risk and by grace of the owner. And if you fall, there won’t be non-stop human interest coverage on 20 TV channels as rescue workers try to extract you. That’s because you’re not a telegenic little kid and this is a metaphor. Rather, you will end up covered in numeric slime of dubious origin. But I still won’t care.

    All this said, I do appreciate constructive criticism and suggestions. Please let me know about any bugs. I don’t plan to extensively maintain this program, but I will issue fixes for significant bugs.

    The program I provide is a command line unix (including MacOS) utility. It should be quite portable, as no funky libraries are involved. The program can analyze a single user-specified configuration or scan over all possible configurations from an inventory file. In the latter case, it may restrict itself to configurations accessible using the included adapters or regardless of adapter. It also may apply a filter to limit the output to “interesting” cases such as very high magnification, very wide angle, or high telephoto.

    The number of configurations can be quite large, particularly when many components are available, there are no constraints, and we account for the large number of focal/zoom choices for each given stack. For this reason, it is best to constrain scans to a few components in an inventory (by commenting out the components you don’t need). For example, if one has both 10 and 25mm extension tubes then try with only one. If this looks promising, restrict yourself to the components involved and uncomment the 25mm as well.

    Either through the summary option or the use of a script to select out desirable configurations, the output may be analyzed and used for practical decisions. For example, if a 10x macro lens is needed and light isn’t an issue then a 1.4X telextender followed by a 200mm zoom followed by a reversed 28mm will do the trick. It will have a high f-stop, but if those components are already owned and we don’t need a low f-stop it may be a far more cost-effective option than a dedicated ultra-macro lens (there aren’t any at 10X, but a 5X one is available).
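    That 10x figure can be sanity-checked with the standard rule of thumb: a reversed lens stacked in front of a longer lens gives magnification roughly equal to the ratio of focal lengths. A sketch (this rule ignores lens separation and pupil effects, which the full program accounts for):

```python
# Rough rule of thumb: magnification of a reversed-lens stack is about
# (main focal length x telextender factor) / (reversed lens focal length).
# Ignores lens separation and pupil effects.

def stack_magnification(main_focal_mm, reversed_focal_mm, telextender=1.0):
    return (main_focal_mm * telextender) / reversed_focal_mm

# The example from the text: 1.4x telextender + 200mm zoom + reversed 28mm.
m = stack_magnification(200, 28, telextender=1.4)  # about 10x
```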

    For simple viewing of the results, I recommend the use of my “tless” utility. This isn’t a shameless plug. I wrote tless for myself, and I use it extensively.

    Go to Google Code Archive for Project

    The Optics of Camera Lens Stacks (Analysis)

    This first appeared on my tech blog. I like to play around with various configurations of camera lenses.  This partly is because I prefer to save money by using existing lenses where possible, and partly because I have a neurological condition (no doubt with some fancy name in the DSM-IV) that compels me to try to figure things out. I spent 5 years at an institute because of this problem and eventually got dumped on the street with nothing but a PhD in my pocket.  So let this be a warning: keep your problem secret and don’t seek help.

    A typical DSLR (or SLR) owner has a variety of lenses.  Stacking these in various ways can achieve interesting effects, simulate expensive lenses (which may internally be similar to such a stack), or obtain very high magnifications.  Using 3 or 4 lenses, a telextender, a closeup lens, and maybe some extension rings (along with whatever inexpensive adapter rings are needed), a wide variety of combinations can be constructed.  In another entry, I’ll offer a companion piece of freeware that enumerates the possible configurations and computes their optical properties.

    In the present piece, I examine the theory behind the determination of those properties for any particular setup.  Given a set of components (possibly reversed) and some readily available information about them and the camera, we deduce appropriate optical matrices, construct an effective matrix for the system, and extract the overall optical properties – such as focal length, nearest object distance, and maximum magnification.  We account for focal play and zoom ranges as needed.
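    As a taste of the matrix method, here is a minimal sketch for the simplest case — two thin lenses in air. The system matrix is the product of the element matrices in the order the ray meets them, and the effective focal length falls out of its lower-left entry. The full treatment in the paper handles thick lenses, reversal, and focal/zoom ranges:

```python
# Minimal ray-transfer-matrix sketch: two thin lenses separated by a gap.
# A ray is (height, angle); each element multiplies on the left.

def matmul(a, b):
    return [[a[0][0]*b[0][0] + a[0][1]*b[1][0], a[0][0]*b[0][1] + a[0][1]*b[1][1]],
            [a[1][0]*b[0][0] + a[1][1]*b[1][0], a[1][0]*b[0][1] + a[1][1]*b[1][1]]]

def thin_lens(f):
    return [[1.0, 0.0], [-1.0 / f, 1.0]]

def gap(d):
    return [[1.0, d], [0.0, 1.0]]

def effective_focal(f1, f2, d):
    """Lens 1, a gap of d, then lens 2; matrices compose right-to-left
    in the order the ray encounters the elements."""
    m = matmul(thin_lens(f2), matmul(gap(d), thin_lens(f1)))
    return -1.0 / m[1][0]   # f_eff = -1/C for the system matrix

# Two 100mm lenses in contact act like a single 50mm lens.
f = effective_focal(100.0, 100.0, 0.0)
```

    This reproduces the familiar formula 1/f = 1/f1 + 1/f2 - d/(f1*f2); the paper's contribution is deducing usable matrices for real, partially specified camera lenses.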

    The exposition is self-contained, although this is not a course on optics and I simply list basic results.  Rather, I focus on the application of matrix optics to real camera lenses.  I also include a detailed example of a calculation.

    As far as I am aware, this is the only treatment of its kind.  Many articles discuss matrix methods or the practical aspects of reversing lenses for macro photography.  However, I have yet to come across a discussion of how to deduce the matrix for a camera lens and vice-versa.

    After reading the piece, you may wonder whether it is worth the effort to perform such a calculation.  Wouldn’t it be easier to simply try the configurations?  To modify the common adage, a month on the computer can often save an hour in the lab.  The short answer is yes and no.  No, I’m not an economist; why do you ask?

    If you have a specific configuration in mind, then trying it is easier.  However, if you have a set of components and want to determine which of the hundreds of possible configurations are candidates for a given use (just because the calculation works, doesn’t mean the optical quality is decent), or which additional components one could buy to make best use of each dollar, or which adapter rings are needed, or what end of the focal ranges to use, then the calculation is helpful.  Do I recommend doing it by hand?  No.  I even used a perl script to generate the results for the example.  As mentioned, a freeware program to accomplish this task in a more robust manner will be forthcoming.  Think of the present piece as the technical manual for it.

    Tless Table Viewer

    Over the years, I’ve found delimited text files to be an easy way to store or output small amounts of data. Unlike SQL databases, XML, or a variety of other formats, they are human readable. Many of my applications and scripts generate these text tables, as do countless other applications. Often there is a header row and a couple of columns that would best be kept fixed while scrolling. One way to view such files is to pull them into a spreadsheet, parse them, and then split the screen. This is slow and clumsy, and updates are inconvenient to process. Instead, I wanted an application like the unix utility ‘less’ but with an awareness of table columns. The main requirements were that it be lightweight (i.e. keep minimal content in memory and start quickly), parse a variety of text file formats, provide easy synchronized scrolling of columns and rows, and allow horizontal motion by columns. Strangely, no such utility existed. Even Emacs and vi don’t provide an easy solution. So I wrote my own unix terminal application. I tried to keep the key mappings as true to “less” (and hence vi) as possible. The code is based on ncurses and fairly portable. The project is hosted on Google Code and is open source.
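    The parsing half of such a tool is the easy part; the curses UI is where the work lives. As a sketch of the sniff-and-split step only (illustrative, not tless's actual code), one can guess the delimiter from the header line and split rows accordingly:

```python
# Illustrative sketch of the parsing step only (not tless's actual code):
# guess the delimiter from the header line, then split rows into columns.

DELIMITERS = ["\t", "|", ",", ";"]

def guess_delimiter(header):
    """Pick the candidate that splits the header into the most fields."""
    return max(DELIMITERS, key=lambda d: len(header.split(d)))

def parse_table(lines):
    delim = guess_delimiter(lines[0])
    return [line.rstrip("\n").split(delim) for line in lines]

rows = parse_table(["name\tqty\tprice\n", "bolts\t40\t0.12\n"])
```

    Keeping minimal content in memory means doing this lazily per screenful rather than slurping the whole file, which is what distinguishes a "less"-like viewer from a spreadsheet import.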

    Go to Google Code Archive for this Project

    Influence in Voting

    Have you ever wondered what really is meant by a “deciding vote” on the Supreme Court or a “swing State” in a presidential election? These terms are bandied about by the media, but their meaning isn’t obvious. After all, every vote is equal, isn’t it? I decided to explore this question back in 2004 during the election year media bombardment. What started as a simple inquiry quickly grew into a substantial project. The result was an article on the subject, which I feel codifies the desired understanding. The paper contains a rigorous mathematical framework for block voting systems (such as the electoral college), a definition of “influence”, and a statistical analysis of the majority of elections through 2004. The work is original, but not necessarily novel. Most if not all has probably been accomplished in the existing literature on voting theory. This said, it may be of interest to a technical individual interested in the subject. It is self-contained, complete, and written from the standpoint of a non-expert in the field. For those who wish to go further, my definition of “influence” is related to the concept of “voting power” in the literature (though I am unaware of any analogue to my statistical definition).

    Ye Olde Physics Papers

    Once upon a time there was a physicist. He was productive and happy and dwelt in a land filled with improbably proportioned and overly cheerful forest creatures. Then a great famine of funding occurred and the dark forces of string theory took power and he was cast forth into the wild as a heretic. There he fought megalomaniacs and bureaucracies and had many grand adventures that appear strangely inconsistent on close inspection. The hero that emerged has the substance of legend.

    But back to me. I experienced a similar situation as a young physicist, but in modern English and without the hero bit.   However, once upon a time I DID write physics papers. This is their story…

    My research was in an area called Renormalization Group theory (for those familiar with the subject, that’s the “momentum-space” RG of Quantum Field Theory, rather than the position-space version commonly employed in Statistical Mechanics – although the two are closely related).

    In simple terms, one could describe the state of modern physics (then and now) as centering around two major theories: the Standard Model of particle physics, which describes the microscopic behavior of the electromagnetic, weak, and strong forces, and General Relativity, which describes the large scale behavior of gravity. These theories explain all applicable evidence to date, and no prediction they make has been excluded by observation (though almost all our effort has focused on a particular class of experiment, so this may not be as impressive as it seems). In this sense, they are complete and correct. However, they are unsatisfactory.  

    Their shortcomings are embodied in two of the major problems of modern physics (then and now): the origin of the Standard Model and a unification of Quantum Field Theory with General Relativity (Quantum Field Theory itself is the unification of Quantum Mechanics with Special Relativity). My focus was on the former problem.  

    The Standard Model is not philosophically satisfying. Besides the Higgs particle, which is a critical component but has yet to be discovered, there is a deeper issue. The Standard Model involves a large number of empirical inputs (about 21, depending on how you count them), such as the masses of leptons and quarks, various coupling constants, and so on. It also involves a specific non-trivial set of gauge groups, and doesn’t really unify the strong force and electro-weak force (which is a proper unification of the electromagnetic and weak forces). Instead, they’re just kind of slapped together. In this sense, it’s too arbitrary. We’d like to derive the entire thing from simple assumptions about the universe and maybe one energy scale.

    There have been various attempts at this. Our approach was to look for a “fixed point”. By studying which theories are consistent as we include higher and higher energies, we hoped to narrow the field from really really big to less really really big – where “less really really big” is 1. My thesis and papers were a first shot at this, using a simple version of Quantum Field Theory called scalar field theory (which coincidentally is useful in its own right, as the Higgs particle is a scalar particle). We came up with some interesting results before the aforementioned cataclysms led to my exile into finance.

    Unfortunately, because of the vagaries of copyright law I’m not allowed to include my actual papers. But I can include links. The papers were published in Physical Review D and Physical Review Letters. When you choose to build upon this Earth Shattering work, be sure to cite those. They also appeared on the LANL preprint server, which provides free access to their contents. Finally, my thesis itself is available. Anyone can view it, but only MIT community members can download or print it. Naturally, signed editions are worth well into 12 figures. So print and sign one right away.

    First Paper on LANL (free content)
    Second Paper on LANL (free content)
    Third Paper on LANL (free content)
    First Paper on Spires
    Second Paper on Spires
    Third Paper on Spires
    Link to my Thesis at MIT