The Truth about Stock Prices: 13 Myths

No-fee trading has invited a huge influx of people new to trading. In this article, I discuss the basics of “price formation,” the mechanism by which stock prices are determined.

Like many of us, for much of my life I assumed that a stock had a well-defined “price” at any given point in time. You could buy or sell at that price, and the price would move based on this activity. If it went up you made money, if it went down you lost money. Trading was easy: you just picked the stocks which would go up.

Unfortunately, a youthful indiscretion landed me doing five years at the Massachusetts Institute of Technology. When the doors finally slammed shut behind me, I emerged with little more than a bus ticket and some physics-department issued clothes. Women crossed themselves, and men looked away. Nobody reputable would hire a man with such a checkered past, and the PhD tats didn’t help. So I ended up with the only sort of people interested in such hard cases: Wall Street.

After a couple of years, I caught the eye of a particularly unsavory boss, and he recruited me into a crew doing stat arb at a place called Morgan Stanley. It took me five years to find a way out, and even then the way was fraught with peril. I tried to get out, but they kept pulling me back in. I was in and out of corporations for the next few years, and even did some contract work for a couple of big hedge funds. Only in the turf wars of 2008 did I manage to cut ties for good. The big boys were so busy bankrupting one another, who’d notice one missing guy? The scars are still there, and I always keep an eye on the street. Who knows when a van full of Harvard MBAs will come for me.

On the plus side, I did learn a bit about market microstructure. As it happens, my erstwhile view of prices was naive in many ways.

Rather than a detailed exposition on market microstructure (which varies from exchange to exchange, but has certain basic principles), I will go through a number of possible misconceptions. Hopefully, this will be of some small help to new traders who wish to better understand the dynamics of the stock market. At the very least, it will make you sound smart at cocktail parties.

Because we live in America, and everybody sues everyone about everything, I’ll state the obvious. Before you do anything, make sure you know what you are doing. If you read it here, that doesn’t mean it’s right or current. Yes, I worked in high frequency statistical arbitrage for many years. However, my specific knowledge may be dated. You should confirm anything before relying heavily on it. In particular, I am no tax expert. Be sure to consult an accountant, a lawyer, a doctor, a rabbi, and a plumber before attempting anything significant. And if you do, please send me their info. It’s really hard to find a good accountant, lawyer, doctor, rabbi, or plumber.

Seriously, don’t take anything I say (or anyone else says) as gospel. I’ve tried to be as accurate as possible, but that doesn’t mean there aren’t technical errors. As always, the onus is on you to take care of your own money. As someone pointed out when I started, back when the traders still seemed like superstars: they weren’t paid the big bucks not to make mistakes, but to catch those mistakes before they became problems. My advice, and the best I can give, is that you inform yourself, do research, check, recheck, and recheck again before committing to a trade. In my personal trading I’ve never missed out by being slow and cautious, but I have gotten hammered by being hasty.

Now to the possible misconceptions. I’ll call them “myths” because that’s what popular websites do, so obviously it’s the right thing to do, and I prefer to do the right thing because the wrong thing is wrong.

In what follows “STCG” refers to “Short Term Capital Gain” and “LTCG” refers to “Long Term Capital Gain”. “STCL” and “LTCL” refer to the corresponding losses (i.e. negative gains).

Myth 1: There is a “price” for a stock at any given point in time. When a stock is traded during market hours, there is no single price. There is a bid (the highest priced buy-order) and an ask (the lowest priced sell-order). Often, what people call “the price” is the last trade price. However, sometimes it is the midpoint (bid+ask)/2 or (bid*bidsize+ask*asksize)/(bidsize+asksize), and sometimes more complicated limit-book centroids are used.
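Here is a minimal sketch (in Python, with an invented ZZZ quote) of the three common notions of "the price" mentioned above:

```python
# Sketch: three common notions of "the price" from a single quote.
# The numbers below are hypothetical; real quotes change constantly.
bid, ask = 100.00, 101.00          # best bid / best ask
bid_size, ask_size = 200, 50       # shares available at each
last_trade = 100.00                # price of the most recent fill

mid = (bid + ask) / 2                                                     # simple midpoint
weighted_mid = (bid * bid_size + ask * ask_size) / (bid_size + ask_size)  # size-weighted midpoint

print(last_trade, mid, weighted_mid)   # 100.0 100.5 100.2
```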

Myth 2: I can put a limit order for any price I want. Stocks (and options) trade at defined ticks. A tick is the spacing between allowed prices, and may itself vary with price. For example, the tick-size in stock ZZZ could be 0.01 for prices below 1.00 and 0.05 above that. Historically, ticks were things like 1/8 or 1/16 rather than multiples of 0.01. The tick-size rules are per-exchange (or per-security-type on a given exchange) rather than per-stock. In our example, any stock’s price would have allowable values of …, 0.98, 0.99, 1.00, 1.05, 1.10, …
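For illustration, here is a small sketch of snapping a desired price to the allowed grid, assuming the hypothetical two-tier ZZZ schedule above (real tick rules are set by the exchange and may differ):

```python
# Sketch: snapping a desired limit price to the nearest allowed tick.
# The two-tier schedule (0.01 below 1.00, 0.05 at or above) mirrors the
# hypothetical ZZZ example; real tick rules are set per exchange.
def tick_size(price):
    return 0.01 if price < 1.00 else 0.05

def snap_to_tick(price):
    tick = tick_size(price)
    return round(round(price / tick) * tick, 2)

print(snap_to_tick(0.987))   # 0.99
print(snap_to_tick(1.08))    # 1.1
print(snap_to_tick(1.02))    # 1.0
```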

Myth 3: Limit orders are better than market orders. Limit orders offer greater control over execution price, but they may not be filled or may result in adverse selection. Suppose ZZZ is trading with a bid of 100 and an ask of 101, with a tick size of 0.50. Alice places a limit order at 100.5 for 100 shares. It is quite possible that it will be filled right away, giving her at least 0.50 of execution improvement (per share) over a market order. But what if it is not filled? If the stock goes up, Alice has incurred what is called “opportunity cost.” Rather than 0.50 in savings she now must pay a higher price or forego the purchase. Why not just leave the limit order out there? Surely it will get filled as the stock bounces around. In fact, why not put a limit order at 98? If it gets executed, it’s free money. The problem is adverse selection. The limit order most likely would get filled when the market was dropping. Sure it could catch a temporary dip, but it also could be caught during a major decline. Statistically, the order will be filled at 98 precisely when Alice does not want it to be. Either she could have bought at 97 or 96 instead, or she is now stuck with a falling stock. In the presence of an alpha (a statistical signal informing a predicted return) a noncompetitive bid may at times be appropriate, but in general there is no “free money.”

Myth 4: I can buy or sell any quantity at the relevant price. The bid and ask have sizes associated with them. In fact, the dynamics are more complicated. Each stock has a limit book (or order book), which consists of sets of buy and sell orders at different prices. Suppose ZZZ has a bid of 100 for 200 shares and an ask of 101 for 50 shares, and a tick size of 0.50. The spread is two ticks: (101−100)/0.50. The quote (bid, ask, bidsize, and asksize) actually is a summary of the inner level of the limit book. The latter consists of a set of levels (maybe 101, 102, and 104 on the ask side), each with a queue of orders. The “quote” simply consists of the innermost levels (the highest bid and lowest ask, along with their total sizes). Suppose Bob puts in a market order for 100 shares of the stock. This is matched against the orders at the lowest ask level (101 in this case) in their order of priority (usually the time-order in which they were received). Suppose there only are 50 shares at 101. After fulfilling those orders, we now go to the second level and match the remaining shares at 102, and so on. Each fill is a match against a specific sell-order, and a given trade can result in many fills. For highly liquid stocks, no order you or I are likely to place will match past the inner quote. However, that quote can move quickly and the price at which a market order is executed may not be what you see on the screen. Next, suppose that Bob places a limit order to buy 50 shares at 100.5, right in the middle of the current spread. There now is a new highest bid level: 100.5, and Bob is the sole order in it. Any subsequent market sell order will match against him first. This may happen so fast that the quote never noticeably changes, but if not the new quote bid and bidsize will be 100.5 and 50 shares. If instead he placed his buy order at 100, he would join the other bids at 100 as the last in the queue at that level. What if he places it at 101? Suppose that by then only 25 shares were available at that ask level; he would match those 25 shares and would then have a resting bid for the remaining 25 shares at 101. This would be the new best bid. If he placed the limit order at 110 instead, it effectively would be a market order and would match against the 101 and 102 levels as before. Note that he would not pay 110. The limit book constantly is changing, and to make things worse there often is hidden size. On many exchanges, it’s quite possible for the limit book to show 25 shares available at 101 and yet have Bob filled for 50 at that level. There could be hidden shares which automatically replenish the sell-order but are not visible in the feed. This is intentional, and not a matter of update speed. While it often is possible to subscribe to limit book feeds, most of us only have access to simple data: the current quote (innermost limit-book levels) and the last trade price.
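Here is a toy sketch of the matching just described, walking a market buy order down the ask side of a made-up book. It is a simplification (no hidden size, no cancels), not any exchange's actual matching engine:

```python
# Sketch: matching a market buy order against ask levels in price order.
# Book levels and sizes are invented to mirror the ZZZ example above.
ask_book = [(101.0, 50), (102.0, 100), (104.0, 300)]   # (price, shares) per level

def fill_market_buy(qty, book):
    fills = []
    for price, avail in book:
        if qty <= 0:
            break
        take = min(qty, avail)     # take what this level offers
        fills.append((price, take))
        qty -= take
    return fills

print(fill_market_buy(100, ask_book))   # [(101.0, 50), (102.0, 50)]
```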

Myth 5: The price at the close of Day 1 is the price at the open of Day 2. This clearly is not true, and often the overnight move is huge and generated by different dynamics than intra-day moves. There are two effects involved. Some exchanges have provision for after-market and pre-open trading (not just order placement), but the main effect is the opening auction. Whenever there is a gap in trading, the new trading session begins with an opening auction. Orders accumulate prior to this, populating the limit book. For example, orders still can be placed outside trading hours. However, no crossing (i.e. fills or trades) can occur. This means that the limit book can cross itself, and some bids can be higher than some asks. This never happens during regular trading because of the crossing procedure described earlier. The opening auction is an unambiguous algorithm for matching orders until the book is uncrossed. The closing price on a given day is the last trade price of that day. It often takes a while for data to trickle in, so this gets adjusted a little after the actual close but is fairly stable. The prices one sees at the start of the day involve a flurry of fills from the uncrossing. This creates its own minor chaos, but the majority of the overnight price move is reflected in the orders themselves. If sentiment is negative, there will be significant sell pressure (lots of sell orders and few buy orders), and vice versa if it is positive. There also are certain institutional effects near the open and close because large funds must meet certain portfolio constraints. Note that the opening auction is not restricted to the actual open. Some exchanges (notably the Tokyo Stock Exchange) have a lunch break, and extreme price moves can trigger temporary trading halts. In each case, trading begins with an opening auction.
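For flavor, here is a toy uncrossing sketch on an invented crossed book. Real opening auctions compute a single clearing price (typically the one maximizing matched volume) under exchange-specific rules; this just pairs off crossing orders to show why the book cannot stay crossed once trading resumes:

```python
# Sketch: one toy way to uncross a book that accumulated orders overnight.
# Not any exchange's actual auction; it simply matches while the best bid
# is at or above the best ask.
bids = [(101.0, 30), (100.5, 20), (99.0, 50)]   # (price, shares), sorted high to low
asks = [(100.0, 40), (101.5, 60)]               # (price, shares), sorted low to high

trades = []
while bids and asks and bids[0][0] >= asks[0][0]:
    (bp, bq), (ap, aq) = bids[0], asks[0]
    qty = min(bq, aq)
    trades.append((ap, qty))                    # toy choice: trade at the ask price
    bids[0], asks[0] = (bp, bq - qty), (ap, aq - qty)
    if bids[0][1] == 0: bids.pop(0)
    if asks[0][1] == 0: asks.pop(0)

print(trades)   # [(100.0, 30), (100.0, 10)] -- the book is now uncrossed
```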

Myth 6: The price moves because when someone buys people get optimistic and when someone sells people get pessimistic. That certainly can happen. However, there is a more basic reason the price moves. When you buy at the ask, some or all of the sell-orders at that ask-level are filled. There may be hidden size which immediately appears or someone may jump in (or adjust a higher sell-order down), but generally the quote changes. Your trade also is registered as the last trade.

Myth 7: The price behavior of a stock reflects general market sentiment. Though often the case, it need not be. The price we see in most charts and feeds is the last trade price, so we’ll go with that. Consider an unrealistic but illustrative example: ZZZ has a market cap of a billion dollars. Bob and Alice are sitting in their respective homes, trading. [Spoiler: No, they don’t end up together after a series of outlandish rom-com mishaps.] The rest of the market, including most of the major institutions which own stock in ZZZ, are sitting back waiting for some news or simply have no desire to trade ZZZ. They don’t participate in trading and have no orders out. So it’s just Alice and Bob. ZZZ has a last trade price of 100. Bob has a limit order to buy 1 share at 100 and Alice has a limit order to sell 1 share at 101. This is the quote, and the entirety of the limit book. Bob gets enthusiastic, and crosses the spread. The price now is 101. Since he sees the price going up, Bob decides to buy more. Alice still has shares she wants to unload, and puts in a sell limit order for 1 share at 102. Bob bites. The price is now 102. The pattern repeats with Alice always offering 1 share at p+1, with p the last price, and Bob always buying after a minute. They do this 50 times, and the closing price is 150. Two people traded a total of 50 shares, so has the price of a billion dollar company really risen 50%? Admittedly, this is a ridiculous example. In reality, the quote would be heavily populated even if there was little active trading, and everybody else wouldn’t sit idly by while these two knuckleheads (well, one knucklehead, since Alice does pretty well) go at it. However, similar phenomena do arise. Lots of small traders can push the price of a stock way up, while larger traders don’t participate. In penny stocks, this sort of thing actually can happen (though not in such an extreme way). When a stock is run up, it is important to look at the trading volume and (if possible) who is trading. Institutional traders aren’t necessarily skilled or wise, and can get caught up in a frenzy or react to it — so these sorts of effects can have real market impact if they persist. However, they often are temporary and do not reflect true market sentiment.

Myth 8: Shorting is just like buying negative shares, and the only difference is the sign. In many cases, it effectively behaves like this for the trader. However, the actual process is more complicated. “Naked shorts” generally are not allowed, though they can arise in anomalous circumstances. When you sell short, you are not simply assigned a negative number of shares and your PnL computed accordingly. You borrow specific shares of stock from a specific person. The matching process is called a “locate”, and is conducted at the broker-level if possible or at the exchange-level if the broker has no available candidates. There is an exception for market-makers and for brokers when a stock is deemed “easy to borrow”, meaning it is highly liquid and there will be no problem covering the short if necessary. Brokers maintain dynamic “easy to borrow” and “hard to borrow” stock lists for this purpose. There are situations in which a short may not behave as expected. Suppose Bob sells short 100 shares of ZZZ stock, and the broker locates it with Alice. Alice owns 100 shares, and the locate assigns these to Bob. If Alice decides to sell them at some point, Bob needs to be assigned new shares [note that this has no effect on the person Bob sold short to, just Bob]. If these cannot be located, he must exit his position. The short sale is contingent on the continuing existence of located shares. Moreover, if the market goes up a lot Bob may have to put up additional capital for the cost of covering at the higher price. In principle, a short can result in an unlimited loss. In practice, Bob would be closed out by margin call before then.

Myth 9: Shares are fungible. When you sell them, it doesn’t matter which ones you sell. They are fungible from the standpoint of stock trading (aside from the short-selling locates just discussed), but not from a tax standpoint. Most brokers allow you to choose which specific shares you are selling. Suppose Bob bought 100 shares of ZZZ at 50 three years ago and bought another 100 shares of ZZZ at 75 six months ago. ZZZ now is 100 and he decides to sell 100 shares. Selling the first 100 shares generates a LTCG of 5,000, whereas selling the second 100 shares generates a STCG of 2,500. The tax implications can be significant, and are discussed further below. The specifics of Bob’s situation will determine which sale is more advantageous (or less disadvantageous). Brokers generally default to FIFO accounting, meaning that the first shares bought are the first shares sold. Most brokers allow alternatives to be specified at the time of the trade, however. These may include LIFO (last shares bought are first shares sold) or direct specification of the shares themselves. Note that such accounting only applies within a given brokerage account, whereas the tax consequences are determined across all brokerage accounts.
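Here is a small sketch of the lot-level arithmetic for Bob's example, with made-up purchase dates and the simplifying assumption that "long term" just means held more than a year:

```python
# Sketch: per-lot gains for Bob's two ZZZ purchases (prices from the text,
# dates invented for illustration).
from datetime import date

lots = [
    {"shares": 100, "cost": 50.0, "bought": date(2017, 6, 1)},   # ~3 years ago
    {"shares": 100, "cost": 75.0, "bought": date(2020, 1, 2)},   # ~6 months ago
]
sell_price, sell_date = 100.0, date(2020, 7, 1)

for lot in lots:
    gain = (sell_price - lot["cost"]) * lot["shares"]
    long_term = (sell_date - lot["bought"]).days > 365
    print(("LTCG" if long_term else "STCG"), gain)
# LTCG 5000.0
# STCG 2500.0
```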

Myth 10: A “no-fee” trading account is better than one with fees. The transaction cost of a trade involves several components. The main three are broker fees, exchange fees, and execution. “No-fee” means they dispense with the broker fee. Unless many small trades are being executed with high frequency, the broker fee tends to be small. The exchange fees are passed along to the trader, even for “no-fee” accounts, but are smaller than typical broker fees. Often, the quality of execution comprises the bulk of the transaction cost. Serious trading shops use transaction cost models and order working strategies to optimize execution. As small traders relying on retail brokers, we don’t have the speed or positioning to do this. No or low-fee brokers often cross flow internally or sell flow to high-frequency firms which effectively front-run the trader. Market orders see slightly worse execution than they could, and limit orders get filled with slightly lower frequency than they could (or are deferred to face an indirect cost via adverse selection). These effects are not huge, but something to be aware of. Suppose Alice buys 100 shares of ZZZ at 100. Broker X is no-fee, and Broker Y charges a fee of 7.95 per trade but has 10 bp (0.1%) better execution than Broker X on average. That 10 bp is a price improvement of 0.10 per share, and amounts to 10 for the trade. Alice would do better with Broker Y than Broker X. This benefit may seem to apply only to large trades, but it also applies to stocks with large spreads. For illiquid stocks (including penny stocks) the price improvement can be much more significant. There are trading styles (e.g. lots of small trades in highly liquid stocks) where no-fee trumps better execution, but often it does not.
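The arithmetic as a tiny sketch, using the hypothetical brokers above and ignoring exchange fees:

```python
# Sketch: all-in cost of Alice's 100-share purchase at the two hypothetical
# brokers from the text (exchange fees ignored for simplicity).
shares, price = 100, 100.0

cost_no_fee   = shares * price                       # Broker X: no fee, baseline execution
improvement   = 0.001 * price                        # Broker Y: 10 bp better execution per share
cost_with_fee = shares * (price - improvement) + 7.95

print(cost_no_fee, cost_with_fee)   # 10000.0 9997.95 -- Broker Y wins despite the fee
```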

Myth 11: Taxes are just nuisances, and the price is what really matters. Taxes can eat a lot of your profit, and should be a primary consideration. Tax planning involves choosing accounts to trade in (401K or other tax-deferred vs regular), realizing losses to offset gains, and choosing assets with low turnover. Note that some mutual funds can generate weird capital gains through their internal trading. In extreme cases, someone could pay significant tax on a losing position in one. Why are taxes so important to trading? The main reason is that there can be a 25% (or more) difference in tax rate between a LTCG and a STCG. STCGs often are taxed punitively, and at best treated like ordinary income. Here in MA, the state tax alone is 12% for STCGs vs 5% for LTCGs. Federally, STCGs are treated as ordinary income while long term gains have their own rate. STCGs currently are defined as gains on positions held for one year or less, while LTCGs come from positions held for more than one year. Note that it is the individual positions that matter. If Bob owns 200 shares of ZZZ, bought in two batches, then each batch has its own cost basis (price he paid for it) and purchase date. Also note that most options positions expire in less than a year and would result in a STCG or STCL. A STCG can only be offset by a STCL, but a LTCG can be offset by a LTCL or STCL. Needless to say, STCLs are valuable (unpleasant since they’re losses, but valuable from a tax standpoint). They can be rolled to subsequent years under some circumstances, but may be automatically wasted against LTCGs if you are not careful. A good understanding of these details can save a lot of money. To illustrate the impact, suppose Alice has a (state+federal) 20% LTCG rate (marginal) and a 45% STCG rate (marginal). She makes 10,000 on a trade, not offset by any loss. If it is a LTCG, she pays 2,000 in taxes and keeps 8,000. If it is a STCG, she pays 4,500 and keeps 5,500. That’s an additional 2,500 out of her pocket. Since the markets pay us to take risk, she must take more risk or tie up more capital to make the same 8,000 of after-tax profit. How much more risk or capital? Not just the missing 25%, because the extra profit will be taxed at the 45% rate as well. We solve 0.55*x = 0.8*10,000 to get x ≈ 14,545. Alice must take (loosely speaking) 45% more risk or tie up 45% more capital to take home the same amount. Note that the appearance of 45% both here and as the tax rate is coincidental.
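Alice's arithmetic, as a quick sketch with the hypothetical marginal rates above:

```python
# Sketch: Alice's after-tax outcomes and the extra pre-tax gain needed to
# keep the same take-home amount (rates are the hypothetical ones above).
gain = 10_000
ltcg_rate, stcg_rate = 0.20, 0.45

after_tax_lt = gain * (1 - ltcg_rate)            # take-home if long term
after_tax_st = gain * (1 - stcg_rate)            # take-home if short term
needed_st    = after_tax_lt / (1 - stcg_rate)    # pre-tax STCG matching the LTCG take-home

print(round(after_tax_lt), round(after_tax_st), round(needed_st))   # 8000 5500 14545
```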

Myth 12: Options act like leveraged stock. This is untrue for many reasons, but I’ll point out one specific issue. Options can be thought of as volatility bets. Yes, the Black-Scholes formula depends on the stock price in a nonlinear manner, and yes the Black-Scholes formula significantly underestimates tail risk. But for many purposes, it pays to think of options as predominantly volatility-based. Let’s return to our absurd but illustrative scenario involving Alice and Bob and their ridiculous behavior. As before, they trade ZZZ stock and are the only market participants. Alice sells to Bob until the price reaches 110, then decides she misses her 10 shares of stock. Bob too has an epiphany. He decides he hates ZZZ stock. They now switch roles, but Bob gets her strategy backward, so the price goes down with each share rather than up. He sells her a share at 110, then 109, then 108, down to 101. Now he’s out of shares and they both have another revelation. They return to their original roles, and up the price goes. The day’s trading involves the ZZZ stock price see-sawing between 101 and 110 in this fashion. Neither makes a net profit, and the price ends where it started (well, 101 vs 100 but that’s not important here). Consider somebody trading the options market (we said Alice and Bob were the only stock traders, but there could be a thriving options market). At the start of the day and the end of the day, the price is pretty much at the same level. However, the price of both call and put options has risen dramatically. Options prices are driven by several things: the stock price, the strike price, the time to expiry, and the volatility. If the stock price rises dramatically, put options will go down but not as much as the price change would seem to warrant. This is because volatility has increased. In our see-saw case, everything was constant (approximately) except the volatility. The stock price is unchanged, but the option prices have changed dramatically.
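To see the volatility effect numerically, here is a quick Black-Scholes sketch with invented inputs: the stock and strike stay put, but doubling the volatility roughly doubles both the call and the put. This is the textbook formula used purely for illustration, not a claim about how any particular option is priced in practice:

```python
# Sketch: Black-Scholes prices with the stock unchanged but volatility doubled.
# All inputs are illustrative.
from math import log, sqrt, exp, erf

def norm_cdf(x):
    return 0.5 * (1 + erf(x / sqrt(2)))

def bs_price(S, K, T, r, sigma, call=True):
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    if call:
        return S * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)
    return K * exp(-r * T) * norm_cdf(-d2) - S * norm_cdf(-d1)

S, K, T, r = 100.0, 100.0, 0.25, 0.01
for sigma in (0.20, 0.40):
    print(sigma, round(bs_price(S, K, T, r, sigma, True), 2),
          round(bs_price(S, K, T, r, sigma, False), 2))
# Both the call and the put roughly double when volatility doubles,
# even though the stock price never moved.
```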

Myth 13: There are 13 myths. If you spend your time puzzling over this rather than trading, you will end up with the same amount of money (on average and minus transaction costs). Which leaves the market to me, so I can run the stock up to infinity, then sell to the one unwary buyer who gets greedy and dips his toe in at the wrong time. All your base are belong to me.

My new monograph is out!

My new math monograph now is available on Amazon.com and Amazon.co.uk, and soon will be available on other venues via Ingram as well.

Amazon US Paperback
Amazon UK/Europe Paperback

The monograph is an attempt to mathematically codify a notion of “moral systems,” and define a sensible measure of distances between them. It delves into a number of related topics, and proposes mathematical proxies for otherwise vague concepts such as hypocrisy, judgment, world-view, and moral trajectory. In addition to detailed derivation of a number of candidate metrics, it offers several examples, including a concrete distance calculation for a simple system. The framework developed is not confined to the analysis of moral systems, and may find use in a wide variety of applications involving decision systems, black box computation, or conditional probability distributions.

Why Your Book Won’t Be an Amazon Success Story

I’m going to be that guy. The one nobody likes at parties. The one who speaks unpleasant truths. If you don’t want to hear unpleasant truths, stop reading.

If you want to be told which self-help books to buy and which things to do and which gurus will illuminate the shining path to fame and fortune, stop reading.

If you want somebody to hold your hand, and nod at all the right moments and ooh and aah about how your writing has come a long way and you’re “almost there,” stop reading.

It doesn’t matter whether you’ve come a long way. It doesn’t matter whether your writing is almost there, is there, or is beyond there. It doesn’t matter what you’re saying or how you’re saying it. You may have written the most poignant 80,000 words in the English language, or you may have another book of cat photos. None of that matters.

Unless you’re a certain type of person saying a certain type of thing in a certain way, none of it matters. And that certain type of person, that certain type of thing, and that certain way changes all the time. Today it’s one thing, tomorrow it will be another.

Statistically speaking, you’re not it.

“But what about all those success stories,” you argue. “I’m always hearing about Amazon success stories. Success, success, success! This book mentioned them and that blog mentioned them and the 12th cousin of my aunt’s best friend’s roommate had one.”

There are two reasons this doesn’t matter.

Most of those stories are part of a very large industry of selling hope to suckers. Any endeavor which appeals to the masses and appears to be accessible to them spawns such an industry. Business, stock picking, sex, dating, how to get a job, how to get into college, and on and on. Thanks to today’s low barrier to entry, self-publishing is the newest kid on that block.

This isn’t a conspiracy, or some evil corporation with a beak-nosed pin-striped CEO, cackling ominously while rubbing his hands. Self-publishing just attracts a lot of people who see an easy way to make money. When there’s a naive, eager audience, a host of opportunists and charlatans purvey snake oil to any sucker willing to pay. They’re predators, plain and simple. Hopefully, I can dissuade you from being prey. Leave that to others. Others unenlightened by my blog. Cynicism may not always be right, but it’s rarely wrong.

Even seemingly reputable characters have become untrustworthy. The traditional publishing industry has grown very narrow and institutional, and life is hard for everyone associated with it. The temptation to go for the easy money, and cast scruples to the winds, is quite strong. Not that denizens of the publishing industry ever were big on scruples. Many individuals from traditionally respectable roles as agents, editors, and publishers find it increasingly difficult to eke out a living or are growing disillusioned with a rapidly deteriorating industry. It is unsurprising that they are bedazzled by the allure of easy money. Unsurprising, and disappointing. This is especially insidious when agents offer paid services which purport to help improve your chances with other agents. The argument is that they know what their kind wants. Anybody see the problem with this? Anybody, anybody, Bueller? It would be like H.R. employees taking money to teach you how to get a job with them. Oh wait, they do. How could THAT possibly go wrong…

I’m not going to delve into the “selling hope to suckers” angle here. That is fodder for a separate post, in which I analyze a number of things which did or did not work for me. For now, I’ll focus on the second reason your book won’t be an Amazon Success Story. Incidentally, I will resist the temptation to assign an acronym to Amazon Success Story. There! I successfully resisted it.

In this post, I’ll assume that ALL those stories you hear are right. Not that they’re 99% bunk or that most actual successes had some outside catalyst you’re unaware of or were the result of survivorship bias (the old coin-flipping problem to those familiar with Malkiel’s book). To paraphrase the timeless wisdom of Goodfellas, if you have to wait in line like everyone else you’re a schnook. If you’re trying what everyone else tries, making the rounds of getting suckered for a little bit here a little bit there, with nothing to show for it — you’re the schnook.

Don’t feel bad, though. No matter how savvy we are in our own neighborhoods, we’re all schnooks outside it. Hopefully, I can help you avoid paying too much to learn how not to be a schnook.

I can’t show you how to be successful, but I can show you to avoid paying to be unsuccessful. But that’s for another post. We’re not going to deal with the outright lies and deception and rubbish here. Those are obvious pitfalls, if enticing. Like pizza.

In this post, we’re going to assume the success stories are real — as some of them surely are. We’re going to deal with something more subtle than false hope. We’re going to discuss the OTHER reason you won’t be successful on Amazon. It’s not obvious, and it can’t be avoided.

But first, I’m going to make a plea: if you’re the author of one of those breathless, caffeinated “how to be a bzillionaire author like me” books or blogs or podcasts … stop it. Please. Just stop it. Unless you’re cynically selling hope to suckers or mass-producing content-free posts as click-bait. In that case, carry on. I don’t approve of what you do, but I’m not going to waste breath convincing dirtbags not to be dirtbags. However, if you’re even the least bit well-meaning, stop. Maybe you have some highly popular old posts along these lines. Update them. Maybe you’re writing a new series of posts based on what your friend named John Grisham has to say to self-publishing authors. Don’t.

You’re doing everyone a disservice. People will waste money and time and hope. Best to tell them the truth. You may not be that guy. You may be too nice, tactful, maybe even (dare I say) an optimist. I’m not an optimist. I AM that guy. No false hope sold here.

Maybe you’re still reading this and haven’t sky-dived into a volcano or fatally overdosed on Ben & Jerry’s, or turned to one of those cheerful, caffeinated blogs. Shame on you. There are special internet groups for people like you. But you’re still here, and I haven’t driven you away. I must be doing something wrong.

If you’re a true dyed in the wool masochist, I’ll now explain why you won’t be successful. It has to do with a tectonic shift in Amazon’s policies.

Over a year ago, I wrote a post titled “Why NOT to use Amazon Ads for your book,” which many people have written me about. Most found it a useful take on Amazon ads, and one of the few articles which doesn’t regurgitate lobotomized praise for the practice.

I stand by that. Subsequent experiments (to be reported in a future post) have shown that Amazon ads perform even worse now. This led me to wonder why. Why did all the long-tailed keywords and the reviews and the ads make no difference? None of us know the precise inner workings of Amazon ads, but there are strong indications of their behavior.

I now will offer my theory for why there are success stories, why it’s tempting to believe they can be emulated, and why they cannot. To do so, let’s review some basic aspects of Amazon’s algorithms.

There are two algorithms we care about:

(1) The promotion algorithm, which ranks your book. It is responsible for placing it in any top 100 lists, determining its visibility in “customers also bought” entries, when and how it appears in searches, and pretty much any other place where organic (i.e. non-paid) placement is involved.

(2) The ad auction algorithm, which determines whether you win a bid for a given ad placement.

The promotion algorithm determines how much free promotion your book gets, and is critical to success. It has only a couple of basic pieces of information to work with: sales and ratings. The algorithm clearly reflects the timing of sales, and is heavily weighted toward the most recent week. It may reflect the source of those sales — to the extent Amazon can track it — but I have seen no evidence of this. As for ratings, all indications are that the number of ratings or reviews weighs far more heavily than the ratings themselves. This is true for consumers too, as long as the average rating is 3+. Below that, bad ratings can hurt. Buyers don’t care what your exact rating is, as long as there isn’t a big red flag. The number of ratings is seen as a sign of legitimacy, that your book isn’t some piece of schlock that only your grandmother and dad would review — but your mom was too ashamed to attach her name to. Anything from a traditional publisher has 100’s to 1000’s of ratings. A self-published work generally benefits from 15+. More is better.

It makes sense that the promotion algorithm can play a role, but why mention an “ad auction algorithm”? Ad placement should depend on your bid, right? Maybe you can tweak the multipliers and bids for different placements or keywords, but the knobs are yours and yours alone. You might very well think that, but I couldn’t possibly comment. Unlike the ever-diplomatic Mr. Urquhart, I’m too guileless to take this tack. I also don’t use Grey Poupon. I can and will comment. You’re wrong. Amazon’s ad algorithm does a lot more behind the scenes. You may be the highest bidder and still lose, and you may be the lowest bidder and win.

As usual, we must look at incentives to understand why things don’t behave as expected. Amazon does not run ads as a non-profit, nor does it get paid a subscription fee to do so. It only makes money from an ad when that ad is clicked, and it only makes money from a sale when the ad results in a conversion. For sellers, the latter is a commission and for authors it’s the 65% or 30% (depending on whether you chose the 35% or 70% royalty rate) adjusted for costs, etc. In either case, they make money from each sale and they make money from each click.

Amazon loses money if your ad wins lots of impressions, but nobody clicks on it. They would have been happier with a lower bid that actually resulted in clicks. If lots of people click on your ad, but few people buy your book, Amazon would have been happier with a lower bid which resulted in more sales. It’s a trade-off, but there are simple ways of computing these things. When you start fresh, Amazon has no history (though perhaps if you have other books, it uses their performance). It assigns you a set of default parameters representing the average performance of books in that genre. As impressions, clicks, and sales accrue, Amazon adjusts your parameters. This could be done through a simple Bayesian update or periodic regressions or some other method.
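Purely as an illustration of what a “simple Bayesian update” could look like (I have no idea what Amazon actually does, and the genre-average prior below is invented), here is a Beta-Binomial sketch for a click-through rate:

```python
# Sketch: a Beta-Binomial update of a click-through-rate estimate.
# Illustrative only; the prior and the observed counts are made up.
prior_clicks, prior_impressions = 2, 1000    # hypothetical genre-average prior (0.2% CTR)
my_clicks, my_impressions = 1, 3000          # your ad's observed history

alpha = prior_clicks + my_clicks                                  # "successes"
beta = (prior_impressions - prior_clicks) + (my_impressions - my_clicks)  # "failures"
estimated_ctr = alpha / (alpha + beta)

print(round(estimated_ctr, 5))   # ~0.00075: the estimate drifts toward your (worse) data
```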

When a set of authors bids on an ad, Amazon can compute the expected value of each bid. This looks something like P(click|impression)*ebid + P(sale|click)P(click|impression)*pnl, where P(click|impression) is your predicted click-through-rate for that placement, P(sale|click) is your predicted conversion rate for that placement, ebid is the effective bid (I’ll discuss this momentarily), and pnl is the net income Amazon would make from a sale of your book. This is an oversimplification, but gets the basic idea across.
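That oversimplified scoring, as a sketch with made-up probabilities, bids, and margins; note how the highest bidder need not win:

```python
# Sketch: scoring each bid by expected value, per the (oversimplified)
# formula above. All numbers are invented for illustration.
bidders = [
    # (name, P(click|impression), P(sale|click), effective_bid, amazon_pnl_per_sale)
    ("A", 0.004, 0.05, 0.60, 2.10),
    ("B", 0.010, 0.02, 0.40, 1.50),
    ("C", 0.002, 0.10, 0.90, 3.00),
]

def expected_value(p_click, p_sale, ebid, pnl):
    return p_click * ebid + p_sale * p_click * pnl

scores = {name: expected_value(pc, ps, eb, pnl) for name, pc, ps, eb, pnl in bidders}
winner = max(scores, key=scores.get)
print(scores, winner)   # "B" wins despite having the lowest effective bid
```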

The ebid quantity is your effective bid, what you actually pay if you win the auction. There actually are two effective bids involved. Amazon’s ad auctions are “second-price,” meaning the winning bidder pays only the 2nd highest bid. Suppose there are 5 bids: 1,2,3,4,5. The bidder who bid 5 wins, but only pays 4. There are game theoretic reasons for preferring this type of auction, as it encourages certain desirable behaviors in bidders. In this case, the effective bid (and what Amazon gets paid) is 4. That is no mystery, and is clearly advertised in their auction rules. What isn’t advertised is the other, hidden effective bid. These effective bids may be 3,2,4,2,3, in which case the third bidder wins. What do they actually pay? I’m not sure, but something less than their actual bid of 3.

Apparently, whatever algorithm Amazon uses guarantees that a bidder never will pay more than their actual bid. It somehow combines the two types of effective bids to ensure this. I am not privy to the precise algorithm (and it constantly changes), so I cannot confirm this. However, I have been informed by an individual with intimate knowledge of the subject that Amazon’s approach provably guarantees no bidder will pay more than their actual bid.

Why would Amazon prefer a lower bid, when they could get 4? As mentioned, they only get paid 4 if the ad of the winning bidder (the 5) gets a click. If the ad makes every reader barf or have a seizure or become a politician, there won’t be a lot of clicks. If it’s the most beautiful ad in human history, but the book’s landing page makes potential buyers weep and tear their hair and gnash their teeth, it probably won’t make many sales. In either case, Amazon would do better with another bidder.

Even without knowing the precise formula, one thing is clear. These algorithms are a big problem for anyone who isn’t already a star.

The problem is that those two algorithms play into one another, generating a feedback loop. If you’re already successful, everything works in your favor. But if you start out unattractive to them, you remain that way. You have few quality ad placements, and get few sales, and this suppresses your organic rank. The organic rank factors into many things which affect P(click|impression) and P(sale|click) — such as the number of reviews, etc. Put simply, once they decide you’re a failure, you become a failure, and remain one. You won’t win quality bids, even if you bid high. If you bid high enough to override the suppression, then you’ll pay an exorbitant fee per click, and it will cost a huge amount to reach the point where success compounds.

I am unsure whether there is cross-pollination between works by a given author, but I strongly suspect so. A new work by a top-ranked author probably starts high and is buoyed by this success. This may be why we see a dozen works by the same author (obviously self-published, and sometimes with very few ratings per book) in the top-100 in a genre.

So how do you get out of this hole? There’s only one accessible way for most people: you cheat. And this is where Amazon’s tectonic policy shift comes into play.

There ARE success stories, like the aforementioned top-ranked self-published authors. But there won’t be any more. To understand why, we must turn to hallowed antiquity before Bezos was revealed to be the latest incarnation of Bchkthmorist the Destroyer, and when Amazon brought to mind a place with trees, snakes, and Sean Connery.

There was a time when the nascent self-publishing industry had really begun to boom, but was poorly regulated. The traditional publishers viewed Amazon, Kindle, and self-publishing as a joke. They relied on their incestuous old-boys network of reviewers from the NY Times, New York Review of Books, and pretty much anything else with New York in the name for promotion. 95% of self-published books were about how to self-publish, and authors who DID self-publish (and were savvy) quickly developed ways to game Amazon.

They COULD pump up their search results, get in top-100 lists, and so on. Usually, this involved getting lots of fake reviews and using keyword tricks to optimize search placement. Once in the top list for a genre, it was easy to stay there — though newcomers with more fake reviews and better keyword antics could displace you. The very top was an unstable equilibrium, but the top 500 or 1000 was not. Once up there, it was easy to keep in that range and then occasionally pop into the very top. Like a cauldron of mediocrity, circulating its vile content into view every now and then. Amazon periodically tweaked its algorithms, but authors kept up.

Then something happened. Amazon decided to crack down on fake reviews. This sounds laudable enough. Fake reviews have the word fake in them, and fake always is bad, right?

There were two problems with HOW Amazon went about it. First, they went way overboard. Overnight, it became well-nigh impossible for an author to get a single new review. If the reviewer had one letter in common with your name, lived in the same hemisphere, or also breathed air, they were deemed connected to you and thus biased.

If this had been applied uniformly, there would be nobody in the top 100 — or it would be random, since nobody would have any tricks they could play. This is where the second problem with Amazon’s approach came in. They didn’t remove legacy fake ratings. Those who cheated before the cutoff got to keep their position. In fact, that position now was secure against all newcomers. A gate had slammed down, and they were firmly on the right side of it. Aside from a few people near the boundary they had nothing to fear. Well, almost nothing to fear.

The only way to break into the top echelon, and thus benefit from the self-reinforcing algorithms which stabilize that position, is to rely on external sources of sales. If you have a million Twitter followers who buy your book, or a massive non-Amazon advertising campaign, you can break in. Then YOU would be very difficult to displace.

Once traditional publishers realized that Amazon is the only de facto bookstore left (outside airport/supermarket sales), they took an interest. THEY have no problem getting a top rank, because they run huge advertising campaigns and have huge existing networks. This is why the top 100 lists are an odd mixture of self-published books you never heard of and traditionally published bestsellers. Eventually it only will be the latter.

So. You. Won’t. Break. In. Amazon created an impenetrable aristocracy, and you’re not it. You won’t be it. You can’t be it. If you use Amazon ads or buy into any of the snake oil sales nonsense, you’ll be the schnook bribing a maitre d’ who knows he’ll never let you in.

Most of those success stories (or at least the real ones) are from before the policy change, as are many of the methods being touted. That path is gone. Amazon ads only work for those who don’t need them, and they work very well for them. They won’t work for you. Becoming a success on Amazon is as unlikely as with a traditional publisher. You’ll always hear stories, but they’re either the few who randomly made it, those with hidden external mechanisms of promotion, or those already entrenched at the top.

That’s the sad truth, or at least my take on it. By all means, waste a few dollars trying. I used to be a statistical trader and know better, but I still buy a lottery ticket when the jackpot’s high enough. It’s entertainment. Two dollars to dream for a day. I just don’t expect to win.

Write what you want, revise, work your butt off, and make it perfect. But do it because you want to, because that’s what makes you happy. Don’t do it expecting success, or hoping for success, or even entertaining the remote possibility of success.

The worst reason to write is for other people. Your work won’t be read, and your work won’t make you money. If you accept that and are happy to write anyway, then write all you want. I urge you to do so. It’s what I do.

PACE Sample Chapter

 The following is a sample chapter from my book PACE.

Captain Alex Konarski gazed through the porthole window at the blue mass below. It looked the same as it had for the last nine years. When first informed of the Front, he had half-expected to see a pestilential wall of grey or a glowing force field or some other tell-tale sign. Instead there was nothing, just the same globe that always was there. The same boring old globe.

Konarski remembered the precise time it had taken for her charms to expire. Six months, twelve days. It was the same for every newcomer to the ISS; at first, they gawked at the beauty of Earth and couldn’t shut up about it. Then they did. Konarski always waited a discreet period after each arrival before asking how long it had taken.

Nobody seemed to remember the point at which things changed, they just woke up one day and the magic was gone. How like marriage, he’d laugh, slapping them on the back. By now the joke was well-worn. Of course, it wasn’t just the Earth itself. When somebody new arrived, they acted like a hyperactive puppy, bouncing with delight at each new experience, or perhaps ricocheting was a better choice of word up here.

Once the excitement died down, they discovered it was a job like any other, except that home was a tiny bunk a few feet from where you worked. The tourists had it right: get in and out before the novelty wore off. The ISS basically was a submarine posting with a better view and better toilets.

Earth became something to occasionally note out the corner of one’s eye. Yep, still there. Being so high up almost bred contempt for the tiny ball and its billions of people. This had been less of a problem in the old days, when the ISS sounded like the inside of a factory. But since the upgrade, things were so quiet that one could not help but feel aloof. Aloof was invented for this place. As a general rule, it was hard to hold in high regard any place toward which you flushed your excrement. Well, not quite *toward*.

There was a fun problem in orbital mechanics that Konarski used to stump newbies with. Of course, Alex had learned it in high school, but his colleagues — particularly the Americans — seemed to have spent their formative years doing anything but studying. For some reason, America believed it was better to send jocks into orbit than scientists. Worse even, it made a distinction between the two. Nerds are nerds and jocks are jocks and never the twain shall meet. It was a view that Konarski and most of the older generation of Eastern Europeans found bewildering. But that was the way it was.

So, Alex and his friends gave the newbies the infamous “orbit” problem. If you are working outside the ISS and fling a wrench toward Earth what will happen? Invariably, the response was to the effect that “well, duh, it will fall to Earth”. With carefully practiced condescension, Alex then would inform them that this is not correct. The wrench will rebound and hit the pitcher. It was one of the many vagaries of orbital dynamics, unintuitive but fairly obvious on close reflection.

The victim would argue, debate, complain, declare it an impossibility. Alex patiently would explain the mathematics. It was no mistake. Only after the victim had labored for days over a calculation that any kid should be able to do would they — sometimes — get the answer.

For some reason the first question they asked after accepting the result always was, “How do you flush the toilets?”

“Very carefully,” Alex would answer.

Then everybody had a drink and a good laugh. Yes, shit would fall to earth just as it always had and always would.

The spectrometer indicated that there was some sort of smog developing over Rome. Alex wondered if this would be a repeat of Paris. There had been sporadic fires for weeks after the Front hit that city. Some were attributable to the usual suspects: car crashes as people fled or died, overloads and short-circuits, the chaos of large numbers of people fleeing, probably even arson, not to mention the ordinary incidence of fires in a major city, now with nobody to nip them in the bud. Mostly, though, it just was the unattended failure of humanity’s mechanized residue.

The Front couldn’t eradicate every trace of our existence, but perhaps it would smile gleefully as our detritus burned itself out. Those last embers likely would outlast us, a brief epitaph. Of course, the smaller fires weren’t visible from the station, and Alex only could surmise their existence from the occasional flare up.

The same had occurred everywhere else the Front passed. In most cases there had been a small glow for a day or so and then just the quenching smoke from a spent fire. On the other hand, there was a thick haze over parts of Germany since fires had spread through the coal mines. These probably would burn for years to come, occasionally erupting from the ground without warning. There was no need to speculate on *that*; Konarski’s own grandfather had perished this way many years ago. The mines had been killing people long before there was any Front. But the occasional fireworks aside, cities inside the Zone were cold and dead.

The ISS orbited the Earth approximately once every ninety minutes. This meant that close observation of any given area was limited to a few minutes, after which they must wait until the next pass. During the time between passes, the Front would expand a little over a quarter mile. Nothing remarkable had happened during the hundred passes it took for the Front to traverse Paris. And it wasn’t for another twenty or so that the trouble started.

*Trouble?* Something about the word struck him as callous. It seemed irreverent to call a fire “trouble”, while ignoring the millions of deaths which surely preceded it. Well, the “event”, then. Once it started, the event was evident within a few passes. Alex had noticed something wrong fairly quickly. Instead of a series of small and short-lived flare ups, the blaze simply had grown and grown.

At first he suspected the meltdown of some unadvertised nuclear reactor. But there was no indication of enhanced radiation levels. Of course, it was hard to tell for sure through the smoke plume. By that point it looked like there was a small hurricane over Paris, a hurricane that occasionally flashed red. It really was quite beautiful from his vantage point, but he shuddered to think what it would be like within that mile-high vortex of flame.

It had not ceased for seven days. Some meteorologist explained the effect early on. It was called a firestorm, when countless small fires merge into a monster that generates its own weather, commands its own destiny. It was a good thing there was nobody left for it to kill, though Alex was unsure what effect the fountain of ash would have on the rest of Europe.

In theory there probably were operational video feeds on the ground, but the Central European power grid had failed two months earlier. It had shown surprisingly little resilience, and shrouded most of Europe in darkness. Of course, the relevant machinery lay within the Zone and repairs were impossible.

Konarski wondered how many millions had died prematurely because some engineering firm cut corners years ago. It probably was Ukrainian, that firm. Alex never trusted the Ukrainians. Whatever the cause, the result was that there was no power. And by the time Paris was hit any battery-driven units were long dead. Other than some satellites and the occasional drone, he and his crew were the only ones to see what was happening.

The Paris conflagration eventually had withered and died out, of course. What was of interest now was Rome. The ISS had been asked to keep an eye on the regions within the Zone, gleaning valuable information to help others prepare or, if one were fool enough to hope, understand and dispel the Front altogether. However, the real action always surrounded the Front itself. Especially when it hit a densely-developed area, even if now deserted. But it wasn’t just orders or morbid curiosity that compelled Alex to watch. Where evident, the destruction could be aesthetically beautiful.

Safely beyond the reach of the Front, Alex could watch the end of a world. How many people would have the opportunity to do so? There was a certain pride in knowing he would be among the last, perhaps even *the* last. Once everyone had perished, the crew of the ISS would be alone for a while, left to contemplate the silence. Then their supplies would run out, and they too would die.

Based on the current consumption rate of his six person crew, Alex estimated they could survive for another six years — two years past the Front’s anticipated circumvallation of Earth. Of course, he doubted the process would be an orderly one. Four of the crew members (himself included) came from military backgrounds, one was a woman, and three different countries were represented. Even at the best of times, there was a simmering competitiveness.

Konarski assumed that he would be the first casualty. No other scenario made sense, other than something random in the heat of passion — and such things didn’t require the Front. No, barring any insanity, he would go first. He was the leader and also happened to be bedding the only woman. Who else would somebody bother killing? Of course, with *this* woman, he shuddered to think what would happen to the murderer. Of course, *she* was the one most likely to kill him in the first place.

Obviously, they hadn’t screened for mental health in the Chinese space program. In fact, he guessed that any screening they *did* do was just lip-service to be allowed to join the ISS. But Ying was stunning and endlessly hilarious to talk to, and Alex had nothing to lose.

If the Front hadn’t come along, he would have faced compulsory retirement the following year. Then he would have had the privilege of returning to good old Poland, a living anachronism in a country that shunned any sign of its past. Alex gave it about a year before the bottle would have taken him. Who the fuck wanted to grow old in today’s world? The Front was the best thing that ever happened, as far as he was concerned. It made him special.

Alex would try to protect Ying for as long as he could, but he knew how things would unfold. Perhaps it would be best to kill her first, before anyone got to him. Or maybe he just should suicide the whole crew. It would be the easiest thing in the world, all he really had to do was stop trying to keep everyone alive. Or he actively could space the place and kill everyone at once, a grand ceremonial gesture. But that would be boring.

Besides, part of him wanted to see who *would* be the last man standing. The whole of humanity in one man. The one to turn out the lights, not first but final hand. Humanity would end the way it began, with one man killing another. After all, everybody always was talking about returning to your roots. Alex just was sad they no longer had a gun on board. That *really* would have made things interesting.

These were distant considerations, however; worth planning for, but hardly imminent. At the moment the world remained very much alive, and was counting on them for critical information. Alex wondered if it would be better to be the last man alive or the man who saved the world.

“The savior, you dumb fuck,” part of him screamed. “Nobody will be around to care if you’re the last one alive.” Of course, Poland already was gone. There was no home for him, even the one he wouldn’t have wanted. Maybe he was the last Pole. But how would he change a light bulb?

For some reason, a series of bad Polack jokes popped into Konarski’s head. There was a time when he would have taken great offense at such jokes, jumped to his country’s defense, maybe even thrown a few obligatory punches. But not now, not after what Poland had become over the last decade, and especially not after how they had behaved toward the end. They could go fuck themselves. And now they had. Or somebody bigger and badder had fucked them, just like had happened through most of their history.

Still, he felt a certain pride. Maybe he would be the start of a new, prouder race of Poles. No, that was just the sort of talk that had made him sick of his country, the reason he was commanding ISS under a Russian flag. Besides, there probably still were plenty of Poles around the world. He wasn’t alone. Yet.

If Alex watched Rome’s demise closely, he couldn’t be accused of exultation or cruel delight. He had watched his home city of Warsaw perish just three days earlier. Of course, it was nearly empty by the time the Front reached it. But he had listened to the broadcasts, the chatter, and he was ashamed of the conduct of his countrymen. They had acted just like the self-absorbed Western pigs he detested.

Ying understood. She was Chinese. When *they* left their old and infirm behind it would be from calculated expedience, not blind selfish panic. The decision would be institutional, not individual. The throng would push and perish and each would look to their own interest, but none would bear the individual moral responsibility. *That* would be absorbed by the State. What else was the State for?

But it turned out that his compatriots no longer thought this way. They had become soft since the fall of communism, soft and scared. When the moment came, they didn’t stand proud and sink with the ship. They scrambled over one another like a bunch of terrified mice, making a horrid mess and spitting on the morals of their homeland and a thousand years of national dignity just to buy a few more precious moments of lives clearly not worth living. They disgusted him. He would die the last true Pole.

In the meantime, he would carry on — his duty now to the species. Part of him felt that if *his* world had perished, so too should all the others. He harbored a certain resentment when he imagined some American scientists discovering the answer just in time to save their own country. It would be *his* data that accomplished this. What right had they to save themselves using *his* data, when his own people had perished? Yet still he sent it. Data that perhaps would one day allow another world to grow from the ashes of his. Maybe this was a sign that there *had* been some small progress over the thousands of years, that he was first and foremost human.

Alex’s thoughts were interrupted by a soft voice.

“We’re almost over Rome,” Ying whispered, breathing gently into his ear.

“C’mon, I have to record this,” he protested in half-genuine exasperation.

“That’s ok, we’ll just catch the next pass,” she shot back from behind him.

Alex heard some shuffling and felt something strange on his shoulder. What was Ying doing now? He had to focus, dammit. She was the funnest, craziest woman he had known, but sometimes he just wished he could lock her outside the station for a few hours. Yeah, he’d probably ask her to marry him at some point. Maybe soon. After all, living with somebody on the ISS was ten times more difficult than being married. Alex shook his shoulder free of her grip. It would have to wait.

Then he noticed that she wasn’t touching him. She was on the other side of the room, pointing at him with her mouth open. Why was there no sound? Then he was screaming, then he couldn’t scream anymore. Before things grew dark, he saw Ying’s decaying flesh. She still was pointing, almost like a mannequin. His last thought was how disgusting Ying had become, and that he soon would be the same.

CCSearch State Space Algo

While toying with automated Fantasy Sports trading systems, I ended up designing a rapid state search algorithm that was suitable for a variety of constrained knapsack-like problems.

A reference implementation can be found on github: https://github.com/kensmosis/ccsearch.

Below is a discussion of the algorithm itself. For more details, see the source code in the github repo. Also, please let me know if you come across any bugs! This is a quick and dirty implementation.

Here is a description of the algorithm:

— Constrained Collection Search Algorithm —

Here, we discuss a very efficient state-space search algorithm which originated with a Fantasy Sports project but is applicable to a broad range of applications. We dub it the Constrained Collection Search Algorithm for want of a better term. A C++ implementation, along with a Python front-end, is included as well.

In the Fantasy Sports context, our code solves the following problem: We’re given a tournament with a certain set of rules and requirements, a roster of players for that tournament (along with positions, salaries and other info supplied by the tournament’s host), and a user-provided performance measure for each player. We then search for those teams which satisfy all the constraints while maximizing team performance (based on the player performances provided). We allow a great deal of user customization and flexibility, and currently can accommodate (to our knowledge) all major tournaments on Draftkings and FanDuel. Through aggressive pruning, execution time is minimized.

As an example, on data gleaned from some past Fantasy Baseball tournaments, our relatively simple implementation managed to search a state space of size approximately {10^{21}} unconstrained fantasy teams, ultimately evaluating under {2} million plausible teams and executing in under {4} seconds on a relatively modest desktop computer.

Although originating in a Fantasy Sport context, the CCSearch algorithm and code is quite general.

— Motivating Example —

We’ll begin with the motivating example, and then consider the more abstract case. We’ll also discuss some benchmarks and address certain performance considerations.

In Fantasy Baseball, we are asked to construct a fantasy “team” from real players. While the details vary by platform and tournament, such games share certain common elements:

  • A “roster”. This is a set of distinct players for the tournament.
  • A means of scoring performance of a fantasy team in the tournament. This is based on actual performance by real players in the corresponding games. Typically, each player is scored based on certain actions (perhaps specific to their position), and these player scores then are added to get the team score.
  • For each player in the roster, the following information (all provided by the tournament host except for the predicted fantasy-points, which generally is based on the user’s own model of player performance):
  • A “salary”, representing the cost of inclusion in our team.
  • A “position” representing their role in the game.
  • One or more categories they belong to (ex. pitcher vs non-pitcher, real team they play on).
  • A prediction of the fantasy-points the player is likely to score.
  • A number {N} of players which constitute a fantasy team. A fantasy team must have precisely this number of players.
  • A salary cap. This is the maximum sum of player salaries we may “spend” in forming a fantasy team. Most, but not all, tournaments have one.
  • A set of positions, and the number of players in each. The players on our team must adhere to these. For example, we may have {3} players from one position and {2} from another and {1} each from {4} other positions. Sometimes there are “flex” positions, and we’ll discuss how to accommodate those as well. The total players in all the positions must sum to {N}.
  • Various other constraints on team formation. These come in many forms and we’ll discuss them shortly. They keep us from having too many players from the same real-life team, etc.

    To give a clearer flavor, let’s consider a simple example: Draftkings Fantasy Baseball. There are at least 7 Tournament types listed (the number and types change with time, so this list may be out of date). Here are some current game types. For each, there are rules for scoring the performance of players (depending on whether hitter or pitcher, and sometimes whether relief or starting pitcher — all of which info the tournament host provides):

    • Classic: {N=10} players on a team, with specified positions P,P,C,1B,2B,3B,SS,OF,OF,OF. Salary cap is $50K. Constraints: (1) {\le 5} hitters (non-P players) from a given real team, and (2) players from {\ge 2} different real games must be present.
    • Tiers: {N} may vary. A set of performance “tiers” is provided by the host, and we pick one player from each tier. There is no salary cap, and the constraint is that players from {\ge 2} different real games must be present.
    • Showdown: {N=6} players, with no position requirements. Salary cap is $50K. Constraints: (1) players from {\ge 2} different real teams, and (2) {\le 4} hitters from any one team.
    • Arcade: {N=6} players, with 1 pitcher, 5 hitters. Salary cap is $50K. Constraints are: (1) {\le 3} hitters (non-P players) from a given real team, and (2) players from {\ge 2} different real games must be present.
    • Playoff Arcade: {N=7} players, with 2 pitchers and 5 hitters. Salary cap is $50K. Constraints are: (1) {\le 3} hitters (non-P players) from a given real team, and (2) players from {\ge 2} different real games must be present.
    • Final Series (involves 2 games): {N=8} players, with 2 pitchers and 6 hitters. $50K salary cap. Constraints are: (1) {1} pitcher from each of the two games, (2) {3} hitters from each of the {2} games, (3) can’t have the same player twice (even if they appear in both games), and (4) must have hitters from both teams in each game.

    • Lowball: Same as Tiers, but the lowest score wins.

    Although the constraints above may seem quite varied, we will see they fall into two easily-codified classes.

    In the Classic tournament, we are handed a table prior to the competition. This contains a roster of available players. In theory there would be 270 (9 for each of the 30 teams), but not every team plays every day and there may be injuries so it can be fewer in practice. For each player we are given a field position (P,C,1B,2B,3B,SS,or OF), a Fantasy Salary, their real team, and which games they will play in that day. For our purposes, we’ll assume they play in a single game on a given day, though it’s easy to accommodate more than one.

    Let us suppose that we have a model for predicting player performance, and are thus also provided with a mean and standard deviation performance. This performance is in terms of “points”, which is Draftkings’ scoring mechanism for the player. I.e., we have a prediction for the score which Draftkings will assign the player using their (publicly available) formula for that tournament and position. We won’t discuss this aspect of the process, and simply take the predictive model as given.

    Our goal is to locate the fantasy teams which provide the highest combined predicted player scores while satisfying all the requirements (position, salary, constraints) for the tournament. We may wish to locate the top {L} such teams (for some {L}) or all those teams within some performance distance of the best.

    Note that we are not simply seeking a single, best solution. We may wish to bet on a set of 20 teams which diversify our risk as much as possible. Or we may wish to avoid certain teams in post-processing, for reasons unrelated to the constraints.

    It is easy to see that in many cases the state space is enormous. We could attempt to treat this as a knapsack problem, but the desire for multiple solutions and the variety of constraints make it difficult to do so. As we will see, an aggressively pruned direct search can be quite efficient.

    — The General Framework —

    There are several good reasons to abstract this problem. First, it is the sensible mathematical thing to do. It also offers a convenient separation from a coding standpoint. Languages such as Python are very good at munging data when efficiency isn’t a constraint. However, for a massive state space search they are the wrong choice. By providing a general wrapper, we can isolate the state-space search component, code it in C++, and call out to execute this as needed. That is precisely what we do.

    From the Fantasy Baseball example discussed (as well as the variety of alternate tournaments), we see that the following are the salient components of the problem:

    • A cost constraint (salary sum)
    • The number of players we must pick for each position
    • The selection of collections (teams) which maximize the sum of player performances
    • The adherence to certain constraints involving player features (hitter/pitcher, team, game)

    Our generalized tournament has the following components:

    • A number of items {N} we must choose. We will term a given choice of {N} items a “collection.”
    • A total cost cap for the {N} items.
    • A set of items, along with the following for each:
      • A cost
      • A mean value
      • Optionally, a standard deviation value
    • A set of features. Each feature has a set of values it may take, called “groups” here. For each feature, a table (or function) tells us which group(s), if any, each item is a member of. If every item is a member of one and only one group, then that feature is termed a “partition” for obvious reasons.
    • A choice of “primary” feature, whose role will be discussed shortly. The primary feature need not be a partition. Associated with the primary feature is a count for each group. This represents the number of items which must be selected for that group. The sum of these counts must be {N}. An item may be chosen for any primary group in which it is a member, but may not be chosen twice for a given collection.
    • A set of constraint functions. Each takes a collection and, based on the information above, accepts or rejects it. We will refer to these as “ancillary constraints”, as opposed to the overall cost constraint, the primary feature group allocation constraints, and the number of items per collection constraint. When we speak of “constraints” we almost always mean ancillary constraints.

    To clarify the connection to our example, the fantasy team is a collection, the players are items, the cost is the salary, the value is the performance prediction, the primary feature is “position” (and its groups are the various player positions), other features are “team” (whose groups are the 30 real teams), “game” (whose groups are the real games being played that day), and possibly one or two more which we’ll discuss below.

    Note that each item may appear only once in a given collection even if it theoretically can fill multiple positions (ex. it plays in two games of a double-header, or it is allowed for a “flex” position as well as its actual one in tournaments which have such things).

    Our goal at this point will be to produce the top {L} admissible collections by value (or a good approximation thereof). Bear in mind that an admissible collection is a set of items which satisfy all the criteria: cost cap, primary feature group counts, and constraint functions. The basic idea is that we will perform a tree search, iterating over groups in the primary feature. This is why that group plays a special role. However, its choice generally is a structural one dictated by the problem itself (as in Fantasy Baseball) rather than a control lever. We’ll aggressively prune where we can based on value and cost as we do so. We then use the other features to filter the unpruned teams via the constraint functions.

    It is important to note that features need not be partitions. This is true even of the primary feature. In some tournaments, for example, there are “utility” or “flex” positions. Players from any other position (or some subset of positions) are allowed for these. A given player thus could be a member of one or more position groups. Similarly, doubleheaders may be allowed, in which case a player may appear in either of 2 games. This can be accommodated via a redefinition of the features.

    In most cases, we’ll want the non-primary features to be partitions if possible. We may need some creativity in defining them, however. For example, consider the two constraints in the Classic tournament described above. Hitter vs pitcher isn’t a natural feature. Moreover, the constraint seems to rely on two distinct features. There is no rule against this, of course. But we can make it a more efficient single-feature constraint by defining a new feature with 31 groups: one containing all the pitchers from all teams, and the other 30 containing hitters from each of the 30 real teams. We then simply require that there be no more than 5 items in any group of this new feature. Because only 2 pitchers are picked anyway, the 31st group never would be affected.
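    To make this concrete, here is a minimal sketch (Python, with made-up variable names; the repo’s actual data munging differs) of how such a derived feature might be built:

      # Minimal sketch (hypothetical names): build the 31-group feature from each
      # player's real team id (0..29) and a boolean is_pitcher flag.
      def hitter_team_groups(team_ids, is_pitcher):
          """Return one group id per player: 0 for every pitcher,
          1..30 for hitters, keyed by their real team."""
          return [0 if pitcher else team + 1
                  for team, pitcher in zip(team_ids, is_pitcher)]

      # The "<= 5 hitters per team" rule then becomes: at most 5 items in any
      # single group of this feature.  Group 0 holds only the 2 pitchers we
      # pick, so it can never violate the cap.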

    Our reference implementation allows for general user-defined constraints via a functionoid, but we also provide two concrete constraint classes. With a little cleverness, these two cover all the cases which arise in Fantasy Sports. Both concern themselves with a single feature, which must be a partition:

    • Require items from at least {n} groups. It is easy to see that the {\ge 2} games and {\ge 2} teams constraints fit this mold.
    • Allow at most {n} items from a given group. The {\le 3,4,5} hitter per team constraints fit this mold.

    When designing custom constraints, it is important to seek an efficient implementation. Every collection which passes the primary pruning will be tested against every constraint. Pre-computing a specialized feature is a good way to accomplish this.
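    For illustration, the two concrete constraint classes might look roughly like this in Python (simplified stand-ins for the C++ versions in the repo, not its actual API):

      from collections import Counter

      def at_least_n_groups(collection, groups, n):
          """Accept a collection only if its items span at least n distinct
          groups of the given partition feature (groups maps item id -> group)."""
          return len({groups[i] for i in collection}) >= n

      def at_most_n_per_group(collection, groups, n):
          """Accept a collection only if no single group contributes more
          than n of its items."""
          counts = Counter(groups[i] for i in collection)
          return max(counts.values()) <= n

      # Classic tournament example (hypothetical variable names):
      #   at_least_n_groups(chosen_players, game_groups, 2)           # >= 2 games
      #   at_most_n_per_group(chosen_players, hitter_team_groups, 5)  # <= 5 hitters/team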

    — Sample Setup for DraftKings Classic Fantasy Baseball Tournament —

    How would we configure our system for a real application? Consider the Classic Fantasy Baseball Tournament described above.

    The player information may be provided in many forms, but for purposes of exposition we will assume we are handed vectors, each of the correct length and with no null or bad values. We are given the following:

    • A roster of players available in the given tournament. This would include players from all teams playing that day. Each team would include hitters from the starting lineup, as well as the starting pitcher and one or more relief pitchers. We’ll say there are {M} players, listed in some fixed order for our purposes. {R_i} denotes player {i} in our listing.
    • A set {G} of games represented in the given tournament. This would be all the games played on a given day. Almost every team plays each day of the season, so this is around 15 games. We’ll ignore the 2nd game of doubleheaders for our purposes (so a given team and player plays at most once on a given day).
    • A set {T} of teams represented in the given tournament. This would be all 30 teams.
    • A vector {p} of length {M}, identifying the allowed positions of each player. These are P (pitcher), C (catcher), 1B (1st base), 2B (2nd base), 3B (3rd base), SS (shortstop), OF (outfield).
    • A vector {t} of length {M}, identifying the team of each player. This takes values in {T}.
    • A vector {g} of length {M}, identifying the game each player participates in that day. This takes value in {G}.
    • A vector {s} of length {M}, providing the fantasy salary assigned by DraftKings to each player (always positive).
    • A vector {v} of length {M}, providing our model’s predictions of player performance. Each such value is the mean predicted fantasy score for the player under DraftKings’ scoring system for that tournament and player position. As an aside, DK never scores pitchers as hitters even if they bat.

    Note that DraftKings provides all this info (though it may have to be munged into some usable form), except the model prediction.

    We now define a new vector {h} of length {M} as follows: {h_i=t_i} if player {i} is a hitter (i.e. not a pitcher), and {h_i=P} if a pitcher, where {P} designates some new value not in {T}.

    Next, we map the values of {G}, {T}, and the positions into nonnegative consecutive integers (i.e. we number them). So the games run from {1\dots |G|}, the teams from {1\dots |T|}, and the positions from {1\dots 7}. We’ll assign {0} to the pitcher category in the {h} vector. The players already run from {1\dots M}. The vectors {p}, {t}, {g}, and {h} now take nonnegative integer values, while {s} and {v} take real ones (actually {s} is an integer too, but we don’t care here).

    From this, we pass the following to our algorithm:

    • Number of items: {M}
    • Size of a collection: {10}
    • Feature 1: {7} groups (the positions), and marked as a partition.
    • Feature 2: {|T|} groups (the teams), and marked as a partition.
    • Feature 3: {|G|} groups (the games), and marked as a partition.
    • Feature 4: {|T|+1} groups (the teams for hitters plus a single group of all pitchers), and marked as a partition.
    • Primary Feature: Feature 1
    • Primary Feature Group Counts: {(2,1,1,1,1,1,3)} (i.e. P,P,C,1B,2B,3B,SS,OF,OF,OF)
    • Item costs: {s}
    • Item values: {v}
    • Item Feature 1 Map: {f(i,j)= \delta_{p_i,j}} (i.e. {1} if player {i} is in position {j})
    • Item Feature 2 Map: {f(i,j)= \delta_{t_i,j}} (i.e. {1} if player {i} is on team {j})
    • Item Feature 3 Map: {f(i,j)= \delta_{g_i,j}} (i.e. {1} if player {i} is in game {j})
    • Item Feature 4 Map: {f(i,j)= \delta_{h_i,j}} (i.e. {1} if player {i} is a hitter on team {j} or a pitcher and {j=0})
    • Cost Cap: {50,000}
    • Constraint 1: No more than {5} items in any one group of Feature 4. (i.e. {\le 5} hitters from a given team)
    • Constraint 2: Items from at least {2} groups of Feature 3. (i.e. items from {\ge 2} games)

    Strictly speaking, we could have dispensed with Feature 2 in this case (we really only need the team through Feature 4), but we left it in for clarity.

    Note that we also would pass certain tolerance parameters to the algorithm. These tune its aggressiveness as well as the number of teams potentially returned.
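    To make the hand-off concrete, a hypothetical Python-side configuration might look like the following sketch (the key names are illustrative; the repo’s front-end uses its own argument format):

      def build_classic_config(p, t, g, h, s, v):
          """Assemble the vectors above into a single configuration for the
          search engine.  Key names here are hypothetical, not the repo's API."""
          return {
              "num_items": len(p),
              "collection_size": 10,
              "cost_cap": 50000,
              "costs": list(s),                       # salaries
              "values": list(v),                      # predicted fantasy points
              "features": {                           # group id per item
                  "position":    list(p),             # 7 groups (primary)
                  "team":        list(t),             # |T| groups
                  "game":        list(g),             # |G| groups
                  "hitter_team": list(h),             # |T|+1 groups
              },
              "primary_feature": "position",
              "primary_counts": (2, 1, 1, 1, 1, 1, 3),  # P,C,1B,2B,3B,SS,OF
              "constraints": [
                  ("at_most_n_per_group", "hitter_team", 5),  # <= 5 hitters/team
                  ("at_least_n_groups",   "game",        2),  # >= 2 real games
              ],
              # tolerance parameters (epsilon, ntol, NC, delta) are discussed below
          }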

    — Algorithm —

    — Culling of Individual Items —

    First, we consider each group of the primary feature and eliminate strictly inferior items. These are items we never would consider picking because there always are better choices. For this purpose we use a tolerance parameter, {\epsilon}. For a given group, we do this as follows. Assume that we are required to select {n} items from this group:

    • Restrict ourselves only to items which are unique to that group. I.e., if an item appears in multiple groups it won’t be culled.
    • Scan the remaining items in descending order of value. For item {i} with cost {c} and value {v},
      • Scan over all items {j} with {v_j>v_i(1+\epsilon)}
      • If there are {n} such items that have {c_j\le c_i} then we cull item {i}.

    So basically, it’s simple comparison shopping. We check whether there are enough better items at the same or lower cost. If so, we never would want to select the item. Note that we don’t cull on merely “strictly” better items; we build in a buffer, requiring the other items to be better by a factor of {(1+\epsilon)}. There is a rationale behind this which will be explained shortly. It has to do with the fact that the cull stage has no foreknowledge of the delicate balance between ancillary constraints and player choice. It is a coarse dismissal of certain players from consideration, and the tolerance allows us to be more or less conservative in this as circumstance dictates.
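    In code, a per-group cull along these lines might look like the following sketch (Python, quadratic for clarity; the reference implementation is in C++, scans a value-sorted list instead, and also handles the {ntol} buffer described below):

      def cull_group(items, n, eps):
          """items: list of (cost, value) pairs unique to this group;
          n: how many items must be selected from the group.
          An item is culled only if at least n other items are sufficiently
          better (value > value*(1+eps)) at the same or lower cost."""
          survivors = []
          for i, (ci, vi) in enumerate(items):
              better = sum(1 for j, (cj, vj) in enumerate(items)
                           if j != i and vj > vi * (1.0 + eps) and cj <= ci)
              if better < n:
                  survivors.append(i)
          return survivors

      # Example: needing 1 catcher, a $3000 catcher projected at 5.0 points is
      # culled only if another catcher projects above 5.0*(1+eps) for <= $3000.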

    If a large number of items appear in multiple groups, we also can perform a merged pass — in which those groups are combined and we perform a constrained cull. Because we generally only have to do this with pairs of groups (ex. a “flex” group and each regular one), the combinatorial complexity remains low. Our reference implementation doesn’t include an option for this.

    To see the importance of the initial cull, consider our baseball example but with an extra 2 players per team assigned to a “flex” position (which can take any player from any position). We have {8} groups with ({60},{30},{30},{30},{30},{30},{90},{270}) allowed items. We need to select {(2,1,1,1,1,1,3,2)} items from amongst these. In reality, fantasy baseball tournaments with flex groups have fewer other groups — so the size isn’t quite this big. But for other Fantasy Sports it can be.

    The size of the overall state space is around {5\times 10^{21}}. Because {12} items are selected in total, culling a fraction {f} of every group shrinks the space by roughly a factor of {(1/(1-f))^{12}}. Suppose we can prune just 1/3 of the players (evenly, so 30 becomes 20, 60 becomes 40, and 90 becomes 60). This reduces the state space by {(3/2)^{12}\approx 130\times} to around {4\times 10^{19}}. If we can prune 1/2 the players, we reduce it by {2^{12}= 4096\times} to around {10^{18}}. And if we can prune it by 2/3 (which actually is not as uncommon as one would imagine, especially if many items have {0} or very low values), we reduce it by {3^{12}= 531441\times} to a somewhat less unmanageable starting point of {O(10^{16})}.

    Thus we see the importance of this initial cull. Even if we have to perform a pairwise analysis for a flex group, and even if each paired cull cost {n^2m^2} operations (it doesn’t), where {m} is the non-flex group size and {n} is the flex-group size, we’d at worst get {(\sum_i m_i)^2\sum_i m_i^2}, which is {O(10^9)} operations. In reality it would be far lower. So a careful cull is well worth it!

    One important word about this cull, however. It is performed at the level of individual primary-feature groups. While it accommodates the overall cost cap and the primary feature group allocations, it has no knowledge of the ancillary constraints. It is perfectly possible that we cull an item which could be used to form the highest value admissible collection once the ancillary constraints are taken into account. This is part of why we use the tolerance {\epsilon}. If it is set too high, we will cull too few items and waste time down the road. If it is too low, we may run into problems meeting the ancillary constraints.

    We note that in fantasy sports, the ancillary constraints are weak in the sense that they affect a small set of collections and these collections are randomly distributed. I.e., we would have to conspire to make them have a meaningful statistical effect. We also note that there tend to be many collections within the same tiny range of overall value. Since the item value model itself inherently is statistical, the net effect is small. We may miss a few collections but they won’t matter. We’ll have plenty of others which are just as good and are as statistically diverse as if we included the omitted ones.

    In general use, we may need to be more careful. If the ancillary constraints are strong or statistically impactful, the initial cull may need to be conducted with care. Its effect must be measured and, in the worst case, it may need to be restricted or omitted altogether. In most cases, a well-chosen {\epsilon} will achieve the right compromise.

    In practice, {\epsilon} serves two purposes: (1) it allows us to tune our culling so that the danger of an impactful omission due to the ancillary constraints stays low, while we still gain some benefit from this step, and (2) it allows us to accommodate “flex” groups or other non-partition primary features without a more complicated pairwise cull. This is not perfect, but often can achieve the desired effect with far less effort.

    Another approach to accommodating flex groups or avoiding suboptimal results due to the constraints is to require more than the selection count when culling in a given group. Suppose we need to select {2} items from a given group. Ordinarily, we would require that there be at least {2} items with value exceeding {(1+\epsilon)v} and cost {\le c} in order to cull an item with value {v} and cost {c}. We could buffer this by requiring {3} or even {4} such better items. This would reduce the probability of discarding useful items, but at the cost of culling far fewer. In our code, we use a parameter {ntol} to reflect this. If {n_i} is the number of selected items for group {i} (and the number we ordinarily would require to be strictly better in order to cull others), we now require {n_i+ntol} strictly better items. Note that {ntol} solely is used for the individual cull stage.

    One final note. If a purely lossless search is required then the individual cull must be omitted altogether. In the code this is accomplished by either choosing {ntol} very high or {\epsilon} very high. If we truly require the top collection (as opposed to collections within a thick band near the top), we have the standard knapsack problem and there are far better algorithms than CCSearch.

    — Prepare for Search —

    We can think of our collection as a selection of {n_i} items from each primary-feature group {i} (we’ll just refer to it as “group” for short). Let’s say that {m_i} is the total number of items in the {i^{th}} group. Some of the same items may be available to multiple groups, but our collection must consist of distinct items. So there are {K} bins, where {K} is the number of primary feature groups. For the {i^{th}} such group, we select {n_i} items from amongst the available {m_i} post-cull items.

    For the search itself we iterate by group, then within each group. Conceptually, this could be thought of as a bunch of nested loops from left group to right group. In practice, it is best implemented recursively.

    We can precompute certain important information:

    • Each group has {C_i= {m_i\choose n_i}} possible selections. We can precompute this easily enough.
    • We also can compute {RC_i= \Pi_{j\ge i} C_j}. I.e. the product of the total combinations of this group and those that come after.
    • {BV_i} is the sum of the top {n_i} values in the group. This is the best we can do for that group, if cost is no concern.
    • {RBV_i} is {\sum_{j>i} BV_j}. I.e., the best total value we can get from all subsequent groups.
    • {LC_i} is the sum of the bottom {n_i} costs in the group. This is the cheapest we can do for that group, if value is no concern.
    • {RLC_i} is {\sum_{j>i} LC_j}. I.e., the cheapest we can do for all subsequent groups, if value is no concern.
    • Sorted lists of the items by value and by cost.
    • Sorted lists of {n_i}-tuples of distinct items by overall value and by overall cost. I.e., for each group, sorted lists of all combos of {n_i} choices. These generally are few enough to keep in memory.
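    A sketch of these precomputations (Python, with illustrative names; the reference implementation keeps the equivalent C++ structures):

      from itertools import combinations
      from math import comb, prod

      def precompute(groups, picks):
          """groups: per-group lists of (cost, value) pairs (post-cull);
          picks:  n_i, the number of items to select from each group."""
          K = len(groups)
          C  = [comb(len(g), k) for g, k in zip(groups, picks)]
          RC = [prod(C[i:]) for i in range(K)]              # combos, this group onward
          BV = [sum(sorted((v for _, v in g), reverse=True)[:k])
                for g, k in zip(groups, picks)]             # best value per group
          LC = [sum(sorted(c for c, _ in g)[:k])
                for g, k in zip(groups, picks)]             # least cost per group
          RBV = [sum(BV[i + 1:]) for i in range(K)]         # best value, later groups
          RLC = [sum(LC[i + 1:]) for i in range(K)]         # least cost, later groups
          # Each group's n_i-tuples, pre-sorted by total value (high to low).
          combos = [sorted(combinations(range(len(g)), k),
                           key=lambda tup: -sum(g[i][1] for i in tup))
                    for g, k in zip(groups, picks)]
          return C, RC, BV, LC, RBV, RLC, combos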

    The search itself depends on two key iteration decisions. We discuss their effects on efficiency below.

    • Overall, do we scan the groups from fewest to most combinations (low to high {C_i}) or from most to fewest (high to low {C_i})?
    • Within each group, do we scan the item combinations from lowest to highest cost or from highest to lowest value? (Of the four possible orderings by cost or by value, the other two make no sense; it must be one of these.)

    Based on our choice, we sort our groups, initialize our counters, and begin.

    — Search —

    We’ll describe the search recursively.

    Suppose we find ourselves in group {i}, and are given the cost {c} and value {v} so far (from the selections for groups {1\dots i-1}). We also are given {vmin}, the lowest collection value we will consider. We’ll discuss how this is obtained shortly.

    We need to cycle over all {C_i} choices for group {i}. We use the pre-sorted list of {n_i}-tuples sorted by value or by cost depending on our 2nd choice above. I.e., we are iterating over the possible selections of {n_i} items in decreasing order of overall value or increasing order of overall cost.

    We now discuss the individual iteration. For each step we compute the following:

    • {mc} is the minimum cost of all remaining groups ({i+1} onward). This is the lowest cost we possibly could achieve for subsequent groups. It is the pre-computed {RLC_i} from above.
    • {mv} is the maximum value of all remaining groups ({i+1} onward). This is the highest value we possibly could achieve for subsequent groups. It is the pre-computed {RBV_i} from above.
    • {c_i} is the cost of our current selection for group {i}
    • {v_i} is the value of our current selection for group {i}

    Next we prune if necessary. There are 2 prunings, the details of which depend on the type of iteration.

    If we’re looping in increasing order of cost:

    • If {c+c_i+mc>S}, where {S} denotes the cost cap, then there is no way to select from the remaining groups and meet the cost cap. Worse, all remaining iterations within group {i} will be of equal or higher cost and face the same issue. So we prune both the current selection and all remaining ones. Practically, this means we terminate the iteration over combinations in group {i} (for this combo of prior groups).
    • If {v+v_i+mv<vmin} then there is no way to select a high enough value collection from the remaining groups. However, it is possible that other iterations may do so (since we’re iterating by cost, not value). We prune just the current selection, and move on to the next combo in group {i} by cost.

    If on the other hand we’re looping in decreasing order of value, we do the opposite:

    • If {v+v_i+mv<vmin} then there is no way to select a high enough value collection from the remaining groups. Worse, all remaining iterations within group {i} will be of equal or lower value and face the same issue. So we prune both the current selection and all remaining ones. Practically, this means we terminate the iteration over combinations in group {i} (for this combo of prior groups).
    • If {c+c_i+mc>S} then there is no way to select from the remaining groups and meet the cost cap. However, it is possible that other iterations may do so (since we’re iterating by value, not cost). We prune just the current selection, and move on to the next combo in group {i} by value.

    If we get past this, our combo has survived pruning. If {i} isn’t the last group, we recursively call ourselves, but now with cost {c+c_i} and value {v+v_i} and group {i+1}.

    If on the other hand, we are the last group, then we have a completed collection. Now we must test it.

    If we haven’t put any protections against the same item appearing in different slots (possible if it belongs to multiple groups), we must test for this and discard the collection if a duplicate is present. Finally, we must test it against our ancillary constraints. If it violates any, it must be discarded. What do we do with collections that pass muster? Well, that depends. Generally, we want to limit the number of collections returned to some number {NC}, so we maintain a value-sorted list of our top collections in a queue-like structure.

    If our new collection exceeds all others in value, we update {vmax}, the best value realized. This also resets {vmin= vmax(1-\delta)} for some user-defined tolerance {\delta}. We then must drop any already-accumulated collections which fall below the new {vmin}.

    I.e., we keep at most {NC} collections, and each must have value within a fraction {\delta} of the best.

    And that’s it.
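    To tie the pieces together, here is a stripped-down sketch of the recursion (Python, illustrative only). It iterates each group’s combos in decreasing value order, applies the two prunings, and simply collects admissible collections above a fixed {vmin}; the dynamic {vmax}/{vmin} update, the duplicate-item check, and the bounded result queue are omitted for brevity:

      def search(combos, combo_cost, combo_value, RLC, RBV, cost_cap, vmin,
                 constraints, i=0, chosen=(), c=0.0, v=0.0, out=None):
          """combos[i]: group i's n_i-tuples of global item ids, sorted by
          total value (high to low).  combo_cost/combo_value: functions giving
          a tuple's cost and value.  constraints: callables over the chosen
          item ids.  RLC[i]/RBV[i]: least cost / best value from groups > i."""
          if out is None:
              out = []
          if i == len(combos):                    # a complete collection
              if all(ok(chosen) for ok in constraints):
                  out.append((v, chosen))
              return out
          for tup in combos[i]:
              ci, vi = combo_cost(i, tup), combo_value(i, tup)
              if v + vi + RBV[i] < vmin:
                  break       # value-sorted: everything later is worse, stop here
              if c + ci + RLC[i] > cost_cap:
                  continue    # over the cap, but a later (cheaper) combo may fit
              search(combos, combo_cost, combo_value, RLC, RBV, cost_cap, vmin,
                     constraints, i + 1, chosen + tup, c + ci, v + vi, out)
          return out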

    — Tuning —

    Let’s list all the user-defined tunable parameters and choices in our algorithm:

    • What is the individual cull tolerance {\epsilon\in [0,\infty]}?
    • What is {ntol}, the number of extra strictly-better items we require in a group during the individual cull?
    • Do we scan the groups from fewest to most combinations or the other way?
    • Within each group, do we scan the items from lowest to highest cost or from highest to lowest value?
    • What is the maximum number of collections {NC>0} we report back (or do we keep them all)?
    • What is the collection value tolerance {\delta\in [0,1]}?

    Clearly, {NC} and {\delta} guide how many results are kept and returned. High {NC} and high {\delta} are burdensome in terms of storage. If we want just the best result, either {NC=1} or {\delta=1} will do. As mentioned, {\epsilon} and {ntol} have specific uses related to the behavior of the individual cull. What about the sort orders?

    The details of post-cull search performance will depend heavily on the primary partition structure and cost distribution, as well as our 2 search order choices. The following is a simple test comparison benchmark (using the same data and the {10}-player collection Classic Fantasy Baseball tournament structure mentioned above).

    CombosOrder          GroupOrder     Time     Analyzed
    Value high-to-low    high-to-low    12.1s    7.9MM
    Value high-to-low    low-to-high    3.4s     1.5MM
    Cost low-to-high     high-to-low    69.7s    47.5MM
    Cost low-to-high     low-to-high    45.7s    18.5MM

    Here, “Analyzed” refers to the number of collections which survived pruning and were tested against the ancillary constraints. The total number of combinations pruned was far greater.

    Of course, these numbers mean nothing in an absolute sense. They were run with particular test data on a particular computer. But the relative values are telling. For these particular conditions, the difference between the best and worst choice of search directions was over {20\times}. There is good reason to believe that, for any common tournament structure, the relative ranking of the four choices would be consistent from run to run, and would likely resemble the one shown here. Why? The fastest option allows the most aggressive pruning early in the process. That’s why so few collections needed to be analyzed.

Two-Envelope Problems

    Let’s visit a couple of fun and extremely counterintuitive problems which sit in the same family. The first appears to be a “paradox,” and illustrates a subtle fallacy. The second is an absolutely astonishing (and legitimate) algorithm for achieving better than 50-50 odds of picking the higher of two unknown envelopes. Plenty of articles have discussed who discovered what ad nauseam so we’ll just dive into the problems.

    — The Two Envelope Paradox: Optimizing Expected Return —

    First, consider the following scenario. Suppose you are shown two identical envelopes, each containing some amount of money unknown to you. You are told that one contains double the money in the other (but not which is which or what the amounts are) and are instructed to choose one. The one you select is placed in front of you and its contents are revealed. You then are given a second choice: keep it or switch envelopes. You will receive the amount in the envelope you choose. Your goal is to maximize your expected payment.

    Our intuition tells us that no information has been provided by opening the envelope. After all, we didn’t know the two values beforehand so learning one of them tells us nothing. The probability of picking the higher envelope should be {1/2} regardless of whether we switch or not. But you weren’t asked to improve on the probability, just to maximize your expected payment. Consider the following 3 arguments:

    • Let the amount in the envelope you initially chose be {z}. If it is wrong to switch then the other envelope contains {z/2}, but if it is right to switch it contains {2z}. There are even odds of either, so your expectation if you switch is {1.25z}. This is better than the {z} you get by sticking with the initial envelope, so it always is better to switch!
    • Since we don’t know anything about the numbers involved, opening the first envelope gives us no information — so ignore that value. Call the amount in the other envelope {z'}. If it is wrong to switch then the envelope you chose contains {2z'}, and if right to switch it contains {0.5z'}. If you switch, you get {z'} but if you don’t your expectation is {1.25z'}. So it always is better NOT to switch!
    • Call the amounts in the two envelopes {x} and {2x} (though you don’t know which envelope contains which). You pick one, but there is equal probability of it being either {x} or {2x}. The expected reward thus is {1.5x}. If you switch, the same holds true for the other envelope. So you still have an expected reward of {1.5x}. It doesn’t matter what you do.

    Obviously, something is wrong with our logic. One thing that is clear is that we’re mixing apples and oranges with these arguments. Let’s be a bit more consistent with our terminology. Let’s call the value that is in the opened envelope {z} and the values in the two envelopes {x} and {2x}. We don’t know which envelope contains each, though. When we choose the first envelope, we observe a value {z}. This value may be {x} or {2x}.

    In the 3rd argument, {P(z=x)= P(z=2x)= 0.5}. If we switch, then {\langle V \rangle= P(z=x)2x+P(z=2x)x = 1.5x}. If we keep the initial envelope then {\langle V \rangle= P(z=x)x+P(z=2x)2x = 1.5x}. Whether we switch or not, the expected value is {1.5x} though we do not know what this actually is. It could correspond to {1.5z} or {0.75z}. We must now draw an important distinction. It is correct that {P(z=x)= P(z=2x)= 0.5} for the known {z} and given our definition of {x} as the minimum of the two envelopes. However, we cannot claim that {1.5x} is {1.5z} or {0.75z} with equal probability! That would be tantamount to claiming that the envelopes contain the pairs {(z/2,z)} or {(z,2z)} with equal probability. We defined {x} to be the minimum value so the first equality holds, but we would need to impose a constraint on the distribution over that minimum value itself in order for the second one to hold. This is a subtle point and we will return to it shortly. Suffice it to say that if we assume such a thing we are led right to the same fallacy the first two arguments are guilty of.

    Obviously, the first two arguments can’t both be correct. Their logic is the same and therefore they must both be wrong. But how? Before describing the problems, let’s consider a slight variant in which you are NOT shown the contents of the first envelope before being asked to switch. It may seem strange that right after you’ve chosen, you are given the option to switch when no additional information has been presented. Well, this really is the same problem. With no a priori knowledge of the distribution over {x}, it is immaterial whether the first envelope is opened or not before the 2nd choice is made. This gives us a hint as to what is wrong with the first two arguments.

    There actually are two probability distributions at work here, and we are confounding them. The first is the underlying distribution on ordered pairs or, equivalently, the distribution of the lower element {x}. Let us call it {P(x)}. It determines which two numbers {(x,2x)} we are dealing with. We do not know {P(x)}.

    The second relevant distribution is over how two given numbers (in our case {(x,2x)}) are deposited in the envelopes (or equivalently, how the player orders the envelopes by choosing one first). This distribution unambiguously is 50-50.

    The problem arises when we implicitly assume a form for {P(x)} or attempt to infer information about it from the revealed value {z}. Without a priori knowledge of {P(x)}, being shown {z} makes no difference at all. Arguments which rely solely on the even-odds of the second distribution are fine, but arguments which implicitly involve {P(x)} run into trouble.

    The first two arguments make precisely this sort of claim. They implicitly assume that the pairs {(z/2,z)} or {(z,2z)} can occur with equal probability. Suppose they couldn’t. For simplicity (and without reducing the generality of the problem), let’s assume that the possible values in the envelopes are constrained to {2^n} with {n\in Z}. The envelopes thus contain {(2^n,2^{n+1})} for some integer {n} (though we don’t know which envelope contains which value). For convenience, let’s work in terms of {log_2} of the values involved (taking care to use {2^n} when computing expectations).

    In these terms, the two envelopes contain {(n,n+1)} for some {n=\log_2(x)} (defined to be the lesser of the two). We open one, and see {m=\log_2(z)}. If it is the upper then the pair is {(m-1,m)}, otherwise the pair is {(m,m+1)}. To claim that these have equal probabilities means that {n=m-1} and {n=m} are equally probable. We made this assumption independent of the value of {m}, so it would require that all pairs {(n,n+1)} be equally probable.

    So what? Why not just assume a uniform distribution? Well, for one thing, we should be suspicious that we require an assumption about {P(x)}. The 3rd argument requires no such assumption. Even if we were to assume a form for {P(x)}, we can’t assume it is uniform. Not just can’t as in “shouldn’t”, but can’t as in “mathematically impossible.” It is not possible to construct a uniform distribution on {Z}.

    Suppose we sought to circumvent this issue by constraining ourselves to some finite range {[M,N]}, which we supposedly know or assume a priori. We certainly can impose a uniform distribution on it. Each pair {(n,n+1)} has probability {1/(N-M)} with {n\in [M,N-1]}. But now we’ve introduced additional information (in the form of {N} and {M}), and it no longer is surprising that we can do better than even-odds! We always would switch unless the first envelope contained {N}. There is no contradiction between the first two arguments because we have a priori knowledge and are acting on it. We no longer are true to the original game.

    Rather than dwell on this particular case, let’s solve the more general case of a given {P(x)} (or in terms of {log_2}, {P(n)}). For any {n} drawn according to {P(n)}, the envelopes contain {(n,n+1)} in some order and it is equally likely that {m=n} and {m=n+1}. If we know {P} we can bet accordingly since it contains information. In that case, knowing {m} (i.e. {z}) helps us. Let’s suppose we don’t know {P}. Then it still does not matter whether we observe the value {z}, because we don’t know the underlying distribution!

    There only are two deterministic strategies: always keep, always switch. Why? Suppose that the drawn value is {n} (unknown to us) and the observed value is {m}. Note that these don’t require actual knowledge of the {m} value, just that it has been fixed by the process of opening the envelope. Since we don’t know the underlying distribution, our strategy will be independent of the actual value. Given that the value doesn’t matter, we have nothing to do but always keep or always switch.

    First consider the expected value with the always-keep strategy:

    \displaystyle \langle V_K \rangle= \sum_{n=-\infty}^\infty P(n) [P(m=n|n) 2^n + P(m=n+1|n) 2^{n+1}]

    I.e. we sum over all possible ordered pairs {(n,n+1)} and then allow equal probability {P(m=n+1|n)=P(m=n|n)=0.5} for either of the two envelope orders. So we have {\langle V_K \rangle= \sum P(n) (2^n+2^{n+1})/2 = 3 \langle 2^{n-1} \rangle}. We immediately see that for this to be defined the probability distribution must drop faster than {2^{-n}} as {n} gets large! We already have a constraint on the possible forms for {P}.

    Next consider the always-switch strategy. It’s easy to see that we get the same result:

    \displaystyle \langle V_S \rangle= \sum_{n=-\infty}^\infty P(n) [P(m=n|n) 2^{n+1} + P(m=n+1|n) 2^{n}]

    and since {P(m=n|n)= P(m=n+1|n)} we get the same answer.

    But let’s be extra pedantic, and connect this to the original formulation of the first two arguments. I.e., we should do it in terms of {m}, the observed value.

    \displaystyle \langle V_S \rangle= \sum_m P(m) [P(n=m|m) 2^{m+1} + P(n=m-1|m) 2^{m-1}]

    We observe that {P(n=m|m)= P(m|n=m)P(n=m)/P(m)} and {P(n=m-1|m)= P(m|n=m-1)P(n=m-1)/P(m)}. We know that {P(m|n=m)= P(m|n=m-1)= 0.5}. Plugging these in, we get

    \displaystyle \langle V_S \rangle= \sum_m [0.5 P(n=m) 2^{m+1} + 0.5 P(n=m-1) 2^{m-1}]

    The first term gives us {\sum_n P(n) 2^n}. We can rewrite the index on the 2nd sum to get {\sum_n P(n) 2^{n-1}}, which gives us {\langle V_S \rangle= \sum_n P(n) (2^n + 2^{n-1})}, the exact same expression as before!

    How does this apply to the {[M,N]} ranged example we gave before? When we discussed it, we considered the case where the underlying distribution was known. In that and all other cases, a better than even-odds strategy based on such knowledge can be computed. In our actual formulation of the game, we don’t know {P(n)} and there’s no reason it couldn’t be uniform on some unknown interval {[M,N]}. Suppose it was. It still seems from our earlier discussion as if we’d do better by always switching. We don’t. The average amount thrown away by incorrectly switching when {m=N} exactly offsets the average gain from switching in all other cases. We do no better by switching than by keeping.

    We thus see that without knowing the underlying distribution {P(x)}, the switching and keeping strategies have the same expected reward. Of the three arguments we originally proposed, the first 2 were flawed in that they assume a particular, and impossible, underlying distribution for {x}.
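    As a quick sanity check, here is a small Monte Carlo sketch (Python; the geometric-style prior is an arbitrary illustrative choice that decays faster than {2^{-n}}) showing that always-keep and always-switch have the same expected payoff:

      import random

      def draw_pair(p=0.75):
          """Draw n with P(n) = p*(1-p)^n (decays faster than 2^-n), fill the
          envelopes with (2^n, 2^(n+1)), and hand them back in random order."""
          n = 0
          while random.random() > p:
              n += 1
          pair = [2.0 ** n, 2.0 ** (n + 1)]
          random.shuffle(pair)
          return pair[0], pair[1]          # (opened envelope, other envelope)

      def expected_values(trials=200_000):
          keep = switch = 0.0
          for _ in range(trials):
              z, z_other = draw_pair()
              keep += z
              switch += z_other
          return keep / trials, switch / trials

      # Both averages converge to 3*E[2^(n-1)] = 2.25 for this prior.
      # print(expected_values())   # e.g. roughly (2.25, 2.25)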

    At the beginning of our discussion, we mentioned that our intuition says you cannot do better than 50-50 probability-wise. Let us set aside expected rewards and focus solely on probabilities. We now see how you actually can do better than 50-50, contrary to all intuition!

    — Achieving better than 50-50 Odds with Two Envelopes —

    Next let’s consider a broader class of two-envelope problems, but purely from the standpoint of probabilities. Now the two envelopes can contain any numbers; one need not be double the other. As before, we may choose an envelope, it is opened, and we are offered the opportunity to keep it or switch. Unlike before, our goal now is to maximize the probability of picking the larger envelope.

    Since we are dealing with probabilities rather than expectation values, we don’t care what two numbers the envelopes contain. In fact, they need not be numbers at all — as long as they are distinct and comparable (i.e. {a<b} or {b<a} but not both). To meaningfully analyze the problem we require a slightly stronger assumption, though: specifically that the set from which they are drawn (without repetition) possesses a strict linear ordering. However, it need not even possess any algebraic structure or a metric. Since we are not concerned with expectation values, no such additional structure is necessary.

    Our intuition immediately tells us that nothing can be gained by switching. In fact, nothing we do should have any impact on the outcome. After all, the probability of initially picking correctly is {1/2}. Switching adds no information and lands us with an identical {1/2} probability. And that is that, right? It turns out that, contrary to our very strong intuition about the problem, there is in fact a way to improve those odds. To accomplish this, we’ll need to introduce a source of randomness. For convenience of exposition we’ll assume the envelopes contain real numbers, and revisit the degree to which we can generalize the approach later.

    The procedure is as follows:

    • Pick any continuous probability distribution {P} which has support on all of {R} (i.e. {p(x)>0} for all real {x}). Any distribution with an everywhere-positive density (normal, logistic, Cauchy, etc.) will do.
    • Choose an envelope and open it. We’ll denote its value {z}.
    • Sample some value {d} from our distribution {P}. If {z>d} stick with the initial choice, otherwise switch. We only speak of {z>d} or {z<d} because the event {z=d} has probability {0} and safely can be ignored.

    At first, second, and {n^{th}} glance, this seems pointless. It feels like all we’ve done is introduce a lot of cruft which will have no effect. We can go stand in a corner flipping a coin, play Baccarat at the local casino, cast the bones, or anything else we want, and none of that can change the probability that we’re equally likely to pick the lower envelope as the higher one initially — and thus equally likely to lose as to gain by switching. With no new information, there can be no improvement. Well, let’s hold that thought and do the calculation anyway. Just for fun.

    First some terminology. We’ll call the value in the opened envelope {z}, and the value in the other envelope {z'}. The decision we must make is whether to keep {z} or switch to the unknown {z'}. We’ll denote by {x} and {y} the values in the two envelopes in order. I.e., {x<y} by definition. In terms of {z} and {z'} we have {x= \min(z,z')} and {y= \max(z,z')}. We’ll denote our contrived distribution {P} in the abstract, with pdf {p(v)} and cdf {F(v)=\int_{-\infty}^v p(v') dv'}.

    Let’s examine the problem from a Bayesian perspective. There is a 50-50 chance that {(z,z')=(x,y)} or {(z,z')=(y,x)}. So {p(z=x)=p(z=y)=0.5}. There are no subtleties lurking here. We’ve assumed nothing about the underlying distribution over {(x,y)}. Whatever {(x,y)} the envelopes contain, we are equally likely to initially pick the one with {x} or the one with {y}.

    Once the initial envelope has been opened, and the value {z} revealed, we sample {d} from our selected distribution {P} and clearly have {p(d<x)=F(x)} and {p(d<y)=F(y)} and {p(d<z)=F(z)}. The latter forms the criterion by which we will keep {z} or switch to {z'}. Please note that in what follows, {d} is not a free variable, but rather a mere notational convenience. Something like {p(x<d)} is just notation for “the probability the sampled value is greater than {x}.” We can apply Bayes’ law to get (with all probabilities conditional on some unknown choice of {(x,y)}):

    \displaystyle p(z=x|d<z)= \frac{p(d<z|z=x)p(z=x)}{p(d<z)}

    What we really care about is the ratio:

    \displaystyle \frac{p(z=x | d<z)}{p(z=y | d<z)}= \frac{p(d<z|z=x)p(z=x)}{p(d<z|z=y)p(z=y)}= \frac{F(x)}{F(y)}<1

    Here, we’ve observed that {p(d<z|z=x)= p(d<x)= F(x)} and {F(x)<F(y)} since by assumption {x<y} and {F} is monotonically increasing (we assumed its support is all of {R}). I.e., if {d<z} there is a greater probability that {z=y} than {z=x}. We shouldn’t switch. A similar argument shows we should switch if {d>z}.

    So what the heck has happened, and where did the new information come from? What happened is that we actually know one piece of information we had not used: that the interval {(x,y)} has nonzero probability measure. I.e. there is some “space” between {x} and {y}. We don’t know the underlying distribution but we can pretend we do. Our strategy will be worse than if we did know the underlying {p(x)}, of course. We’ll return to this shortly, but first let’s revisit the assumptions which make this work. We don’t need the envelopes to contain real numbers, but we do require the following of the values in the envelopes:

    • The set of possible values forms a measurable set with a strict linear ordering.
    • Between any two elements there is a volume with nonzero probability. Actually, this only is necessary if we require a nonzero improvement for any {(x,y)}. If we only require an improvement on average we don’t need it. But in that scenario, the host can contrive to use a distribution which neutralizes our strategy and returns us to 50-50 odds.

    What difference does {P} itself make? We don’t have any way to choose an “optimal” distribution because that would require placing the bulk of probability where we think {x} and {y} are likely to lie. I.e. we would require prior knowledge. All we can guarantee is that we can improve things by some (perhaps tiny) amount. We’ll compute how much (for a given true underlying distribution) shortly.

    Let’s assume that {Q(x,y)} is the true underlying distribution over {(x,y)}. We won’t delve into what it means to “know” {Q} since we are handed the envelopes to begin with. Perhaps the game is played many times with values drawn according to {Q} or maybe it is a one-time affair with {(x,y)} fixed (i.e. {Q} a {\delta}-distribution). Ultimately, such considerations just would divert us to the standard core philosophical questions of probability theory. Suffice to say that there exists some {Q(x,y)}. By definition {Q(x,y)=0} unless {x<y}. For convenience, we’ll define a symmetrized version as well: {q(a,b)\equiv Q(a,b)+Q(b,a)}. We don’t employ a factor of {1/2} since the two terms are nonzero on disjoint domains.

    Given {Q}, what gain do we get from a particular choice of {P}?

    \displaystyle  \begin{array}{rcl}  P(win)= \int_{x<y} dx dy Q(x,y)[p(z=x|(x,y))p(x<d) \\ + p(z=y|(x,y))p(d<y)] \end{array}

    I.e., the probability we keep {z} when it is {y} and switch when it is {x}. Clearly, {p(z=x|(x,y))= p(z=y|(x,y))= 0.5} since those are the immutable 50-50 envelope ordering probabilities. After a little rearrangement, we get:

    \displaystyle P(win)= \frac{1}{2} + \frac{1}{2}\langle F(y) - F(x) \rangle_Q

    Our gain is half the mean value of {F(y) - F(x)} over the joint distribution {Q(x,y)}. The more probability {P} jams between {x} and {y}, the more we gain should that {(x,y)} arise. But without knowledge of the underlying joint distribution {Q(x,y)}, we have no idea how best to pick {P}. All we can do is guarantee some improvement.
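    A quick Monte Carlo sketch (Python; the fixed pair and the standard normal choice for {P} are arbitrary, purely illustrative values) shows the gain in action:

      import random
      from math import erf, sqrt

      def norm_cdf(v):
          """Standard normal CDF, i.e. F for our chosen P."""
          return 0.5 * (1.0 + erf(v / sqrt(2.0)))

      def win_rate(x=0.2, y=1.5, trials=200_000):
          """Envelopes hold the fixed pair x < y (values picked arbitrarily).
          Open one at random, draw d ~ N(0,1), keep if z > d, else switch."""
          wins = 0
          for _ in range(trials):
              z, other = (x, y) if random.random() < 0.5 else (y, x)
              d = random.gauss(0.0, 1.0)
              wins += (z if z > d else other) == y
          return wins / trials

      # Theory: P(win) = 1/2 + (F(y) - F(x))/2
      # print(win_rate(), 0.5 + (norm_cdf(1.5) - norm_cdf(0.2)) / 2)   # both ~0.68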

    How well can we do if we actually know {Q}? Well, there are two ways to use such information. We could stick to our strategy and try to pick an optimal {P}, or we could seek to use knowledge of {Q} directly. In order to do the former, we need to exercise a little care. {Q} is a two-dimensional distribution while {P} is one-dimensional. How would we use {Q} to pick {P}? Well, this is where we make use of the observed {z}.

    In our previous discussion of the {(x,2x)} envelope switching fallacy, the value of {z} turned out to be a red-herring. Here it is not. Observing {z} is essential here, but only for computation of probabilities. As mentioned, we assume no algebraic properties and are computing no expectations. We already know that the observation of {z} is critical, since our algorithm pivots on a comparison between {z} and our randomly sampled value {d}. Considering our ultimate goal (keep or switch), it is clear what we need from {Q}: a conditional probability that {z'>z}. However, we cannot directly use {Q(y|x)} because we defined {x<y}. We want {p(z'|z)} and we don’t know whether {z<z'} or {z'<z}. Let’s start by computing the probability of {z} (being the observed value) and of {z,z'} (being the observed and unobserved values).

    The probability of observing {z} and the other envelope having {z'} is the probability that the relevant ordered pair was chosen for the two envelopes multiplied by the {1/2} probability that we initially opened the envelope containing the value corresponding to our observed {z} rather than the other one.

    \displaystyle p(z,z')= Q(min(z,z'),max(z,z'))/2= q(z,z')/2

    To get {p(z)} we integrate this. {p(z)= \frac{1}{2}\int Q(z,y)dy + \frac{1}{2}\int Q(x,z)dx}. This is a good point to introduce two quantities which will be quite useful going forward.

    \displaystyle I_1(z)\equiv \int_{-\infty}^z Q(x,z) dx

    \displaystyle I_2(z)\equiv \int_z^\infty Q(z,y) dy

    In terms of these,

    \displaystyle p(z)= \frac{1}{2}[I_1(z)+I_2(z)]

    There’s nothing special about calling the variables {x} or {y} in the integrals and it is easy to see (since each only covers half the domain) that we get what we would expect:

    \displaystyle p(z)= \frac{1}{2}\int q(w,z)dw

    What we want is the distribution {p(z'|z)= p(z,z')/p(z)= q(z,z')/(2p(z))}. This gives us:

    \displaystyle p(z'|z)= \frac{q(z,z')}{\int q(w,z)dw}= \frac{q(z,z')}{I_1(z)+I_2(z)}

    Finally, this gives us the desired quantity {p(z'>z)= \int_{z'>z} dz' p(z'|z)}. It is easy to see that:

    \displaystyle p(z'<z)= \frac{I_1(z)}{I_1(z)+I_2(z)}

    \displaystyle p(z'>z)= \frac{I_2(z)}{I_1(z)+I_2(z)}

    As an example, consider the previous {(x,2x)} case — where one envelope holds twice what the other does. We observe {z}, and {z'} must be either {2z} or {z/2}, though we don’t know with what probabilities. If we are given the underlying distribution on {x}, say {P_2(x)}, we can figure that out. {Q(x,y)= P_2(x)\delta(y-2x)} and {q} is the symmetrized version. {\int q(w,z)dw= \int dw [Q(w,z)+Q(z,w)]= P_2(z/2)+P_2(2z)}. So {p(z)= \frac{1}{2}(P_2(z/2)+P_2(2z))}. This is just what we’d expect — though we’re really dealing with discrete values and being sloppy (which leaves us with a ratio of infinities from the {\delta} function when computing probability ratios, but we’ll ignore that here). The relevant probability ratio clearly is {P_2(z/2)/P_2(2z)}. From a pure probability standpoint, we should switch if {P_2(2z)>P_2(z/2)}. If we reimpose the algebraic structure and try to compute expectations (as in the previous problem), we would get an expected value of {z} from keeping and an expected value of {z\,\frac{P_2(z/2)/2 + 2P_2(2z)}{P_2(z/2)+P_2(2z)}} from switching. Whether this is less than or greater than {z} depends on the distribution {P_2}.

    Returning to our analysis, let’s see how often we are right about switching if we know the actual distribution {Q} and use that knowledge directly. The strategy is obvious. Using our above formulae, we can compute {p(z'<z)} directly. To optimize our probability of winning, we observe {z}, then we switch iff {I_1(z)<I_2(z)}. If there is additional algebraic structure and expectations can be defined, then an analogous calculation gives whatever switching criterion maximizes the relevant expectation value.

    In terms of probabilities, full knowledge of {Q} is the best we can do. The probability we act correctly is:

    \displaystyle  \begin{array}{rcl}  P'(win)= \int dz\, p(z) \frac{[\theta(I_1(z)-I_2(z)) I_1(z) + \theta(I_2(z)-I_1(z))I_2(z)]}{I_1(z)+I_2(z)} \\ = \frac{1}{2}\int dz\, \max(I_1(z),I_2(z)) \end{array}

    Conditioned on the observed {z}, the probability of acting correctly is:

    \displaystyle P'(win|z)= \frac{\max(I_1(z),I_2(z))}{I_1(z)+I_2(z)}

    Since {I_1} and {I_2} are monotonic (one increasing, the other decreasing), we have a cutoff value {\hat z} (defined by {I_1({\hat z})= I_2({\hat z})}) below which we should switch and above which we should not.
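
    As a concrete (and purely illustrative) example, suppose the two amounts are independent draws from a uniform distribution on {(0,1)}, so {Q(x,y)=2} on {0<x<y<1}. Then {I_1(z)=2z}, {I_2(z)=2(1-z)}, the crossover is {\hat z=1/2}, and the {Q}-optimal strategy wins {3/4} of the time. A short numerical sketch:

        import numpy as np

        # Illustrative Q: uniform over the ordered pair of two iid U(0,1) draws,
        # i.e. Q(x,y) = 2 for 0 < x < y < 1, so I_1(z) = 2z and I_2(z) = 2(1-z).
        z = np.linspace(0.0, 1.0, 100_001)
        I1, I2 = 2 * z, 2 * (1 - z)

        z_hat = z[np.argmin(np.abs(I1 - I2))]          # switch below z_hat, keep above
        p_win = 0.5 * np.trapz(np.maximum(I1, I2), z)  # P'(win) = (1/2) * integral of max(I1, I2)

        print(z_hat)   # ~0.5
        print(p_win)   # ~0.75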

    How do we do with our invented {P} instead? We could recast our earlier formula for {P(win)} into our current notation, but it’s easier to compute directly. For given {z}, the actual probability of needing to switch is {I_2(z)/(I_1(z)+I_2(z))}. Based on our algorithm, we will do so with probability {P(z<d)= 1-F(z)}. The probability of not needing to switch is {I_1(z)/(I_1(z)+I_2(z))}, and we keep with probability {P(z>d)= F(z)}. I.e., our probability of success for given {z} is:

    \displaystyle P(win|z)= \frac{I_1(z)F(z) + I_2(z)(1-F(z))}{I_1(z)+I_2(z)}

    For any given {z}, this is of the form {\alpha r + (1-\alpha)(1-r)} where {r= F(z)} and {\alpha= I_1(z)/(I_1(z)+I_2(z))}. The optimal solutions lie at one end or the other. So it obviously is best to have {F(z)=0} when {z<{\hat z}} and {F(z)=1} when {z>{\hat z}}. This would be discontinuous, but we could come up with a smoothed step function (ex. a logistic function) which is differentiable but arbitrarily sharp. The gist is that we want all the probability in {F} concentrated around {\hat z}. Unfortunately, we have no idea where {\hat z} is!

    Out of curiosity, what if we pick instead {P} to be the conditional distribution {p(z'|z)} itself once we’ve observed {z}? We’ll necessarily do worse than by direct comparison using {Q} (the max formula above), but how much worse? Well, {p(z'|z)= q(z,z')/(I_1(z)+I_2(z))}. Integrating over {z'<z} we have {F(z)= \int_{-\infty}^z p(z'|z) dz'= I_1(z)/(I_1(z)+I_2(z))}. I.e., we end up with {(I_1^2(z)+I_2^2(z))/(I_1(z)+I_2(z))^2} as our probability of success. If we had used {1-p(z'|z)} for our {P} instead, we would get {2I_1(z)I_2(z)/(I_1(z)+I_2(z))^2}. Neither is optimal in general.
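
    Continuing the illustrative uniform example from above: the threshold strategy wins about {3/4} of the time, using {p(z'|z)} as {P} wins about {2/3}, and using {1-p(z'|z)} wins only about {1/3} in that example — worse than a coin flip, since that choice switches most often exactly when it most likely shouldn’t. A tiny numerical check (same assumptions as before):

        import numpy as np

        # Same illustrative example: I1(z) = 2z, I2(z) = 2(1-z), p(z) = (I1+I2)/2 = 1 on [0,1].
        z = np.linspace(0.0, 1.0, 100_001)
        I1, I2 = 2 * z, 2 * (1 - z)
        p_z = 0.5 * (I1 + I2)

        p_optimal  = np.trapz(p_z * np.maximum(I1, I2) / (I1 + I2), z)  # ~0.75
        p_cond     = np.trapz(p_z * (I1**2 + I2**2) / (I1 + I2)**2, z)  # ~0.667
        p_anticond = np.trapz(p_z * 2 * I1 * I2 / (I1 + I2)**2, z)      # ~0.333
        print(p_optimal, p_cond, p_anticond)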

    Next, let’s look at the problem from an information theory standpoint. As mentioned, there are two sources of entropy: (1) the choice of the underlying pair {(x,y)} (with {x<y} by definition) and (2) the selection {(z,z')=(x,y)} or {(z,z')=(y,x)} determined by our initial choice of an envelope. The latter is a fair coin toss with no information and maximum entropy. The information content of the former depends on the (true) underlying distribution.

    Suppose we have perfect knowledge of the underlying distribution. Then any given {z} arises with probability {p(z)=\frac{1}{2}[I_1(z)+I_2(z)]}. Given that {z}, we have a Bernoulli random variable {p(z'>z)} given by {I_2(z)/(I_1(z)+I_2(z))}. The entropy of that specific coin toss (i.e. the conditional entropy of the Bernoulli distribution {p(z'> z|z)}) is

    \displaystyle H(z'>z|z)= \frac{-I_1(z)\ln I_1(z) - I_2(z)\ln I_2(z) + (I_1(z)+I_2(z))\ln [I_1(z)+I_2(z)]}{I_1(z)+I_2(z)}

    With our contrived distribution {P}, we implicitly are operating as if {p(z'>z)= 1-F(z)}. This yields a conditional entropy:

    \displaystyle H'(z'>z|z)= -(1-F(z))\ln (1-F(z)) - F(z)\ln F(z)

    There is a natural measure of the information cost of assuming an incorrect distribution: the Kullback-Leibler divergence (also known as the relative entropy). While it wouldn’t make sense to compute it between {Q} and {P} (which are, among other things, of different dimension), we certainly can compare the cost for given {z} of the difference in our Bernoulli random variables for switching — and then integrate over {z} to get an average cost in bits. Let’s denote by {q(z'>z)} the probability based on the true distribution and keep {p(z'>z)} for the contrived one. I.e. {q(z'>z)= I_2(z)/(I_1(z)+I_2(z))} and {p(z'>z)= 1-F(z)}. For given {z}, the K-L divergence is:

    \displaystyle D(Q || P, z)= \frac{-I_2(z)\ln [(I_1(z)+I_2(z))(1-F(z))/I_2(z)] - I_1(z)\ln [(I_1(z)+I_2(z))F(z)/I_1(z)]}{I_1(z)+I_2(z)}

    Integrating this, we get the mean cost in bits of being wrong.

    \displaystyle  \begin{array}{rcl}  \langle D(Q || P) \rangle= \frac{1}{2}\int dz [-(I_1(z)+I_2(z))\ln [I_1(z)+I_2(z)] - I_2(z)\ln (1-F(z)) \\ -I_1(z)\ln F(z) + I_1(z)\ln I_1(z) + I_2(z)\ln I_2(z)] \end{array}

    The first term is essentially {H(z)}, the entropy of our actual distribution over {z} (up to an additive {\ln 2} from the factor of {1/2} in {p(z)}). More usefully, the first term and the last 2 terms together are {-\langle H(z'>z|z) \rangle}, minus the mean Bernoulli entropy of the actual distribution. In these terms, we have:

    \displaystyle \langle D(Q || P) \rangle= \langle \frac{ -I_2(z)\ln(1-F(z)) - I_1(z)\ln F(z)}{I_1(z)+I_2(z)} \rangle - \langle H(z'>z|z) \rangle

    where the expectations are over the unconditional actual distribution {p(z)}. The first expectation on the right is a mean cross-entropy: the cost of encoding the keep/switch decision using {F} rather than the true conditional probabilities. If {F(z)} matched the true {p(z'<z)} for every {z}, that cross-entropy would equal the mean Bernoulli entropy, the divergence would vanish, and there would be no entropy cost.
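
    As a quick sanity check (again just a sketch, reusing the illustrative uniform example and an arbitrary smoothed-step {F}), the per-{z} expression above is the ordinary Bernoulli K-L divergence and is nonnegative, vanishing only when {1-F(z)} matches the true {p(z'>z)}:

        import numpy as np

        # Check the per-z divergence formula against the standard Bernoulli KL divergence.
        def kl_bernoulli(q, p):
            return q * np.log(q / p) + (1 - q) * np.log((1 - q) / (1 - p))

        z = 0.3
        I1, I2 = 2 * z, 2 * (1 - z)                 # illustrative uniform example again
        F = 1.0 / (1.0 + np.exp(-10 * (z - 0.5)))   # an arbitrary smoothed step (logistic)

        q_gt, p_gt = I2 / (I1 + I2), 1 - F          # true vs assumed p(z' > z)
        formula = (-I2 * np.log((I1 + I2) * (1 - F) / I2)
                   - I1 * np.log((I1 + I2) * F / I1)) / (I1 + I2)
        print(formula, kl_bernoulli(q_gt, p_gt))    # the two agree, and both are >= 0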

    As an aside, this sort of probabilistic strategy should not be confused with the mixed strategies of game theory. In our case, a mixed strategy would be an a priori choice {aK+(1-a)S}, where {K} is the always-keep strategy, {S} is the always-switch strategy, and {0\le a\le 1} is the probability of employing the always-keep strategy. A player would flip a biased coin with Bernoulli probability {a} and choose one of the two strategies based on it. That has nothing to do with the measure-theoretic approach we’re taking here. In particular, a mixed strategy makes no use of the observed value {z} or its relation to the randomly sampled value. Any mixed strategy gives even odds because the two underlying deterministic strategies both have even odds.

    Ken Writes a Film

    Scroll down for the link to the movie, and to read my original script.

    A few months ago, I participated in a 72 hour film contest with some friends. It was a lot of fun, and we actually filmed in my condo — which was quite a blast. Aside from ducking out of the way whenever necessary, my role was to write the script.

    The basic premise was that we had to write a horror film in 72 hours with a certain prop, action, and theme. We were given these at 10 PM on the first night, which meant that I had to slam something out relatively quickly. One interesting aspect was that we didn’t actually know who would be available to act, or even how many. So the screenplay had to be easily adaptable. I drafted two ideas by 11:30ish and discussed them with Brian (the director, and a very talented author in his own right). We picked the more promising one, and honed the general idea. About 30 min later, I delivered to Brian the revised script and we decided to go with that.

    Below is a link to the film itself, now publicly available. This definitely was a learning experience, and I have to say the actors (David and Elena) were fantastic to work with. Given that they had so little time (filming had to be finished over a mere 30 hour period, from when they first were handed the script), what they accomplished was incredible. One interesting thing I learned was that phrases which read well on paper are not necessarily ones actors find easy to work with. Unusual turns of phrase are enjoyable in literature, but can be difficult to memorize — especially on short notice. I imagine an experienced scriptwriter works closely with actors and has a strong sense of what will be executable and what won’t fly.

    The thing which surprised me most was post-production. We had a very talented post-production crew, but I had no idea what to expect. Again, there is a vast difference between what is plausible on paper (or seems easily filmed) and what is workable in post-production. As you can see, the final cut is quite different from the script.

    This gave me a more forgiving disposition toward Hollywood writers, and a clear understanding that the words (and scenes) set on paper may differ significantly from what audiences ultimately experience. From now on, I’ll be a bit more hesitant to blame screenwriters for the seemingly inane writing which plagues most Hollywood movies. It very well could be due to a confluence of factors which made it difficult or expensive to adhere to the script. Or maybe some idiot executive meddled, or they polled audience sentiment or some such nonsense. We didn’t have any of that, of course — just lots of talented people performing their roles. So I think such divergences are inevitable. Sadly, no such excuse exists for novel writers.

    I still think having a single screenwriter is the best course, however. Having briefly participated in design by committee (or design by pseudo-autocratic democracy in this case), I think the alternative is far worse. Lots of post-its, a chaos of ideas, and most creativity lost in a homogenization driven by sheer exhaustion and a few strong personalities. Writing is best done by a single writer, with feedback at certain key points from the director. In the 2 hours spent “brainstorming,” a good writer could have pumped out 4 draft ideas, the director could have decided on one or two, and the writer could have finalized them. Too many chefs and all that. Then again, what do I know? If I knew what people actually wanted, I’d be rich.

    Without further ado, here is the final cut. Presumably it’s available somewhere on Amazon Prime but I couldn’t find the link, so I’m including the unofficial one a friend provided.

    Final cut of “A Teachable Moment”

    And here’s my original script (with Brian’s formatting reproduced as best I can given the blog limitations):

    A LIVING ROOM. A MAN DRESSED IN A PROFESSORIAL MANNER IS CONVERSING WITH A WOMAN WHO HAS THE FOCUSED LOOK OF A REPORTER OR INTERVIEWER.

    The whole thing is dialog, interspersed with small cuts to other scenes (no voiceovers). The cuts should be smooth and for a few seconds each. No sudden flashy stuff.

    W

    “I’ve been following your work for some time. The unique impact it has.”

    M

    [Smiles ingratiatingly]

    “I like to think so. Do you know what makes teaching so special? It’s a distillation of the noblest human activity: sharing.”

    [CUT TO VIEW OF WHERE THE RIVER GOES UNDER THE MUSEUM OF SCIENCE (THERE’S EVEN A SIGN WARNING KAYAKERS)].

    W

    “Some would take a more cynical view.”

    M

    [quietly regards her for a moment]

    “I’ll be honest. I’ve had lots of advantages.”

    [he laughs light-heartedly]

    “Not everybody has those advantages. Sure, I could feel guilty. But isn’t it better to use my strength for others?

    When you share…”

    [he tenses in poignancy].

    “…you can change a life.”

    [CUT TO VIEW OF ENTRANCE TO GRAFFITI COVERED TRAIN TUNNEL. A CARDBOARD BOX ON THE TRACK IS SHAKING SLIGHTLY.]

    W

    “I don’t think anyone would dispute this, but *how* you share matters too. Not everyone is ready to believe in pure motives.”

    M

    [wry expression]

    “To most people sharing involves a trade: part of themselves for virtue, for the right to imagine themselves a better person. That’s foolish. Sharing is not a transaction. It can ennoble both giver and receiver. A teacher can give without losing.”

    [CUT TO SHOT OF THE SIDE DOOR ON THE INSIDE OF ONE OF THE TWO PEDESTRIAN TUNNELS UNDER THE ELLIOT ST BRIDGE. LOOKS LIKE A BARN DOOR BUT IN THE TUNNEL.]

    M

    “A lot of people don’t understand what teachers really do. I mean day in and day out, over and over.”

    W

    “I expect it can be quite difficult. Do you ever get tired?”

    M

    [pauses, and gives a cautious laugh]

    “I don’t have that luxury. That would be letting down the world in a sense.”

    [CUT TO SHOT OF MACHINE ROOM DOWN SIDE CORRIDOR (ROOM WITH BIG PIPES AND MACHINERY). PARTICULARLY LINGER ON THE HUGE HUMAN-SIZED PIPE.]

    W

    “That sounds a bit grandiose.”

    M

    [chuckles]

    “Yes, I suppose it would to someone not conversant with such matters.”

    W

    [chuckles]

    “You definitely sound like a teacher.”

    [looks at him slyly]

    “So teach me something.”

    [CUT TO SILHOUETTE OF FIGURE MECHANICALLY BLUDGEONING SOMETHING OR SOMEONE WITH TRUNCHEON BEHIND A SCREEN.]

    M

    [wags his finger and smiles]

    “I’ll have to charge you. My wisdom doesn’t come free.”

    W

    [grins and suggestively slides her chair right up to him. She’s now close to his face and her body quite close to his]

    “I’ll have to find some way to repay you.”

    [CUT TO SNOW PILE NEAR ALBANY ST. SOMETHING VAGUELY LIKE A PIECE OF CLOTHING IS STUCK IN IT.]

    M

    [clears his throat, clearly a bit flustered]

    “Very well. I’ll teach you something about teaching. The lessons conveyed through sounds we make are the tiniest fraction of how we teach. It is through subtler manipulations that we imprint our thoughts on the mechanism of this world.”

    [CUT TO TWO BURLAP SACKS AT THE BOTTOM OF SOME STAIRS, AND M HAS A SMILE “OH WHAT ARE THOSE KIDS UP TO THIS TIME” BEFORE DESCENDING TOWARD THEM. JUST BEFORE THE CAMERA CUTS OUT WE SEE A SLEDGE HAMMER IN HIS HAND.]

    W

    [whispering, sultry]:

    “Well, that’s quite a mouthful. I guess I owe you payment.”

    M

    [adjusts collar]:

    “N-no need.”

    W

    “But I insist. I’ll teach you a lesson as well.”

    [she lifts her jacket and flashes a badge.]

    M hesitates and seems like he’s about to lunge at her but she puts her hand to her hip and shakes her head, smiling in satisfaction.

    M slumps back, and W spreads photos of the various cut-scenes.

    M

    “You’re here for me, then?”

    W

    “In a sense.”

    [she smiles and puts her hand on his]

    “I’ve been looking for a good teacher.”

    Do’s and Don’ts for Modern Authors

    Every author has to post about the secrets to authorial success. Well, I’ve got a different take, a special take, a unique take. I HAVE no authorial success. Which means I’m more intimately familiar with what NOT to do. Who wants advice about how to succeed from somebody who HAS succeeded? That’s silly. Obviously they knew somebody, and they’re NOT going to give you that person’s phone number. But I have no such qualms. In fact, here are a few phone numbers which may belong to movers and shakers:

    • 555-1212
    • 000-0000
    • 90210
    • 314159265358979323846
    • 1

    The point is that when none of these are willing to give you the time of day, I will. 7:33 PM.

    So, without further ado, here is a list of helpful do’s and don’ts for aspiring authors:

    • Don’t … use big words or complex sentences. That makes you posh, elite, pretentious, and altogether hateful. Who reads big words and complex sentences these days? That’s old fashioned, like you know like last decade. Who wants to be OLD? Besides, why would you want your book to be inaccessible? Big words and complex sentences mean you will target a tiny number of people who mostly read things they’re told to read by the N.Y. Times and won’t like your stuff anyway unless you know somebody AT the N.Y. Times.

    • Don’t … employ subtle ideas or twists or anything complicated to grasp. Such books are for privileged old people, those educated in the dark era before people realized that the purpose of school was fashionable political activism. Just remember: ideas are bad. Most people don’t have any, and it’s rude to flaunt what you have and others don’t.

    • Don’t … proofread, spell-check, or worry about style or grammar. These are wasteful. Proofreading and editing take time. Lots of time. Nobody appreciates them, and they’ll just slow you down. All the best books were written on a phone using two thumbs and very few brain cells. How many artisanal craftsmen do you know? Exactly. If you’re not producing beer, it’s not a craft — it’s a waste of time. Just write as many words as you can as fast as you can. To borrow from the bible (Bumperstickers 3:21, 4): write them all and let god sort it out.

    • Don’t … use characters, plot, or dialog. Creativity is bad. You’ll only increase the chance of offending people. The best way to avoid doing that is by writing solely about yourself, but only if you’re not the type of person inherently offensive to others. There are some handy websites which list acceptable types of people and unacceptable ones.

    • Don’t … worry about pesky things like factual accuracy or consistency. A famous director said that when it’s a choice between drama and consistency, drama wins every time. He’s an idiot, but a rich one. What do you want to be: right or rich? Incidentally, it’s ALWAYS a choice between drama and consistency. If you have time to be consistent, spend it writing more drama instead. Your time is finite — which is a plothole that conveniently can be plugged by reversing the polarity of the Quantum Tachyonic Blockchain.

    • Don’t … advertise or pay anybody for anything. Why pay for nobody to buy your book, when you can get that for free?

    • Don’t … ask friends or family to review your book. Not because it’s against the rules, but because they won’t. Then you’ll have fewer friends and family. Only ask people you don’t like and who don’t like you.

    • Don’t … issue a press release. Nobody will read it, nobody will care. Yet another book tossed on the dung heap of human blather. Yawn. “News” must be something which matters to other people. Like journalists. As everyone knows, modern journalism involves complaining about something which happened to the reporter’s BFF, making it sound like a ubiquitous problem, and quoting lots of tweets. Serious journalists won’t have time for you because they always have a BFF in trouble, and curating tweets is a fulltime job.

    • Don’t … submit to agents, magazines, or contests. If you were the type of person who could get accepted, you would know because you would be published, famous, or well-connected. Since you’re not published, famous, or well-connected, you obviously won’t be accepted. Sure, every now and then somebody new accidentally slips in. It’s an accident resulting from their being related to somebody published, famous, or well-connected.

    • Do … copy whatever is popular at the moment. Book, movie, video-game, comic, or meme — it doesn’t matter. People only read what’s popular, otherwise something else would be popular. As a rich person once said: if you want to be rich do what rich people do. Which is giving bad advice to poor people. See? I’m going to be rich. Well, he actually never said you would be rich, just want to be. Look, people want to reread the same book over and over. It’s easier because they already know the words and nothing scary and unexpected can happen. So why not rewrite those very words and partake of the riches?

    • Do … focus on fanfiction. Being original is time-consuming, hard, and terribly unprofitable. Who wants to engage in some new unknown adventure when they can dwell in the comfortable world they’ve come to know. Not the real one; that’s terribly uncomfortable. But one inhabited by loveable characters they somehow feel a personal connection to, and who can’t get a restraining order against them.

    • Do … pretend to be somebody else. Nobody likes your sort. Whatever you are is offensive in all ways imaginable. Choose a name which represents the group favored by the publishing industry at this moment. Just look at who gets published and who doesn’t. Not established authors, but debut novelists. Nobody’s going to dump Stephen King just because the name Stephen is anathema according to the politics of that week. But they probably won’t publish debut novelist Stephen Timingsucks (unless TimingSucks is Native American and Native Americans are in that week).

    • Do … know somebody. It’s the only way to get an agent or publisher. If you don’t know anybody, then the best way to meet them is a cold approach. Go to buildings inhabited by agents and publishers, and ride the elevators. That’s why it’s called an “elevator pitch”. When somebody important-looking gets in, stand next to them, sideways, and stare at the side of their head. Remember: it doesn’t matter how the conversation gets started, just where it goes. Which isn’t always jail. All you need is one yes, and it really doesn’t matter how you get it.

    • Do … make it political. Your book should bravely embrace the prevailing political sentiments of the publishing industry. Only then will you be recognized for the courage of conformity. The publishing industry regularly offers awards for just that sort of thing.

    • Do … write about you, you, and you. Far more appealing to readers than plot, style, or substance is your commonplace personal struggle and how you specifically overcame it. Nothing is as compelling as minor adversity subjectively related by the one who experienced it. Be sure to make clear that the reason you prevailed was your unique grit, determination, and moral superiority. Like the dictators of old, you thrice refused the world’s entreaties to tell your story. Only when sufficiently importuned by the earnest pleas of the masses did you relent and accept the mantle of greatness.

    • Do … blog, tweet, instagram, post, and youtube. Who wants to read a book by and about somebody they don’t feel a personal connection with? Have you ever heard the names Tolstoy, Dickens, or Proust? Of course not. They didn’t understand the importance of selling the author, not the work. You need to sell yourself. Literally. While actual Roman-style slavery is illegal in most States, a variety of financial instruments can achieve the same effect.

    • Do … spend the vast majority of your time inhabiting an ecosystem of writers. Your time is far better spent blogging, connecting, and advising other writers rather than writing for the lay person. Sure, outreach is fashionable these days, and it does have a few benefits. But one should not spend too much time demonstrating the writing process through novels, stories, or poetry. Best to focus on publishing for one’s peers.

    • Do … workshop, workshop, and workshop. No writer of note ever succeeded without writing courses, workshops, several professional editors, and an emotional support network. How else could they learn to express themselves in precisely the right manner as discovered by modern researchers and taught only through MFA programs? This is why there’s nothing worth reading from before the 1990s. Fortunately we live in enlightened and egalitarian times, and the advantages of an MFA are available to everybody. Which explains why everybody has one.

    • Do … be chatty, shmoozy, and a massive extrovert who attends conferences, sucks up to agents, and shamelessly promotes yourself. If you’re not that way, make yourself that way. There are plenty of blogs and books by chatty, shmoozy, massive extroverts on how to. These explain in clear and practical terms how you should have been born chatty, shmoozy, and a massive extrovert. If that doesn’t work, there is a simple surgical procedure which can help. It’s called a lobotomy, and also will help you blog, tweet, post, and youtube more effectively. Be your audience.

    • Do … consider tried and true techniques when ordinary submission and marketing methods don’t work. These business methodologies have been refined and proven in many domains over many years. Whole enterprises are dedicated to their successful application, and they can be surprisingly inexpensive. Extortion, kidnapping, blackmail, torture, and politics all can work wonders for your book’s advancement. Pick your poison. Literally. I have an excellent book coming out, filled with recommendations and in which I describe my own struggle to find the right poison and the absolutely brilliant way I overcame this adversity. It’s a very compelling read.

    • Do … show, don’t tell. When somebody talks about the aforementioned tactics you’ve used, make a gruesome example of them. This is showing, so that people don’t tell. Most writing coaches emphasize the importance of “show don’t tell,” and you can find some excellent examples in the work of various drug cartels and the Heads of State of certain current allies and trading partners.

    • Do … kill your babies. This is another mainstay of writing wisdom, and a constant refrain in almost any workshop. It can be difficult, especially the first few times. But if that initial instinct can be overcome, it definitely is something worth trying. While it won’t always help, such sacrifices have been known to curry favor with XchXlotbltyl, the dark god of publication (and a major shareholder in most large publishing houses). Details on the appropriate ceremonies for different genres can be found on popular writing blogs. And don’t worry, you always can produce more babies… and thus more success.

    • Do … remember there’s no need to write the ending first — or ever. There has yet to be born a human with a different ending. But entropy and the inevitable degradation wrought by time rarely appeal to modern audiences. Best to throw in a sappy romantic hookup or hint at an improbable revival of the seemingly dead protagonist. Which brings us to…

    • Don’t … hint. Nobody likes ambiguity. That is why TV is so popular. Books are a very primitive technology, and they require a lot of unnecessary work by the reader. Faces, scenes, even actions need be imagined anew by every reader. This is inefficient. Remember, you’re catering to people who don’t have cable or can’t afford it or are allergic. It’s your job to make their entertainment as painless as possible despite their unfortunate circumstance. Anything else would be ableist. So don’t leave anything ambiguous. Make sure you spell out what just happened, over and over, just in case the first few explanations didn’t work. Remember the first rule of teaching: Keep the kids’ Chromebook software up to date. Well, the 2nd rule: repeat everything 3 times for the people with no attention span, too stupid, or too distracted to have caught the first 2 times. And don’t forget to give them an achievement award for getting it. So repeat every plot point 3 times, and congratulate the reader on finally getting it.

    • Do … make the reader feel smarter than fictional characters. This is the point of revealing things to the reader that characters don’t know. A well written book will have the reader shouting advice to the characters. Because if your readers aren’t better than a nonexistent and contrived character, who are they better than?

    • Do … publish each sentence as you write it. In the old days, writers had to wait a long time. Agents vetted writers’ works, publishers vetted agents’ submissions, editors vetted accepted works, and copy-editors, proofreaders, and countless others meticulously checked things at every stage. That book of cat jokes replete with typos would take several years to see print, not counting the time required to hand-deliver manuscripts by stagecoach or the frequent loss of an editor or writer from dropsy. Thank goodness we live in modern times! These days there’s no need to wait years for feedback or abide by the traditional publication timeline. Your brilliance need not be thwarted by the need for reflection or editing. Each sentence you write should be tweeted, posted on Wattpad, and blogged the moment it appears. When you get feedback, incorporate it all. Otherwise somebody might be sad, and we don’t want anybody to be sad while reading your book. That’s for somebody else’s book, somebody poor and unsuccessful who uses big words and doesn’t know the rules. Besides, as Hollywood has shown, design by committee is the best way to create a quality creative product. Call it the democratization of writing. As recent polls showed, nothing’s better than democracy. In an ideal world, every word would be voted on and accepted or rejected accordingly. One day this utopia may be real, but for now you’ll have to settle for releasing on a sentence-by-sentence basis. At least you’ll have the satisfaction of knowing that your final product was vetted by countless strangers with wildly varying aptitudes, motives, and tastes, rather than a few so-called “professionals” who’ve been doing the same boring thing for years. Do you really want the same old boring people reading your work, let alone editing it?

    • Do … set up a botnet to counter the millions of bad ratings your book will get on social media sites. In uncivilized times, negative reviews only came from critics who actually read your book but didn’t understand it or found it differed in some small way from what they thought you should have written but never would bother to write themselves because they’re too busy writing negative reviews. That was a slow process. We all know how long it took for Mozart to get meaningful feedback like “too many notes,” and how much his craft improved as a result. Imagine what he could have composed if he learned this earlier! These days we’re much more fortunate. One needn’t wait months or years for a hostile stranger with adverse incentives to read your book and pan it. There are millions of hostile strangers with adverse incentives willing to do so without troubling to read it. This is much more efficient, and we have modern social media to thank for rewarding such behavior with improved social standing. Otherwise, you’d have to wait for some “reputable” critic to actually read your literary novel and comment on it. Instead you’ll generously receive feedback from somebody far more credible who only reads young adult coming-of-age novels about pandas but is willing to step out of their comfort zone and negatively rate your book without having read it. You’re welcome.

    • Don’t … have any faith in humanity. If you did you won’t for long. But you didn’t or you wouldn’t be a writer in the first place. Who but from malice would wish to imprint their thoughts on the world. Or ask of another that they occupy the liminal time between nonexistence and nonexistence with a less poetic, less subtle, and less profound rehash of the same tired ideas. You are, after all, asking people to share your delusion of eloquence. That’s almost like founding a cult. Which incidentally, is an excellent way to promote your book.

    • Do … buy my book. It won’t make you happy, but you can’t buy happiness, so you may as well spend your money on this.


    Why NOT to use Amazon Ads for your book

    In today’s article, I ask a simple question: does it pay to advertise on Amazon for your book? As can be guessed by the exceedingly astute from the title of the post, the answer is no. In addition to explaining how I came to this conclusion, I also will offer a brief review of the basics of Amazon’s online advertising.

    I’ll examine the matter purely in terms of tangible cost/benefit, and ignore considerations involving ease of use, time wasted dealing with Amazon’s bureaucracy, and the myriad other intangible costs involved.

    First let’s review some of the aspects of Indie publishing which relate to Amazon’s author ad campaigns, as well as how those campaigns work.

    Quick Review of Some Relevant Aspects of Indie Publishing

    Printer vs Distributor

    In general, to sell something on Amazon you need to be designated a “seller” and sign up for a seller account. This can be nontrivial, and at various times Amazon has made it well-nigh impossible to do so. Authors have a special in, however, but only if they publish a version of their book via Amazon. This means producing a Kindle edition and/or (more recently) printing through KDP (formerly Createspace).

    When an author publishes a print edition via some other service, one of two things happens, depending on the type of service. Either that service also offers distribution (Ingram Spark/Lightning Source) or it does not (everybody else). There are two major distributors: Ingram-Spark/Lightning-Source and Baker & Taylor. Of these, only Ingram offers Print-on-demand (POD) services to authors. All other POD services (with the exception of Amazon’s own Createspace, now part of KDP) only offer POD.

    An ordinary POD company sells books to the author/publisher who then may sell them to bookstores, individuals, etc. The author/publisher is responsible for storage, mailing, returns, invoice management, etc. Ingram, on the other hand, has a catalog that is available to all bookstores and automatically is pushed to them regularly. When you POD through Ingram, your book appears in their catalog — and thus quickly is available for order through almost every bookstore. This doesn’t mean it will appear on their shelves, but an individual who wishes to purchase a copy need only walk into a bookstore and ask to order one. In theory, the author/publisher need never handle a physical copy of the book!

    Why does this matter? It affects how you are treated by Amazon.

    Author vs Seller

    As far as Ingram is concerned, Amazon is just another bookstore. It too automatically slurps in your entry from Ingram’s catalog. Unlike a physical bookstore, it offers the book for sale just like any other. A “book page” is created for it (and an “author page” may be created for the author), based on a cover image, blurb, etc, obtained from Ingram. Your book will appear and be treated like any other, and show as being fulfilled by Amazon itself. Let’s refer to this as “Amazon proper”. Incidentally, Barnes and Noble will do exactly the same thing online.

    Amazon also hosts a seller marketplace (AMS), which includes, among many other vendors, lots of 3rd-party online bookstores. These each slurp in that same info and may offer your book for sale as well, often at a slight discount which they make up for through inflated shipping costs. It’s not uncommon for a new author to see their book appear for sale through myriad sellers immediately after launch and assume those are illicit review copies being resold. They’re not. These just are from mini-bookstore fronts which regurgitate the Ingram catalog. When someone orders from them, the order is relayed to Ingram which then fulfills it. Ditto for an order through Amazon Proper. It’s worth noting that Ingram has special shipping arrangements with Amazon, B&N, etc, and orders from these stores will be prioritized. While it may take 2-4 weeks for an order by the author/publisher themselves to be fulfilled, orders from Amazon or B&N are quickly dispatched.

    The information which appears on the Amazon page for a print book is obtained from the Ingram info. They do allow you to declare yourself an author and “claim” books, setup an author page, etc. Almost all authors put out a Kindle version of their book through KDP. In fact, most only do this. Amazon generally attaches this to any existing print page within a week or two. A few emails may be needed to make sure they associate the same author with both, etc, but generally it’s pretty smooth (as far as Amazon processes go).

    Independent of whether you are an author or publisher, you may set up a store-front on Amazon. Some publishers do this. In this case, you must register as a seller, set up tax info, etc. In theory, you could sell anything, not just your book. The seller can control the descriptions of products they sell, etc. But authors generally need not go to such lengths — as long as they are using KDP for at least one of their versions.

    Why all this rigamarole? There is one area where it makes a big difference. Only sellers can run Amazon ad campaigns. If you only have a print edition which has been slurped in, you cannot run an ad campaign. You would have to create a seller store-front, sell the book through that, and then run a campaign as that seller and only for the things sold on that store-front. You couldn’t draw generic traffic to your book on Amazon proper.

    There is a trick, however. As mentioned, authors are viewed as an automatic type of seller — but only if they have a version of their book published through Amazon. If you’ve published a Kindle version of the book, then you qualify. In principle, the ads only would be for that version. But since Amazon links all versions of the book on a single page, de facto it is for all of them. No seller account is needed. This is how most author ad campaigns are run.

    On a practical note, Amazon used to distinguish author ad campaigns from others, offering tools which were more useful. Recently, they lumped them in with all other sellers, making practical management of ad campaigns much more challenging. Most sellers of any size use the API or 3rd party firms to manage their ad campaigns, but as a single author you will be forced to use Amazon’s own Really Awful Web Interface. Hmm… they should trademark that. Because it describes SO many of their web interfaces. But, that’s not what this article is about. Let’s assume it was the easiest to use interface in the world, a pleasure on par with the greatest of epicurean delights. Is it worth doing?

    Before answering that (well, we already answered it in the title, but before explaining that answer), let’s summarize the levels of complexity in managing sales/ads through Amazon:

    1. Easiest. Fulfillment via Amazon and can run ad campaign via Amazon as is:

    • Kindle edition, no POD
    • Kindle edition, POD via Amazon KDP
    • No Kindle edition, POD via Amazon KDP
    • Kindle edition, POD via Ingram

    2. Some effort. Fulfillment via Amazon but need a seller account to run an ad campaign

    • Kindle edition, POD via somebody other than Amazon or Ingram
    • No Kindle edition, POD via Ingram

    3. Messiest. Seller account needed to sell at all

    • No Kindle edition, POD via someone other than Amazon or Ingram

    Types of Ad Campaigns

    Next, let’s review the types of Amazon ad campaigns. There currently are three types. A given author may run many separate ad campaigns for the same book — but each will be of one and only one type.

    1. Sponsored Product Targeting: These ads are in the row of “sponsored products” which appears when you view the relevant product’s Amazon page. In principle you give Amazon a list of specific books, similar in theme or style or subject matter or whose readers are likely to be interested in your own. In practice, you have to be even more specific. You give Amazon a list of “products”, defined by ASINs. There may be many editions or versions of the same book. You’ve got to include ’em all. By hand. Without any helpful “select all” tool. And remember them. Because all you’ll see once your ad campaign is running is a breakdown by ASIN.

    2. Keyword Targeting: These ads appear in searches. There are 3 locations they may be placed: the top 2 spots, the middle 2 spots, or the last 2. Each page of results has ads in one or more of these locations, and they’re designated “sponsored”. Try a few searches, and you’ll see the placement. You give Amazon a list of keywords, generally two or more word phrases, and select how specific a match is required for each (exact, containing it, or broadly related). Then your ad will appear in the results when someone searches for those phrases on amazon. Keyword targeting allows negative keywords as well. For example, it may be a good idea to negate words such as “dvd”, “video”, “audio”, etc, especially if the most popular entries are in those categories. Search for your keyword, see what comes up, and negate any undesirable groups that appear toward the top (using -foo in your search). When you’ve negated the relevant keywords, the top entries should be precisely what you’d like to target.

    3. Category Targeting: You pick the Amazon categories that best suit your book — and presumably the book appears when somebody clicks the category. My experience is that category targeting is well-nigh useless for authors, and generates very few impressions or clicks. So we’ll ignore it.

    Ok, one more piece of review and then we’ll get to the analysis.

    How Amazon Ads Work

    Although their locations and types may differ, all ads are placed via the same process: an auction. In fact, pretty much any ad you see anywhere online has been chosen via an almost identical process.

    Every time a web page is served to a user (ex. you browse to a particular product), there are designated slots for ads. This is true of almost any webpage you view anywhere — all that differs is who is selling the ad space. Those slots are termed “impressions” (or more precisely, the placement of an ad in one is called an “impression”). Think of them as very short-lived billboards. To determine which ad is shown, an auction is conducted for each. This all is done very quickly behind the scenes. Well, not *so* quickly. Guess why webpages are so slow to load…

    Auction

    Because of its ubiquity, the auction process is fairly standard by this point. What I describe here holds for most major sites which sell advertising. The auction used by almost everybody is called a “second price auction”. In such an auction, the highest bid wins but only pays the 2nd highest bid. Mathematically, this can be shown to lead to certain desirable behaviors. Specifically, it is optimal for each participant to simply bid their maximum instead of trying to game things. This is important because Amazon will be given a maximum bid by you, and can only act as your proxy if it has a well-defined strategy for using it. Since it’s also acting as everyone else’s proxy, such a strategy must be a truthful one.

    [As an aside, what I described technically is called a Vickrey auction. Online services use a generalized version of this in which multiple slots are auctioned at once in order of quality. I.e., all the impressions on a page are auctioned simultaneously to the same bidders. The highest bidder gets the best impression, but pays the 2nd highest bid. The 2nd highest bidder gets the 2nd best impression but pays the 3rd highest bid, etc.]

    If you bid $1 and the 2nd highest bid is $0.10, you win and only pay $0.10. So, if you’re a lone risk-taker in a sea of timidity, it pays to bid high. You’ll always win, but you won’t pay much. However, if there’s even one other participant with a similar strategy, you may end up paying quite a bit. If both of you bid high, one of you will win, and will pay a lot. For example, if you bid $1 and the other guy bids $0.99, you’ll pay $0.99.
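
    For the curious, here is a tiny sketch of the mechanics (the bidder names and numbers are made up, and Amazon’s internal machinery is of course more elaborate):

        # Toy second-price auction, plus the generalized multi-slot variant from the aside.
        def second_price(bids):
            """bids: {bidder: max_bid}.  Winner pays the 2nd highest bid (if clicked)."""
            ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
            winner = ranked[0][0]
            price = ranked[1][1] if len(ranked) > 1 else 0.0
            return winner, price

        def generalized_second_price(bids, n_slots):
            """Best slot to the highest bidder at the 2nd price, next slot at the 3rd price, etc."""
            ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
            return [(slot, ranked[slot][0],
                     ranked[slot + 1][1] if slot + 1 < len(ranked) else 0.0)
                    for slot in range(min(n_slots, len(ranked)))]

        bids = {"you": 1.00, "other_guy": 0.99, "timid": 0.10}
        print(second_price(bids))                 # ('you', 0.99)
        print(generalized_second_price(bids, 2))  # [(0, 'you', 0.99), (1, 'other_guy', 0.1)]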

    So far, we’ve discussed the second price auction in the abstract. It’s straightforward enough, even if the optimal strategy may require a little thought. The more interesting issue is what precisely you’re bidding on.

    In an ad auction, you are *not* bidding on the impression per se. Rather, you effectively are bidding on an option on the impression. Let me explain.

    Once every impression on the given web page has been auctioned, the winning ads are displayed. However, the winner of an impression only pays if the user clicks on their ad, regardless of what happens afterwards. To summarize:

    • Win impression, no click: Cost= 0
    • Win impression, click, sale: Cost= 2nd highest bid
    • Win impression, click, no sale: Cost= 2nd highest bid

    Amazon gets paid only if your ad is clicked on. If you win a million impression auctions and nobody clicks on your ad, you pay nothing. If every impression you win gets clicked on but nobody buys anything, you pay for all those impressions. In terms of what you pay Amazon, sales mean nothing, impressions mean nothing, only clicks count. But impressions are what you bid on. Financially, this tracks more closely the behavior of an option than a commodity.

    Terminology

    Obviously, the bid placement process is automated, so you’re not in direct control of the bidding in each auction. In essence, Amazon acts as your proxy in this regard. We’ll get to how your bids are placed shortly, but first let’s review some terminology.

    • Impression: We already encountered this. It is placement of an ad in a particular slot on a particular web page that is served. It is important to note that this refers to placement one time for one user. If the user refreshes the same page or another user visits it, a fresh auction is conducted.
    • Click-through-rate (CTR): The average fraction of impressions that get clicked on. The context determines precisely which CTR we’re talking about.
    • Conversion Rate: A “conversion” is an instance of the end goal being accomplished. In this case, that end goal is a sale (or order). The “Conversion Rate” is the average fraction of clicks that result in sales.
    • Conversions per Impression (CPI): The average fraction of impressions that result in sales. This is just the CTR * Conversion-Rate.
    • Order vs Sale: For most purposes these are the same. For products which may be bought in bulk, the two may differ (ex. 100 boxes of soap could be 1 order but 100 sales). But this rarely applies to books since customers generally buy only one.
    • Cost Per Click (CPC): The average cost of each click. Basically, the average 2nd highest bid in all auctions won by you and for which a click resulted.
    • Average Cost of Sales (ACOS): Each click may cost a different amount, so this measures the average actual advertising cost of each sale, usually stated as a % of sale price. A 200% ACOS for a $10 book means that it costs $20 of advertising on average to make one sale. The dollar cost per sale is CPC/Conversion-Rate; dividing by the sale price gives the ACOS percentage (see the small sketch below).
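
    Here’s a small sketch tying those definitions together (all numbers are hypothetical, chosen only to show how the quantities relate):

        # Hypothetical campaign numbers, used only to illustrate the definitions above.
        impressions = 100_000
        clicks      = 110        # CTR ~ 0.11%
        orders      = 5
        ad_spend    = 52.80      # dollars
        sale_price  = 9.99       # gross price per sale

        ctr             = clicks / impressions
        conversion_rate = orders / clicks
        cpi             = ctr * conversion_rate         # conversions per impression
        cpc             = ad_spend / clicks             # average cost per click
        cost_per_sale   = cpc / conversion_rate         # advertising dollars per sale
        acos            = cost_per_sale / sale_price    # as a fraction of the sale price

        print(f"CTR={ctr:.2%}  conv={conversion_rate:.1%}  CPC=${cpc:.2f}  "
              f"cost/sale=${cost_per_sale:.2f}  ACOS={acos:.0%}")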

    Bid Placement

    I mentioned that an auction is conducted for each impression, and that it is done very quickly (in theory). If that’s the case, who are the bidders and how are the bids placed?

    The pool of potential bidders includes every active ad campaign which hasn’t run out of money that day. This pool is narrowed by the specified ad campaign criteria (product targets, keywords, negative keywords, category, etc). The result is a pool of bidders for the specific auction. In our case, these generally would be authors or publishers — but in principle could be anyone.

    Amazon acts as the proxy for all the participants. It determines which ad campaigns should participate in a given auction and it bids based on their instructions. Other than this, it has no discretion.

    As a bidder, you have control of the following (for a given ad campaign):

    • Campaign type: product, keyword, or category.
    • A list of products, keywords, negative keywords, and/or categories as appropriate for the campaign type.
    • For each keyword, product, or category, a maximum bid.
    • A “daily” budget. I’ll explain why this is in quotes shortly.
    • Ad text. You can’t control the image (it’s your book cover), but some text can be provided.

    Putting aside the campaign type and ad text itself, the salient point is that there is a list of “items” (keywords, products, or categories) which each have a maximum bid specified. There also is an overall daily budget.

    It turns out that the “daily” budget isn’t really “daily.” Amazon operates on a monthly cycle, and assigns a monthly budget based on the number of days and the daily budget. On any given day, the daily budget can be exceeded, though generally not by some huge amount. If Amazon does exceed your monthly budget (which can happen) it will refund the difference. I’ve had this happen. The point is that you’re not really setting a daily budget but a rough guideline. It’s the associated monthly budget which is used.

    Once you exceed the budget constraint, that campaign is inactive until the next day (or month, depending on which budget has been exceeded). Obviously, that makes bidding relatively simple — there is none. So let’s assume the budget hasn’t been breached.

    For each auction, Amazon must determine whether any of the items in the campaign are a match. It then applies the specified maximum bid for that item. In principle. But nothing’s ever that simple, is it?

    Bid Adjustment

    By this point you may have noticed a major problem with the auction system as described. Let’s look at it from a transactional standpoint.

    You earn revenue through sales, but pay for clicks. The resource you have is money (your budget for advertising) and you need to trade it for sales revenue.

    Amazon earns revenue through clicks, but pays in impressions. What do I mean by this? The resource Amazon has is impressions, and they need to trade it for click revenue.

    Any scenario that results in lots of clicks per sale (or more precisely, a high ACOS) is detrimental to you. You wish to minimize ACOS. Otherwise, it will cost a lot of ad-money per sale, and that money presumably would have been better spent on other approaches.

    Similarly, any scenario which results in lots of impressions per click is detrimental to Amazon. If those impressions had been won by more effective sellers, then people would have clicked on them and Amazon would have been paid.

    As an extreme example, suppose Bob’s Offensive Overpriced Craporrium wins every auction on Amazon. Then Amazon will make no money from its ad business. On the other hand, if Sue’s Supertrendy Awesomorrium won, then through hypnosis, telepathy, and blackmail every single user would be compelled to click. This is great for Amazon.

    The problem is that you have control over your ad and, in broad strokes, the types of impressions you bid on. But what control does Amazon have? Other than heavy-handed tactics like throwing Bob off the platform, it would seem to have little means of preventing such losses. Obviously, this isn’t the case. Otherwise, how could Jeff Bezos afford a $35 Billion divorce? Amazon actually has 2 powerful tools. It is important to know about these, since you’ll probably perform like Bob when you first start advertising.

    First, Amazon has an algorithm which selects which impressions are a good match for you. Sometimes they can tune this based on performance. Amazon has no control when it comes to product-targeting. If you said: sign me up for auctions involving ASIN X, Amazon dutifully will do so. However, for other approaches such as keyword or category targeting, they have discretion and can play games. Bob quickly may find that he somehow isn’t a good fit for anything but books on bankruptcy.

    Second, Amazon can reduce your effective bid. In theory, they will bid your stated maximum for the item in question. However, they may throttle this based on performance. Even if your maximum is $3, you may end up bidding $2. It’s unclear whether this affects the amount you (or the other winner) pays upon winning (if a click results), but it probably does. Conducting an auction under other auspices would be very difficult. So, you may end up losing even if the 2nd highest bid isn’t as high as your maximum.

    Ok, now that we have the background material out of the way, let’s get down to brass tacks. Or iron tacks. Brass is expensive.

    Why it doesn’t pay to advertise

    Now that we’ve reviewed the practice of advertising, let’s look at whether any of this is worth it. Specifically, what would it take to be profitable?

    Let us suppose that our book sells for $G, of which we keep $P. For example, a $10 book may yield $2.50 in net revenue for an author (where “net” means net of print costs, Amazon’s cut, etc, not net of advertising costs). In practice, things are a bit more complicated because there may be different P and G’s for the print and Kindle editions. For simplicity, let’s assume a single one for now.

    Before getting to the formal calculation, let’s look at a real example. Here are some numbers from an ad campaign I ran for my first book, “The Man Who Stands in Line.” I didn’t take it too seriously, because the book is not in a genre most people read. But I viewed the process as a good trial run before my novel “PACE” (now out).

    Here are the stats. The campaign ran for a little over a year.

    • Impressions: 1,716,984
    • Total sales (gross revenue): $423.99
    • Total ad costs: $931.56
    • CPC: $0.48
    • CTR: 0.11%
    • Total sales (units): 77
    • ACOS: 219.7%

    While my book didn’t break any records, it did furnish some useful data. Let’s look at these numbers more closely.

    On its surface, the ACOS doesn’t look too terrible. After all, I paid a little over twice the amount I made — right? Not quite. I paid a little over twice my gross revenue. The problem is that I only care about net revenue.

    As an extreme example, suppose I have two books A and B. Both yield me $2 net revenue per sale as the author, but A costs $1000 and B costs $4. Now suppose I have a pretty darned good ACOS of 50% on $1000 worth of sales. In both scenarios I’ve paid $500 in advertising costs. But in scenario A, I’ve made one sale and $2 net revenue. I.e., I have a net loss of $498. In scenario B, I’ve made 250 sales and $500 net revenue, and have broken even.

    We immediately see 2 things:

    • The same ACOS can correspond to vastly different net revenues depending on the retail price of the book.
    • It’s really hard to advertise at a profit.

    Returning to my own book, the first problem in analyzing the numbers is that we can’t easily determine net revenue. There were two book formats. The book was available for $9.99 as a paperback (resulting in net revenue of around $2.50) and as a $2.99 Kindle edition (net revenue about $2). Fortunately, the two net revenues per book are close. From the total sales, we can guess a net revenue between $150 and $200. That paints a much more dismal picture than the ACOS implies.
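
    A rough back-of-the-envelope using the stats listed above (the blended net-revenue-per-sale figure is my own assumption):

        # Rough arithmetic from the campaign stats above; net_per_sale is an assumed blend.
        impressions  = 1_716_984
        ctr          = 0.0011        # 0.11%
        ad_cost      = 931.56
        units_sold   = 77
        net_per_sale = 2.25          # blend of ~$2.50 (print) and ~$2 (Kindle)

        clicks          = impressions * ctr        # ~1,900
        conversion_rate = units_sold / clicks      # ~4%
        net_revenue     = units_sold * net_per_sale
        print(f"clicks~{clicks:.0f}  conversion~{conversion_rate:.1%}  "
              f"net~${net_revenue:.0f}  net after ads~${net_revenue - ad_cost:.0f}")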

    Let’s next consider a more typical book, and figure out the numbers needed to make advertising profitable. Because the ratio of net to gross revenue per sale will be highest for the Kindle edition, let’s focus solely on that. Any print editions will have even worse ad costs.

    A CPC of $0.50 for books is fairly typical from what I’ve seen. Suppose you have a Kindle book priced at $4.99. With the 70% royalty rate (and no large file fees to speak of), you’d make a little under $3.50 per sale. But let’s be liberal. Let’s say your net profit is $4 per book.

    As mentioned, ACOS is deceptive. If you have an ACOS of 1, it looks like you’re breaking even. You’re not. It means your gross sales are breaking even. Your net revenue is negative. But it’s much much worse than that if you have a print book. Your net profit may be the same across formats but the gross revenue isn’t. The higher the price of the book and the lower the ratio of net to gross revenue per sale, the more unrepresentative the ACOS becomes.

    With the numbers we proposed, we must average 8 or fewer clicks per sale to remain in the black. Otherwise our net revenue for the sale is less than the advertising cost. That is a very optimistic number. Even the most precisely targeted advertising rarely sees such a rate. And that’s just to break even.

    Returning to my own book, what sort of ACOS would be required to break even? With the print edition, we would need a 25% ACOS. With the Kindle edition it would be closer to 66%. In my own case I would have required around a 5x lower ACOS than I achieved. But that’s just to break even! Presumably we want to do better. The point of advertising isn’t just to break even. In essence, I would need an unattainable ACOS and conversion rate for advertising to pay off.
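    Both break-even figures are simple ratios, so it may help to see the arithmetic spelled out. This is a minimal sketch using the illustrative numbers above (the assumed $0.50 CPC, the generous $4 net profit, and my own two editions); none of it is specific to Amazon’s tooling.

```python
# Break-even arithmetic for the illustrative numbers discussed above.
cpc = 0.50            # assumed cost per click
net_per_sale = 4.00   # generous net profit on the $4.99 Kindle example

# You break even when (clicks per sale) * CPC equals net profit per sale.
breakeven_clicks = net_per_sale / cpc
print(breakeven_clicks)              # 8.0 clicks per sale, as stated above

# Break-even ACOS is net revenue per sale divided by gross price.
def breakeven_acos(net, price):
    return net / price

print(breakeven_acos(2.50, 9.99))    # ~0.25 (25%) for the paperback
print(breakeven_acos(2.00, 2.99))    # ~0.67 for the Kindle edition
```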

    From these numbers, it’s clear that advertising on Amazon simply can’t pay off for Indie authors. From an economic standpoint it always will operate at a loss.

    But are there any other reasons to advertise?

    I’ve heard claims that the real purpose of such ads is exposure, that one nominal sale translates into many through word of mouth, etc. I’ve seen no evidence of this. It may happen, but the scale is very small.

    Another argument I’ve encountered is that impressions matter. Having lots of impressions may not translate into immediate sales but it raises awareness. The more times people see a book, the more “validated” it becomes in their mind. Presumably this translates into later sales which can’t be tracked as direct clicks. This is good for the author, since it means sales without any associated click-cost. Unfortunately, I’ve seen no evidence of this either. My real sales closely tracked the 77 listed; there weren’t all sorts of separate ones which didn’t appear as clicks or conversions. True, this wasn’t the world’s most marketable book. But if 1.7MM impressions make no difference, then it’s too expensive to reach whatever number would.

    My sense is that one or both claims may be true for large, well-known publishers running huge campaigns, and where a friend’s recommendation of a recognizable title tips the scale. But that requires a critical mass and multi-faceted marketing strategy, and way more money than a typical indie author will care to invest.

    Like most services associated with indie publishing — agent readings at conferences, query review, marketing and publicity, books on marketing and publicity, etc — Amazon ads are just another piece of a machine designed to separate the naive from their money using the oldest of human failings: hope.

    So how should you sell books? If I knew, I’d spend my days basking in luxury and fending off rabid fans rather than writing snarky posts which nobody will read. But until that happens, I’ll keep you posted on the things I try. The simple answer may be the one you don’t want to hear: you don’t. You write if you have the inclination and means to do so, but you should have no expectation of being able to sell your book. If you wish to get people to read it, you may do so at a loss via Amazon ads. But there probably are much more effective ways to pay for readers.

    The Art of Writing Circa 2019 in 44 Easy Steps

    1. 1 minute: Come up with interesting observation or creative idea regarding a recent experience.

    2. 10 minutes: Compose concise, eloquent, and impactful written expression of said idea in 6 lines.

    3. 10 minutes: It’s too pompous. Remove 2 lines.

    4. 10 minutes: It’s too vertiginous. Remove 2 lines.

    5. 10 minutes: 2 lines is less pithy than one. Remove 1 line.

    6. 10 minutes: It isn’t accessible to a broad audience. Remove all words over 3 letters, adjectives, adverbs, and any verbs of latinate origin.

    7. 10 minutes: That one semicolon really should be a colon. People don’t like semicolons.

    8. 40 minutes: It could be misinterpreted by the far left, the far right, the Koala anti-defamation league, or Mothers Against Mothers. Reword it.

    9. 1 hour: Properly format the blog post. Italics? No, bold. No, italics. Maybe small-caps? That font really doesn’t look right.

    10. 4.8 hours: Research current trends on google. Add the same 15 long-tail keywords to the title, description, excerpt, post metadata, twitter metadata, facebook metadata, and google+ metadata. Realize google+ doesn’t exist anymore and feel sad, as if you put out an extra place setting for that one late cousin whose name nobody remembers.

    11. 6 hours: Locate a tangentially-related image with a suitable Creative Commons license. Realize the license doesn’t allow the modifications necessary to achieve an NC-17 rating. Find another image, this time with an open license on Wikimedia. Hope that nobody else had the brilliant idea to use a generic image of a college student with the word “Stock” overlaid on it.

    12. 2 hours: Remove face from image to avoid any potential liability.

    13. 2 hours: Thumbnail is different size than image on blog post is different size from instagram version is different size from flickr version. All involve different formats and much much smaller files than you have. Resize, reformat, and wish you weren’t using Windows.

    14. 1 hour: Pick an appropriate excerpt, hashtag, and alt-image text.

    15. 1 hour: Tweet, post, and instagram your idea as text, pseudo-text, image, and sentient pure-energy.

    16. 2 hours: Cross-post to all 14 of your other blogs, web-pages, and social-media accounts.

    17. 20 seconds: Realize that your long-tail keywords no longer are trending.

    18. 20 seconds: Receive 2000 angry tweets. Realize your hashtag already refers to a far-right hate group, a far-left hate group, a Beyonce Sci-Fi fanfiction group, the political campaign of the 237th least popular Democratic candidate for President, the Lower Mystic Valley Haskell, Knitting, and Dorodango group, or all of the above.

    19. 10.8 seconds: Beat Jack Dorsey’s own speed-record for deleting a tweet (which happened to be about Elon Musk tweeting about Donald Trump’s tweets).

    20. 6 hours: Update long-tail keywords to reflect current trends. Realize that Beyonce Sci-Fi fanfiction is trending, and leverage your newfound accidental affiliation to comment on the irony of your newfound accidental affiliation. Then tweet Beyonce to ask if she’ll retweet you.

    29. 5 seconds: Receive automated cease and desist order from Taylor Swift, who loans out her 2000 person legal team to Beyonce on the rare occasions it isn’t in use. Spot idling black limo full of tattooed lawyers outside window. One who looks suspiciously like Jennifer Pariser grins and gently drags her finger across her throat.

    30. 4.2 seconds: Beat own recent world record for deletion of a tweet.

    31. 28.6 minutes: Decide that social media is a waste of time. “Delete” all accounts.

    32. 28.6 minutes: Decide that you need a professional presence on social media after all, and won’t be intimidated by Taylor Swift or her 2000 lawyers. “Undelete” all your accounts.

    33. 1 minute: Decide original post is stupid, obsolete, and has several grammatical errors. Delete it.

    34. 2 hours: Delete all variants of post on blogs, web-pages, twitter, facebook, and instagram.

    35. 4 minutes: Just in case it’s really still brilliant, email idea to a friend.

    36. 4.8 hours: Worry whether [insert appropriate gender normative or non-normative pronoun] likes it.

    37. 1 minute: Try to interpret friend’s ambiguous single-emoticon reply.

    38. 30 minutes: Decide you’re not going to let the establishment dictate what’s art, and that the post’s stupidity, obsolescence, and several grammatical errors are intentional and signs of unappreciated genius.

    39. 12 minutes: Receive voicemail that you missed 2 consecutive shifts at Starbucks and are fired.

    40. 30 minutes: Decide you’re not going to be an indentured servant to the establishment and will go it alone like most great artists throughout history.

    41. 0.8 seconds: Realize you have no marketable skill, don’t know how to market a skill, and don’t even know what markets or skills are. Recall that most great artists throughout history had “Lord” before their name, got money from someone with “Lord” before their name, or died in penury. Consider writing a post about the injustice of this.

    42. 0.2 seconds: Have panic attack that you’ll end up homeless, penniless, and forced to use the public library for internet-access. Google whether euthanasia is legal, and how many Lattes it would take.

    43. 1 minute: Call manager at Starbucks, apologize profusely, and blame Taylor Swift for your absence. Hint that you have an “in” with her, and if the manager takes you back there may be sightings of Taylor Swift’s people idling in a black limo outside.

    44. 6.7 hours: A sadder and a wiser man, you rise the morrow morn. You decide to share your newfound sadness and wisdom with others. Go to step 1.

    Some Pet Peeves of a Grammar Snob

    Language evolves organically, and only a fool would expect the world to remain the same just to accommodate their own inability to move past the life knowledge they happened to acquire during their particular formative years.

    But I’m a fool and proud of it. Or more precisely, I’m selective in my folly. I choose to accept changes which arise organically in a sense which meets my arbitrary standards, but have nothing but disdain for those changes effected through the apparent illiteracy and incompetence of celebrities (also known as “influencers”). To me, it’s like corporate-speak but dumber. And that’s saying a lot.

    Put in simpler and less pompous terms for those of you who don’t understand big words: if some Hollywood moron screwed up and a bunch of jokers adopted the meme, that’s not “organic” growth of language — it’s a Hollywood moron screwing up and a bunch of jokers adopting the meme. None of these people should be allowed near the language, let alone given power to influence it. As far as I’m concerned, there should be a license required. And since you need a language license to take the written test in the first place, nobody could get one. But that’s ok. The language can’t change if nobody uses it.

    So, without further ado (well, there wasn’t really much ado so far, just a lot of whining), here are a few of my favorite things (sung to the dulcet strains of an NWA song):

    1. Same Difference: A difference requires two objects for comparison. To be the same, two differences involve at least 3 objects (and possibly 4) and two comparisons. For example: I’m pedantic and pompous. Same thing (well, not really, but we’ll allow it). I’m pedantic and pompous, and he’s pretentious and self-important. Same difference (well, not really, but a sight better than before). Same thing: 2 items, 1 comparison. Same difference: 3-4 items, 2 comparisons.

    2. Pay the consequences: You pay a penalty or a price. You suffer consequences. I hope that the idiot who birthed this does all three.

    3. Associated to: This one requires a delicate touch. It’s a mistake by my favorite people: mathematicians. And they have oh-so-fragile egos. Sadly, I can’t blame the arch-media-corporate hegemony which secretly controls our brains through alien ultra-quantum-fractal-catchwords. Not that I would anyway. I’m not sure where “associated to” started, but I have an irresistible urge to jump up and scream whenever somebody says it. And since most math articles, books, and even wikipedia articles seem to have adopted it, I basically spend all day standing up and screaming. Which is no different than before, but now I have a plausible explanation when cops, social workers, and concerned-looking parents inquire. I thought of writing an automatic script to change every occurrence in wikipedia, but decided I was too lazy. Besides, every article has a little gatekeeper associated to it who guards it and tends it and flames anybody who tries to change anything. I did read a possible explanation for the phenomenon, however (the “associated to”, not the little folk guarding wikipedia pages). In latinate languages such as Italian, “associare” takes “a” as its preposition, which naively translates to “to” in English. I suspect this is indeed the source, not because I have any knowledge beyond what I read but because of what it would mean if it weren’t true. The only other plausible explanation is that Gonklaxu the Dissatisfier has penetrated the barrier to our galaxy and is sowing discord amongst the mathematicians who pose the greatest threat to his 12-dimensional nonorientable being. Since mathematicians apparently don’t read anything but math books, that strategy would be singularly successful. The thought of Gonklaxu does keep me awake at night, I’ll admit. Because if he is invading, it means he didn’t stop emailing because he was banished to a nonmeasurable corner of the duoverse. Rejection hurts so much. I associate it to the pain of hearing associate to.

    I’m sure I’ll think of a few more soon, so stay tuned!

    Semidirect Products, Split Exact Sequences, and all that

    One of the things I’ve butted heads with in studying Lie Groups is the semidirect product and its relationship to split exact sequences. It quickly became apparent that this was a pretty sizeable hole in my basic knowledge, so I decided to clarify this stuff once and for all.

    — Normal Subgroups and Quotient Groups —

    First, a brief refresher on Normal subgroups and Quotient groups. We are given a group {G} and subgroup {H\subseteq G}.

    • Left cosets are written {gH} and right cosets are written {Hg}. Each is a set of elements in {G}. Not all left cosets are distinct, but any two are either equal or disjoint. Ditto for right cosets.
    • The left (right) cosets form a partition of {G}, but they do not in general form a group. We can try to imbue them with a suitable product, but there are obstructions to the group axioms. For example, {g^{-1}H} is not a useful inverse since {(gh)^{-1}= h^{-1}g^{-1}}, so neither left cosets nor right cosets multiply as desired. More generally, the set product {(gH)(g'H)} need not equal {(gg')H}.
    • We define the Quotient Set {G/H} to be the set of left cosets. As mentioned, it is not a group in general. There is an equivalent definition for right cosets, written {H\setminus{}G}, but it doesn’t appear often. In most cases we care about, the two are the same.
    • It is easy to see that the condition which removes the obstruction is that {gH=Hg} for all {g}. Equivalently, {gHg^{-1}=H} for all {g}. If this holds, the cosets form a group. Often the stated condition is that the sets of left and right cosets are the same. But {g\in gH,Hg} so this is the same exact condition.
    • {H} is a Normal Subgroup if it obeys the conditions which make the cosets into a group.
    • Usually a Normal Subgroup is denoted {N}, and we write {N\triangleleft G} (or {N\trianglelefteq G}).
    • For a Normal subgroup {N}, the Quotient Set {Q=G/N} has (by definition) the natural structure of a group. It is called the Quotient Group.
    • We have two natural maps associated with a Normal Subgroup:
      • {N\xrightarrow{i} G} is an inclusion (i.e. injective), defined by {h\rightarrow h} (where the righthand {h} is viewed in {G}). This is a homomorphism defined for any subgroup, not just normal ones.
      • {G\xrightarrow{q} Q} is the quotient map (surjective), defined by {g\rightarrow gN} (with the righthand viewed as a coset, i.e. an element of {G/N}). This map is defined for any subgroup, with {Q} the Quotient Set. For Normal Subgroups, it is a group homomorphism.
    • We know there is a copy of {N} in {G}. Though {Q} is derived from {G} and {N}, and possesses no new info, there may or may not be a copy of it in {G}. Two natural questions are when that is the case, and how {G}, {N}, and {Q} are related in general.

    Let’s also recall the First Isomorphism Theorem for groups. Given any two groups {G} and {H} and a homomorphism {\phi:G\rightarrow H}, the following hold:

    • {\ker \phi} is a Normal Subgroup of {G}
    • {\mathop{\text{im}} \phi} is a subgroup of {H}
    • {\mathop{\text{im}} \phi} is isomorphic to the Quotient Group {G/\ker\phi}.

    Again, we have to ask: since {\ker\phi} is a Normal Subgroup of {G}, and {\mathop{\text{im}}\phi} is isomorphic to the Quotient Group {G/\ker\phi} which “sort of” may have an image in {G}, is it meaningful to write something like {G\stackrel{?}{=} \ker\phi \oplus \mathop{\text{im}} \phi} (playing fast and loose with notation)? The answer is no; it’s more complicated.
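    A small standard example makes the failure concrete (this is just the usual cyclic-group counterexample, not anything specific to the discussion above). Take {G=Z_4} and let {\phi:Z_4\rightarrow Z_2} be reduction mod 2. Then {\ker\phi=\{0,2\}\approx Z_2} and {\mathop{\text{im}}\phi=Z_2}, but {Z_4} is not isomorphic to {Z_2\oplus Z_2}: the former has an element of order 4 and the latter does not. Knowing the kernel and the image does not determine {G}.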

    — Exact Sequences —

    Next, a very brief review of exact sequences. We’ll use {1} for the trivial group. The usual convention is to use {1} for general groups and {0} for Abelian groups. An exact sequence is a sequence of homomorphisms between groups {\cdots \rightarrow G_n \xrightarrow{f_n} G_{n-1}\xrightarrow{f_{n-1}} \cdots} where {\mathop{\text{im}} f_n= \ker f_{n-1}} for every pair. Here are some basic properties:

    • {1\rightarrow A \xrightarrow{f} B\cdots} means that {f} is injective.
    • {\cdots A\xrightarrow{f} B\rightarrow 1} means that {f} is surjective.
    • {1\rightarrow A\rightarrow B\rightarrow 1} means {A\approx B} (the map is an isomorphism).
    • Short Exact Sequence (SES): This is defined as an exact sequence of the form: {1\rightarrow A\xrightarrow{f} B\xrightarrow{g} C\rightarrow 1}.
    • For an SES, {f} is injective, {g} is surjective, and {C\approx B/\mathop{\text{im}} f}
    • SES’s arise all the time when dealing with groups, and the critical question is whether they “split”.

    We’re now ready to define Split SES’s.

    • Right Split SES: There exists a homomorphism {h:C\rightarrow B} such that {g\circ h=Id_C}. Basically, we can move to {B} and back from {C} without losing info — which means {C} is in some sense a subgroup of {B}.
    • Left Split SES: There exists a homomorphism {h:B\rightarrow A} such that {h\circ f=Id_A}. Basically, we can map {A} into {B} and come back without losing info — which means {A} is in some sense a subgroup of {B}.
    • These two conditions are not in general equivalent, nor even equally restrictive. The Left Split condition is far more constraining than the Right Split one in general. The direction of the homomorphisms in the SES introduces an asymmetry. [My note: it seems likely that the two are dual in some sense.]

    — External vs Internal View —

    We’re going to describe 3 types of group operations: the direct product, semi-direct product, and group extension. Each has a particular relationship to Normality and SES’s. There are two equivalent ways to approach this, depending on whether we prefer to define a binary operation between two distinct groups or to consider the relationship amongst subgroups of a given group.

    • External view: We define a binary operation on two distinct, unrelated groups. Two groups go in, and another group comes out.
    • Internal view: We define a relationship between a group and various groups derived from it (ex. Normal or Quotient).
    • These approaches are equivalent. The Internal view describes the relationship amongst the two groups involved in the External view and their issue. Conversely, the derived groups in the Internal view may be recombined via the External view operation.

    We must be a little careful with notation and terminology. When we use the symbol {HK}, it can mean one of two things.

    • Case 1: {H} and {K} are distinct groups. {HK} is just the set of all pairs of elements {(h,k)}. I.e. it is the direct product set (but not group).
    • Case 2: {H} and {K} are subgroups of a common group {G} (or have some natural implicit isomorphisms to such subgroups). In this case, {HK} is the set of all elements in {G} obtained as a product of an element of {H} and an element of {K} under the group multiplication.
    • Note that we may prefer cases where two subgroups cover {G}, but there are plenty of other possibilities. For example, consider {Z_{30}} (the integers mod 30). This has several obvious subgroups ({Z_2}, {Z_3}, {Z_5}, {Z_6}, {Z_{10}}, {Z_{15}}). {Z_2} and {Z_3} only intersect on {0} (the additive identity). However, the two do not cover (or even generate) the group! Similarly, {Z_2} and {Z_{10}} do not cover the group (or even generate it) but intersect on a nontrivial subset!
    • Going the other way, we’ll say that {G=HK} if {H} and {K} are subgroups and every element {g} can be written as {hk} for some {h\in H} and {k\in K}. Note that {H} and {K} need not be disjoint (or even cover {G} set-wise).

    Another potentially confusing point should be touched on. When we speak of “disjoint” subgroups {H} and {K} we mean that {H\cap K=\{e\}}, NOT that it is the null set. I.e., {H\cap K= 1}, the trivial group.

    — Semidirect Product —

    The semidirect product may seem a bit arbitrary at first but, as we will see, it is a natural part of a progression which begins with the Direct Product. Here are the two ways of defining it.

    • External view (aka Outer Semidirect Product): Given two groups {H} and {K} and a map {\phi:K\rightarrow Aut(H)}, we define a new group {H\rtimes K}. We’ll denote by {\phi_k(h)} the effect of the automorphism {\phi(k)} on {h} (and thus an element of {H}). Set-wise, {H\rtimes K} is just {H\times K} (i.e. all pairs {(h,k)}). The identity is {(e,e)}. Multiplication on {H\rtimes K} is defined as {(h,k)(h',k')= (h\phi_k(h'),kk')}. The inverse is {(h,k)^{-1}= (\phi_{k^{-1}}(h^{-1}),k^{-1})}.
    • Internal view (aka Inner Semidirect Product): Given a group {G} and two disjoint subgroups {N} and {K}, such that {G=NK} and {N} is a Normal Subgroup, {G} is called the Semidirect product {N\rtimes K}. The normality of {N} constrains {K} to be isomorphic to the Quotient Group {G/N}. (A small code sketch after this list illustrates the External construction on a familiar example.)
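    To see the External definition in action, here is a minimal Python sketch (my own illustration, not any standard library construction) which builds {H\rtimes K} from the multiplication rule above and checks the group axioms for the dihedral group of order 8, realized as {C_4\rtimes C_2} with the nontrivial element of {C_2} acting by inversion.

```python
from itertools import product

def semidirect(n, m, phi):
    """Build the outer semidirect product C_n x| C_m, where phi(k) is an
    automorphism of C_n for each k in C_m.  Returns the element list and
    the multiplication map (h1,k1)(h2,k2) = (h1 + phi(k1)(h2), k1 + k2)."""
    elems = list(product(range(n), range(m)))
    def mul(a, b):
        (h1, k1), (h2, k2) = a, b
        return ((h1 + phi(k1)(h2)) % n, (k1 + k2) % m)
    return elems, mul

def phi(k):
    """phi: C_2 -> Aut(C_4); the nontrivial element acts by inversion h -> -h."""
    return (lambda h: h) if k == 0 else (lambda h: (-h) % 4)

G, mul = semidirect(4, 2, phi)

# Sanity checks: associativity and a two-sided identity (0, 0).
assert all(mul(mul(a, b), c) == mul(a, mul(b, c)) for a in G for b in G for c in G)
assert all(mul((0, 0), a) == a == mul(a, (0, 0)) for a in G)

# The result is nonabelian (it is the dihedral group of order 8), so this
# semidirect product is genuinely different from the direct product C_4 + C_2.
print(any(mul(a, b) != mul(b, a) for a in G for b in G))   # True
```

    The final check confirms the result is nonabelian, so this particular {\phi} really does produce something other than the direct product {C_4\oplus C_2}.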

    There are a few important things to note about this.

    • There are (potentially) many Semidirect products of two given groups, obtained via different choices of {\phi}. The notation is deceptive because it hides our choice of {\phi}. Given any {H,K,\phi} there exists a Semidirect product {H\rtimes K}. The various Semidirect products may be isomorphic to one another, but in general need not be. I.e., a given {H} and {K} may have multiple distinct semidirect products. This actually happens. Wikipedia mentions that there are 4 non-isomorphic semidirect products of {C_8} and {C_2} (the former being the Normal Subgroup in each case). One is a Direct Product, and the other 3 are not.
    • It also is possible for a given group {G} to arise from several distinct Semidirect products (of different pairs of groups). Again from Wikipedia, there is a group of order 24 which can be written as 4 distinct semidirect products of groups.
    • Yet another oddity is that a seemingly nontrivial {H\rtimes K} can be isomorphic to {H\oplus K}.
    • If {\phi= Id} (i.e. every {k} maps to the identity map on {H}), then {G=H\oplus K}.
    • To go from the External view to the Internal one, we note that, by construction, {H} is a Normal Subgroup of {G=H\rtimes K} and {K} is the Quotient Group {G/H}. To be precise, the Normal Subgroup is {(H,e)}, which is isomorphic to {H}, and the Quotient Group {G/(H,e)} is isomorphic to {K}.
    • To go from the Internal view to the External one, we choose {\phi_k(h)= khk^{-1}} as our function. I.e., {\phi} is just conjugation by the relevant element.
    • It may seem like there is an imbalance here. For a specific choice of Normal Subgroup {N}, the External view offers complete freedom of {\phi}, while the Internal view has a fixed {\phi}. Surely the latter is a special case of the former. The fallacy in this is that we must consider the pair {(G,N)}. We very well could have non-isomorphic {G,G'} with Normal Subgroups {N,N'} where {N\approx N'}. I.e. they are the same Normal Subgroup, but with different parent groups. We then would have different {\phi}‘s via our Internal view procedure. The correspondence is between {(H,K,\phi)} and {(G,N,K)} choices. Put differently, the freedom in {\phi} loosely corresponds to a freedom in {G}.
    • Note that, given {G} and a Normal Subgroup {N} — with the automatic Quotient Group {G/N} — we do NOT necessarily have a Semidirect product relationship. The condition of the Semidirect product is stricter than this. As we will see, it requires not just an isomorphism between {K} and {G/N}, but a specific one arising from a subgroup {K\subseteq G}. Equivalently, it requires a Right-Split SES (as we will discuss).
    • The multiplication defined in the External view may seem very strange and unintuitive. In essence, here is what’s happening. For a direct product, {H} and {K} are independent of one another. Each half of the pair acts only on its own elements. For a semidirect product, the non-normal half {K} can twist the normal half {H}. Each element of {K} can alter {H} in some prescribed fashion, embodied in {\phi(k)}. So {K} is unaffected by {H} but {H} can be twisted by {K}.
    • It is interesting to compare the basic idea to that of a Fiber bundle. There, the fiber can twist (via a group of homeomorphisms) as we move around the base space. Here, the normal subgroup can twist as we move around the non-normal part. Each generalizes a direct product and measures our need to depart from it.
    • The semidirect product of two groups is Abelian iff it’s just a direct product of abelian groups.

    — Group Extensions —

    As with Semidirect products, there are 2 ways to view these. To make matters confusing, the notation speaks to an Internal view, while the term “extension” speaks to an External view.

    • External view: Given groups {A} and {C}, we say that {B} is an extension of {C} by {A} if there is a SES {1\rightarrow A\rightarrow B\rightarrow C\rightarrow 1}.
    • Internal view: Given a group {G} and Normal Subgroup {N\triangleleft G}, we say that {G} is an extension of {Q} by {N}, where {Q=G/N} is the Quotient Group.
    • Note that the two are equivalent. If {B} is an extension of {C} by {A}, then {A} is Normal in {B} and {C} is isomorphic to the Quotient Group {B/A}.
    • Put simply, the most general form of the Group, Normal Subgroup, induced Quotient Group trio is the Group Extension.

    — Direct Products, Semidirect Products, and Group Extensions —

    In the External view, we’ve mentioned three means of getting a group {B} from two groups {A} and {C}:

    • Direct Product: {B=A\oplus C}. This is unique.
    • Semidirect Product: {B=A\rtimes C}. There may be multiple of these, corresponding to different {\phi}‘s.
    • Group Extension: A group {B} for which there are 2 homomorphisms forming a SES {1\rightarrow A\rightarrow B\rightarrow C\rightarrow 1}. There may be many of these, corresponding to different choices of the two homomorphisms.

    Equivalently, we have several ways of describing the relationship between two subgroups {H,K\subseteq G} which are disjoint (i.e. {H\cap K=\{e\}}).

    • Direct Product: {G=H\oplus K} requires that both be Normal Subgroups.
    • Semidirect Product: {G=H\rtimes K} requires that {H} be normal (in which case {K\approx G/H}, and {\phi} is determined by conjugation). For a given {H} there may be multiple, corresponding to different {G}‘s.
    • Group Extension: {H} must be Normal and {K\approx G/H}. Here the framing in terms of subgroups starts to break down: in a general extension, {K} need not correspond to a subgroup of {G} at all.

    Note that not every possible relationship amongst groups is captured by these. For example, we could have two non-normal subgroups or two homomorphisms which don’t form an SES, or no relationship at all.

    An excellent hierarchy of conditions was provided by Arturo Magidin in answer to someone’s question on Stackoverflow. I roughly replicate it here. Unlike him, I’ll be sloppy and not distinguish between subgroups and groups isomorphic to subgroups.

    • Direct Product ({G=H\oplus K}): {H,K} both Normal Subgroups. {H,K} disjoint. {G=HK}
    • Semidirect Products ({G=H\rtimes K}): {H} Normal Subgroup, {K} Subgroup. {H,K} disjoint. {G=HK}. I.e., we lose Normality of {K}.
    • Group Extension ({G} is an extension of {K} by {H}): {H} Normal Subgroup, {G/H\approx K}. I.e. {K} remains the Quotient Group (as before), but the Quotient Group may no longer be a subgroup of {G} at all!

    Now is a good time to mention the relationship between the various SES Splitting conditions:

    • For all groups: Left Split is equivalent to {B=A\oplus C}, and they imply Right Split. (LS=DP) => RS always.
    • For abelian groups, the converse holds and Right split implies Left Split and Direct Sum. I.e. the conditions are equivalent. LS=DP=RS for Abelian.
    • For nonabelian groups: Right Split implies {B=A\rtimes C} (with {\phi} depending on the SES map). We’ll discuss this shortly.

    Back to the hierarchy, now from a SES standpoint:

    • Most general case: There is no SES at all. Given groups {A,B,C}, there may be no homomorphisms between them. If there are homomorphisms, there may be none which form an SES. Consider a general pair of homomorphisms {f:A\rightarrow B} and {g:B\rightarrow C}, with no assumptions. We may turn to the first isomorphism theorem for help, but that does us no good. The first isomorphism theorem says that {\ker f \triangleleft A} and {\mathop{\text{im}} f\approx A/\ker f}, and {\ker g \triangleleft B} and {\mathop{\text{im}} g\approx B/\ker g}. None of this ties {A}, {B}, and {C} together the way exactness does.
    • Group Extension: Any SES defines a group extension. They are the same thing.
    • Semidirect Product: Any SES which right-splits corresponds to a Semidirect Product (with the right-split map determining {\phi})
    • Direct Product: Any SES which left-splits (and thus right-splits too) corresponds to a direct product.

    So, when we see the standard SES: {1\rightarrow N\rightarrow G\rightarrow G/N\rightarrow 1}, this is a group extension. Only if it right splits can we write {G= N\rtimes G/N}, and only if it left splits can we write {G= N\oplus G/N}.
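    A standard concrete illustration of the difference (nothing beyond the definitions above is needed): take {G=S_3} and {N=A_3\approx Z_3}, so {G/N\approx Z_2}. The SES {1\rightarrow Z_3\rightarrow S_3\rightarrow Z_2\rightarrow 1} right-splits, since any transposition generates a copy of {Z_2} mapping isomorphically onto the quotient; hence {S_3=Z_3\rtimes Z_2}. But it does not left-split: if it did, we would have {S_3\approx Z_3\oplus Z_2\approx Z_6}, which is abelian, while {S_3} is not.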

    — Some Notes —

    • Group Extensions are said to be equivalent if their {B}‘s are isomorphic and there exists an isomorphism between them which makes a diamond diagram commute. It is perfectly possible for the {B}‘s to be isomorphic but for two SES’s not to be equivalent extensions.
    • Subtlety referred to above. A quotient group need not be isomorphic to a subgroup of {G}. It only is defined when {N} is normal, and there automatically is a surjective homomorphism {G\rightarrow Q}. But we don’t have an injective homomorphism {Q\rightarrow G}, which is what would be needed for it to be isomorphic to a subgroup of {G}. This is precisely what the right-split furnishes. In that case, it is indeed a subgroup of {G}. The semidirect product may be thought of as the statement that {Q} is a subgroup of {G}.
    • In the definition of right split and left split, the crucial aspect of the “inverse” maps is that they be homomorphisms. A simple injective (for right-split, or surjective for left-split) map is not enough!
    • It is sometimes said that the concept of subgroup is dual to the concept of quotient group. This is intuitive in the following sense. A subgroup can be thought of as an injective homomorphism. By the SES for normal/quotient groups, we can think of a quotient group as a surjective homomorphism. Since injections and surjections are categorically dual, it makes sense to think of quotient groups and subgroups as similarly dual. Whether the more useful duality is subgroup vs quotient group or normal subgroup vs quotient group is unclear to me.

    180 Women and Sun Tzu

    It is related that Sun Tzu (the elder) of Ch’i was granted an audience with Ho Lu, the King of Wu, after writing for him a modest monograph which later came to be known as The Art of War. A mere scholar until then (or as much of a theorist as one could be in those volatile times), Sun Tzu clearly aspired to military command.

    During the interview, Ho Lu asked whether he could put into practice the military principles he expounded — but using women. Sun Tzu agreed to the test, and 180 palace ladies were summoned. These were divided by him into two companies, with one of the King’s favorite concubines given the command of each.

    Sun Tzu ordered his new army to perform a right turn in unison, but was met with a chorus of giggles. He then explained that, “If words of command are not clear and distinct, if orders are not thoroughly understood, then the general is to blame.” He repeated the order, now with a left turn, and the result was the same. He now announced that, “If words of command are not clear and distinct, if orders are not thoroughly understood, then the general is to blame. But if his orders are clear, and the soldiers nevertheless disobey, then it is the fault of their officers,” and promptly ordered the two concubines beheaded.

    At this point, Ho Lu intervened and sent down an order to spare the concubines for he would be bereft by their deaths. Sun Tzu replied that, “Having once received His Majesty’s commission to be the general of his forces, there are certain commands of His Majesty which, acting in that capacity, I am unable to accept.” He went ahead and beheaded the two women, promoting others to fill their commands. Subsequent orders were obeyed instantly and silently by the army of women.

    Ho Lu was despondent and showed no further interest in the proceedings, for which Sun Tzu rebuked him as a man of words and not deeds. Later he was commissioned a real general by Ho Lu, proceeded to embark on a brilliant campaign of conquest, and is described as eventually “sharing in the might of the king.”

    This is a particularly bewildering, if unpleasant, episode. Putting aside any impression the story may make on modern sensibilities, there are some glaring incongruities. What makes it more indecipherable still is that this is the only reputable tale of Sun Tzu the elder. Apart from this and the words in his book, we know nothing of the man, and therefore cannot place the event in any meaningful context. Let us suppose the specifics of the story are true, and leave speculation on that account to historians. The episode itself raises some very interesting questions about both Sun Tzu and Ho Lu.

    It is clear that Sun Tzu knew he would have to execute the King’s two favorite concubines. The only question is whether he knew this before he set out for the interview or only when he acceded to the King’s request. Though according to the tale it was Ho Lu who proposed a drill with the palace women, Sun Tzu must have understood he would have to kill not just two women but these specific women.

    Let’s address the broader case first. It was not only natural but inevitable that court ladies would respond to such a summons in precisely the manner they did. Even if we ignore the security they certainly felt in their rank and the affections of the King, the culture demanded it. Earnest participation in such a drill would be deemed unladylike. It would be unfair to think the court ladies silly or foolish. It is reasonable to assume that in their own domain of activity they exhibited the same range of competence and expertise as men did in martial affairs. But their lives were governed by ceremony, and many behaviours were proscribed. There could be no doubt they would view the proceedings as a game and nothing more. Even if they wished to, they could not engage in a serious military drill and behave like men without inviting quiet censure. The penetrating Sun Tzu could not have been unaware of this.

    Thus he knew that the commanders would be executed. He may not have entered the King’s presence expecting to kill innocent women, but he clearly was prepared to do so once Ho Lu made his proposal. In fact, Sun Tzu had little choice at that point. Even if the King’s proposal was intended in jest, he still would be judged by the result. Any appearance of frivolity belied the critical proof demanded of him. Sun Tzu’s own fate was in the balance. He would not have been killed, but he likely would have been dismissed, disgraced, and his ambitions irredeemably undermined.

    Though the story makes the proposal sound like the whimsical fancy of a King, it very well could have been a considered attempt to dismiss a noisome applicant. Simply refusing an audience could have been impolitic. The man’s connections or family rank may have demanded suitable consideration, or perhaps the king wished to maintain the appearance of munificence. Either way, it is plausible that he deliberately set Sun Tzu an impossible task to be rid of him without the drawbacks of a refusal. The King may not have known what manner of man he dealt with, simply assuming he would be deterred once he encountered the palace ladies.

    Or he may have intended it as a true test. One of the central themes of Chinese literature is that the monarch’s will is inviolable. Injustice or folly arises not from a failing in the King but from venal advisers who hide the truth and misguide him. A dutiful subject seeks not to censure or overthrow, but rather remove the putrescence which clouds the King’s judgment with falsehood, and install wise and virtuous advisers. Put simply, the nature of royalty is virtuous but it is bound by the veil of mortality, and thus can be deceived. One consequence of this is that disobedience is a sin, even in service of justice. Any command must be obeyed, however impossible. This is no different from Greek mythology and its treatment of the gods. There, the impossible tasks often only could be accomplished with magical assistance. In Sun Tzu’s case, no magic was needed. Only the will to murder two great ladies.

    As for the choice of women to execute, it does not matter whether the King or Sun Tzu chose the disposition of troops and commands. The moment Sun Tzu agreed to the proposal, he knew not only that he would have to execute women but which ones. Since he chose, this decision was made directly. But even if it had been left to the king, there could be no question who would be placed in command and thus executed.

    The palace hierarchy was very strict. While the ladies probably weren’t the violent rivals oft depicted in fiction, proximity to the King — or, more precisely, place in his affections, particularly as secured by production of a potential heir — lent rank. No doubt there also was a system of seniority among the women based on age and family; many of them probably were neither concubines nor courtesans, but noblewomen whose husbands served the King. It was common for ladies to advance their husbands’ (and their own) fortunes through friendship with the King’s concubines. Whatever the precise composition of the group, a strict pecking order existed. At the top of this order were the King’s favorites. There could be no other choice consistent with domestic accord and the rules of precedence. Those two favorite concubines were the only possible commanders of the companies.

    To make matters worse, those concubines may already have produced heirs. Possibly they were with child at that very moment. This too must have been clear to Sun Tzu. Thus he knew that he must kill the two most beloved of the King’s concubines, among the most respected and noblest ladies in the land, and possibly the mothers of his children. Sun Tzu even knew he may be aborting potential heirs to the throne. All this is clear as day, and it is impossible to imagine that the man who wrote the Art of War would not immediately discern it.

    But there is something even more perplexing in the story. The King did not stop the executions. Though the entire affair took place in his own palace, he did not order his men to intervene, or even belay Sun Tzu’s order. He did not have Sun Tzu arrested, expelled, or executed. Nor did he after the fact. Ho Lu simply lamented his loss, and later hired the man who had effected it.

    There are several explanations that come to mind. The simplest is that he indeed was a man of words and not deeds, cowed by the sheer impetuosity of the man before him. However, subsequent events do not support this. Such a man would not engage in aggressive wars of conquest against his neighbors, nor hire the very general who had humiliated and aggrieved him so. Perhaps he feared that Sun Tzu would serve another, turning that prodigious talent against Wu. It would be an understandable concern for a weak ruler who dreaded meeting such a man on the battlefield. But it also was a concern which easily could have been addressed by executing him on the spot. The temperamental Kings of fable certainly would have. Nor did Ho Lu appear to merely dissemble, only to visit some terrible vengeance on the man at a later date. Sun Tzu eventually became his most trusted adviser, described as nearly coequal in power.

    It is possible that Ho Lu lacked the power oft conflated with regality, and less commonly attendant upon it. The title of King at the time meant something very different from modern popular imaginings. The event in question took place around 500 BC, well before Qin Shi Huang unified China — briefly — with his final conquest of Qi in 221 BC. In Ho Lu’s time, kingdoms were akin to city-states, and the Kings little more than feudal barons. As in most historical treatises, troop numbers were vastly exaggerated, and 100,000 troops probably translated to a real army of mere thousands.

    This said, it seems exceedingly improbable that Ho Lu lacked even the semblance of authority in his own palace. Surely he could execute or countermand Sun Tzu. Nor would there be loss of face in doing so, as the entire exercise could be cast as farcical. Who would object to a King stopping a madman who wanted to murder palace concubines? If Sun Tzu was from a prominent family or widely regarded in his own right (which there is no evidence for), harming him would not have been without consequence. But there is a large difference between executing the man and allowing him to have his way in such a matter. Ho Lu certainly could have dismissed Sun Tzu or proposed a more suitable test using peasants or real soldiers. To imagine that a king would allow his favorite concubines to be executed, contenting himself with a feeble protest, is ludicrous. Nor was Sun Tzu at that point a formidable military figure. A renowned strategist would not have troubled to write an entire treatise just to impress a single potential patron. That is not the action of a man who holds the balance of power.

    The conclusion we must draw is that the “favorite concubines” were quite dispensable, and the King’s protest simply the form demanded by propriety. He could hardly fail to protest the murder of two palace ladies. Most likely, he used Sun Tzu to rid himself of two problems. At the very least, he showed a marked lack of concern for the well-being of his concubines. We can safely assume that his meat and drink did not lose their savour, as he envisioned in his tepid missive before watching Sun Tzu behead the women.

    While it is quite possible that he believed Sun Tzu was just making a point and would stop short of the actual execution, this too seems unlikely. The man had just refused a direct order from the King, and unless the entire matter was a tremendous miscommunication there could be little doubt he would not be restrained.

    Ho Lu may genuinely have been curious to see the outcome. Even he probably could not command obedience from the palace ladies, and he may have wished to see what Sun Tzu could accomplish. But more than this, the King probably felt Sun Tzu was a valuable potential asset. The matter then takes on a very different aspect.

    From this viewpoint, Ho Lu was not the fool he seemed. The test was proposed not in jest, but in deadly earnest, and things went exactly as he had hoped but not expected. He may have had to play the indolent monarch, taking nothing seriously and bereaved by a horrid jest gone awry. It is likely he was engaging in precisely the sort of deception Sun Tzu advocated in his treatise. He appeared weak and foolish, but knew exactly what he wanted and how to obtain it.

    This probably was not lost on Sun Tzu, either. Despite his parting admonition, he did later agree to serve Ho Lu. It is quite possible that the king understood precisely the position he was placing Sun Tzu in, and anticipated the possible executions. Even so, he may have been uncertain of the man’s practical talent and the extent of his will. There is a great divide between those who write words and those who heed them. Some may bridge it, most do not. Only in the event did Sun Tzu prove himself.

    For this reason, Ho Lu could not be certain of the fate of the women. Nonetheless he placed them in peril. They were disposable, if not to be disposed of. It seems plausible that an apparently frivolous court game actually was a determined contest between two indomitable wills. The only ones who did not grasp this, who could not even recognize the battlefield on which they stepped solely to shed blood, were the concubines.

    By this hypothesis, they were regarded as little more than favorite dogs or horses, or perhaps ones which had grown old and tiresome. A King asks an archer to prove his skill by hitting a “best” hound, then sets the dog after a hare, as he has countless times before. The dog quickens to the chase, eagerly performing as always, confident that its master’s love is timeless and true. Of all present, only the dog does not know it is to be sacrificed, to take an arrow to prove something which may or may not be of use one day to its master. If the arrow falls short, it returns to its master’s side none the wiser and not one jot less sure of its place in the world or secure in the love of its master, until another day and another archer. This analogy may seem degrading and insulting to the memory of the two ladies, but that does not mean it is inaccurate. It would be foolhardy not to attribute such views to an ancient King and general simply because we do not share them or are horrified by them or wish they weren’t so. In that time and place, the concubines’ lives were nothing more than parchment. The means by which Ho Lu and Sun Tzu communicated, deadly but pure.

    The view that Ho Lu was neither a fool nor a bon vivant is lent credence by the manner of his rise to power. He usurped the throne from his uncle, employing an assassin to accomplish the task. This and his subsequent campaign of conquest are not the actions of a dissipated monarch. Nor was he absent from the action, wallowing in luxury back home. In fact, Ho Lu died from a battle wound during his attempted conquest of Yue.

    It is of course possible that the true person behind all these moves was Wu Zixu, the King’s main advisor. But by that token, it also is quite possible that the entire exercise was engineered by Wu Zixu — with precisely intended consequences, perhaps ridding himself of two noisome rivals with influence over the King. In that case, the affair would be nothing more than a routine palace assassination.

    Whatever the explanation, we should not regard the deaths of the two concubines as a pointless tragedy. The discipline instilled by two deaths could spare an entire army from annihilation on the field. Sun Tzu posited that discipline was one of the key determinants of victory, and in this he was not mistaken. That is no excuse, but history needs none. It simply is.

    This said, it certainly is tempting to regard the fate of these ladies as an unadorned loss. Who can read this story and feel anything but sadness for the victims? Who can think Sun Tzu anything but a callous murderer, Ho Lu anything but foolish or complicit? It is easy to imagine the two court concubines looking forward to an evening meal, to poetry with their friends, to time with their beloved husband. They had plans and thoughts, certainly dreams, and perhaps children they left behind. One moment they were invited to play an amusing game, the next a sharp metal blade cut away all they were, while the man they imagined loved them sat idly by though it lay well within his power to save them. Who would not feel commingled sorrow and anger at such a thing? But that is not all that happened.

    A great General was discovered that day, one who would take many lives and save many lives. Whether this was for good or ill is pointless to ask and impossible to know. All we can say is that greatness was achieved. 2500 years later and in a distant land we read both his tale and his treatise.

    Perhaps those two died not in service to the ambition of one small general in one small kingdom. Perhaps they died so centuries later Cao Cao would, using the principles in Sun Tzu’s book, create a foundation for the eventual unification of China. Or so that many more centuries later a man named Mao would claim spiritual kinship and murder a hundred million to effect a misguided economic policy. Would fewer or more have died if these two women had lived? Would one have given birth to a world-conquering general, or written a romance for the ages?

    None of these things. They died like everyone else — because they were born. The axe that felled them was wielded by one man, ordered by another, and sanctioned by a third. Another made it, and yet another dug the ore. Are they all to blame? The affair was one random happening in an infinitude of them, neither better nor worse. A rock rolls one way, but we do not condemn. It rolls another, but we do not praise.

    But we do like stories, and this makes a good one.

    [Source: The account itself is taken from The Art of War with Commentary, Canterbury Classics edition, which recounted it from a translation of the original in the Records of the Grand Historian. Any wild speculation, ridiculous hypotheses, or rampant mischaracterizations are my own.]

    How 22% of the Population can Rewrite the Constitution

    This is a scary piece in which I analyze precisely how many voters would be required to trigger a Constitutional Convention and ratify any amendments it proposes. Because the 2/3 and 3/4 requirements in the Constitution refer to the number of States involved, the smaller States have a disproportionate effect. In Congress, the House counterbalances this – but for a Constitutional Convention, there is no such check.
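    The arithmetic behind such a number is straightforward to sketch. The thresholds below are the Article V ones (two thirds of the states to call a convention, three quarters to ratify); the rest (a bare majority of voters controlling each state, recruiting the least-populous states first) is my own illustration of the kind of calculation involved, not necessarily the paper’s exact methodology. Actual state voter counts would have to be supplied.

```python
# Sketch: the smallest share of voters that could drive the process,
# assuming a bare majority of voters controls each state's decision and
# that the least-populous states are recruited first.
def minimal_voter_share(voters_by_state, states_needed):
    """voters_by_state: dict of state -> number of voters (user-supplied)."""
    smallest = sorted(voters_by_state.values())[:states_needed]
    winners = sum(v // 2 + 1 for v in smallest)   # bare majority in each state
    return winners / sum(voters_by_state.values())

# Article V thresholds for 50 states:
#   calling a convention requires 2/3 of the states  -> 34 states
#   ratifying an amendment requires 3/4 of the states -> 38 states
# convention_share   = minimal_voter_share(voters, 34)
# ratification_share = minimal_voter_share(voters, 38)
```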

    Read the Paper (PDF)

    A Travel-Time Metric

    Especially in urban areas, two locations may be quite close geographically but difficult to travel between. I wondered whether one could create a map where, instead of physical distances, points are arranged according to some sort of travel-time between them. This would be useful for many purposes.

    Unfortunately, such a mapping is mathematically impossible in general (for topological reasons). But so is a true map of the Earth, hence the need for Mercator or other projections. The first step in constructing a useful visualization is to define an appropriate Travel-Time metric function. Navigation systems frequently compute point-to-point values, but they are not bound by the need to maintain a consistent set of Travel Times between all points. That is our challenge – to construct a Travel-Time metric.
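    For the curious, here is one plausible way to manufacture a consistent metric from raw point-to-point travel times; it is only my sketch of the general idea (symmetrize, then enforce the triangle inequality via shortest paths), not necessarily the construction used in the paper.

```python
import numpy as np

def travel_time_metric(T):
    """Turn a matrix of raw pairwise travel times into a consistent metric.
    T[i][j] is the raw travel time from i to j; it need not be symmetric or
    satisfy the triangle inequality."""
    D = np.asarray(T, dtype=float)
    D = (D + D.T) / 2.0                 # symmetrize: d(a, b) = d(b, a)
    np.fill_diagonal(D, 0.0)            # d(a, a) = 0
    for k in range(len(D)):             # Floyd-Warshall closure enforces
        D = np.minimum(D, D[:, [k]] + D[[k], :])   # the triangle inequality
    return D

# Example with three points where the direct 0 -> 2 trip is slow:
raw = [[0, 10, 60],
       [12, 0, 10],
       [55, 9, 0]]
print(travel_time_metric(raw))
```

    The shortest-path step is what forces consistency: if passing through an intermediate point is faster than the direct trip, the metric records the faster time.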

    Read the Paper (PDF)

    Inflation, Up Close and Personal

    It often seems like the inflation figures touted by officials and economists have little connection with the real world. There are a number of reasons for this, some technical and some political. But there is a deeper problem than the means and motives for calculating any specific index. The issue is that any aggregate number is likely to deviate significantly from one’s personal experience. Each of us saves for different reasons and spends in different ways. Without taking these specific choices into account, we cannot accurately represent or protect against the inflation that we individually encounter. This paper elaborates on this idea and explains how each of us can identify the relevant components of inflation, and best hedge our savings.
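    The core idea reduces to a weighted average, which a few lines make concrete; the categories, weights, and price changes below are purely illustrative and not taken from the paper.

```python
# Minimal sketch of a "personal" inflation rate: weight each spending
# category's price change by your own budget share rather than an
# official basket. All numbers here are made up for illustration.
def personal_inflation(budget_shares, category_inflation):
    """budget_shares: dict category -> fraction of your spending (sums to 1).
    category_inflation: dict category -> year-over-year price change."""
    return sum(budget_shares[c] * category_inflation[c] for c in budget_shares)

my_shares = {"rent": 0.45, "food": 0.20, "transport": 0.15, "other": 0.20}
price_changes = {"rent": 0.06, "food": 0.04, "transport": 0.01, "other": 0.02}
print(personal_inflation(my_shares, price_changes))   # 0.0405, i.e. ~4%
```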

    Read the Paper (PDF)

    A Proposal for Tax Transparency

    Taxes necessarily are unpopular. They represent an economic burden and do not yield obvious benefits. Though some make a show of embracing their civic duty, few voluntarily would undertake to do so if given a choice. The criminal penalties attached to evasion and the substantial efforts at enforcement are evidence of this. Nonetheless, there is a tie between one’s sense of social responsibility and the palatability of taxes. A perception that our sacrifice benefits ourselves, our loved ones, and society as a whole can mitigate the pain it causes. Conversely, if our hard earned money vanishes into an opaque hole of possible waste and corruption, resentment is engendered.

    The taxes paid by an individual represent a substantial sum to him, but a mere pittance to the government. If there is no accounting for this money, then it appears to have been squandered. This assumption is natural, as the government is known to be a notorious spendthrift. Nor does the publication of a voluminous, incomprehensible, and largely euphemistic budget lend transparency. Even if it were perfectly accurate, and every taxpayer troubled to read it, the human mind isn’t wired to accurately grasp the relationships between large numbers. Thirty thousand dollars in taxes is minuscule compared to a billion or ten billion or a hundred billion, and it makes little difference which of those quantities is involved. Therefore an effort to elicit confidence through a full disclosure of expenditures would be ill fated even if well intentioned. However it would serve to enforce accountability, and should be required in addition to any other measures employed. If nothing else, this would allow watchdog organizations to analyze government behavior and identify waste.

    So how could we restore individual faith in the system of government expenditure? There is in fact a way to do so and encourage fiscal responsibility at the same time. Individuals like to know where their money went. A successful tactic of certain charities is to attach each donation to a specific child or benefit. A person feels more involved, is more likely to contribute, and is better satisfied with their contribution if it makes a tangible difference. We need to know that we aren’t wasting our money.

    The pain of an involuntary contribution may be assuaged through a similar approach. It may even transform into pride. There will be individuals who remain resentful, just as there are those who do not donate to charity. And some people simply don’t like being forced to do anything. However the majority of taxpayers likely will feel better if they know precisely where their money went.

    We propose that an exact disposition of each individual’s taxes be reported to him. At first glance, this may seem infeasible. Funds are drawn from pooled resources rather than attached to such specific revenue streams. However, what we suggest can be accomplished without any change in the way the government does business, and our reporting requirement would not prove onerous. The federal, state, and most local governments already meticulously account for expenses – even if they do not exhibit particular restraint in incurring them. They must do so for a variety of legal and regulatory reasons, and records generally exist even if not publicly available.

    Individual tax contributions need only be linked to expenditures at the time of reporting, but this must be done consistently. To that end, expenses could be randomly matched with the taxes that paid for them. This could be done each February or March for the prior year. We simply require that each dollar of taxes collected be assigned to one and only one dollar spent and vice versa. If there is a surplus, then some taxpayers would receive an assignment of “surplus” and if there is a deficit then certain expenses will be assigned a non-tax source – such as borrowed money or a prior year’s surplus. If a taxpayer’s contribution has been marked as surplus, then his true assignment is deferred until such time as the surplus is spent (again using a lottery system for matching). If it covers a prior year’s deficit then it is matched against that year’s excess expenses. The point is that every dollar of taxpayer money eventually is matched against a real expense.
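    The lottery-style matching described above is easy to sketch in code. The following is only my illustration of the bookkeeping (the taxpayers, amounts, and line items are hypothetical), not a proposed official implementation.

```python
import random

def match_taxes_to_expenses(tax_paid, expenses, seed=0):
    """Randomly assign expense line items to taxpayers so that each tax
    dollar is matched to exactly one spent dollar (the lottery described
    above).  tax_paid: dict taxpayer -> amount.  expenses: list of
    (description, amount) pairs.  Purely illustrative."""
    rng = random.Random(seed)
    items = expenses[:]
    rng.shuffle(items)
    desc, left = None, 0.0
    report = {t: [] for t in tax_paid}
    for taxpayer in rng.sample(list(tax_paid), k=len(tax_paid)):
        need = tax_paid[taxpayer]
        while need > 0:
            if left == 0:
                if not items:                      # spending exhausted: surplus
                    report[taxpayer].append(("surplus", need))
                    need = 0
                    continue
                desc, left = items.pop()
            used = min(need, left)
            report[taxpayer].append((desc, used))
            need -= used
            left -= used
    return report

# Example: two taxpayers, two (hypothetical) expense items.
print(match_taxes_to_expenses({"alice": 10000, "bob": 3000},
                              [("ductwork at 121 example plaza", 5000),
                               ("air conditioning units", 9000)]))
```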

    For example, one taxpayer’s report could read “10K toward the construction of 121 example plaza, New York,” or better still “3K used for the purchase of air conditioning units, 5K for ductwork, and 2K for electrical routing for work done at XXX and billed to YYY contracting on ZZZ date. Work completed on AAA date.” An individual receiving such a report would feel a sense of participation, accountability, and meaningful sacrifice.

    It may seem that few people would feel pride in defraying the cost of mundane items, but such an objection is misguided. These are real expenses and represent a more comprehensible and personal form of involvement than does a tiny fraction of an abstract budget. If an expense would appear wasteful, pointless, or excessive, then it is appropriate to question it.

    What of the pacifist whose money goes toward weapons or the religious individual whose taxes pay for programs that contravene his beliefs? It may seem unfair to potentially violate a taxpayer’s conscience by assigning him an unpalatable expense. But no exceptions should be made. Their money is being spent in the manner described. Whether their contribution is diluted or dedicated, they live in a society that violates their ideals and they should vote accordingly.

    It is our belief that a feeling of involvement in the operation of government, along with the requisite increase in transparency, would alleviate much of the general perception of irresponsibility, excess, and unaccountability. An individual may object to his relative contribution, but the means of its use would no longer be inscrutable. This could go a long way toward restoring faith in our government.

    Probabilistic Sentencing

    In most real situations, we must make decisions based on partial information. We should neither allow this uncertainty to prevent action nor pretend to perfect certainty when taking it. Yet in one area with a great impact on an individual’s freedom and well-being we do just that. Judges and juries are required to return an all-or-nothing verdict of guilt. They may not use their experience, intelligence, and judgment to render a level of confidence rather than a mere binary choice.

    I propose adopting a sentencing mechanism based on a probabilistic assessment of guilt or innocence. This allows jurists to better express their certainty or lack thereof than does our traditional all-or-nothing verdict. The natural place to reflect such an imputed degree of guilt is in the sentencing phase. I discuss the implications of such a system as well as certain issues with implementation.

    Read the Paper (PDF)

    The Requirements of Effective Democracy

    The current popular notion of democracy is something to the effect of “the will of the people is effected through voting.” Though this is a far cry from the original meaning of the word or its various incarnations through history, let’s take it as our working definition. It certainly reflects the basic approach taken in the United States. Though often conflated in the public mind with a vague cultural notion of freedom, democracy conforms to that notion only when taken together with certain other principles – such as explicit protections of individual liberties.

    This aside, let us consider the components necessary for democracy. To do so, we must make some supposition regarding the ability of an individual voter to render a decision. We assume that every voting individual, regardless of aptitude, is capable of determining their purpose in voting. We say “purpose” rather than “criterion” because we refer to a moral choice, what they hope to achieve by voting. This is far more basic and reliable than any specific set of issues or criteria. A person knows their value system, even if they lack the means to express it accurately. The desires to improve the country, foster religious tenets, create a certain type of society, support the weak, advance one’s own interest, protect a specific right, or promote cultural development cannot easily be manipulated or instilled. While it is possible to create a sense of urgency or attach specific issues or criteria to these values, one’s purpose itself is a reflection of that individual’s view of society and their relationship with it. To meaningfully participate in the democratic process, an individual must translate this purpose into particular votes in particular elections. Note that a purpose may embody a plurality of ideals rather than any specific one (such as in the examples above).

    It is the function of democracy to proportionately reflect in our governance and society the individual purposes of the citizenry. A number of components are involved, and the absence of any one of them undermines its ability to do so. While the consequent process may retain all the trappings of a democracy, it would not truly function as one. Though it could be argued that such imperfection is natural and speaks to the shortcomings of the participants rather than a failing of the institution itself, such a claim is misguided. Regardless of cause, if the people’s will is not accurately reflected then the society does not conform to our popular notion of a democracy. Whether another system would perform better is beyond our present consideration. We simply list certain key requirements for a democracy to function as we believe it should, and allow the reader to decide the extent to which our present society satisfies them.

    Note that a particular government need not directly represent the interests of every citizen, but its formation and maintenance must meaningfully do so. In some loose sense this means that (1) the effect of a citizen is independent of who that citizen is, and (2) the opinion of a majority of citizens is reflected in the actions of the government. These are neither precise requirements nor ones satisfied in practice, particularly in representative democracies. However, they reflect our vague cultural concept of democracy.

    The following are the major components necessary for a democracy to function as we believe it should.

    Choice

    Once a voter has decided upon a set of positions that reflect their purpose, they must have a means of voting accordingly. There must be sufficient choice to allow an individual to embody those positions in their vote. Furthermore, the choice must be real. Marginal candidates with no chance of winning may be useful for registering an opinion, but they do not offer true participation in the election. If there are only two major candidates then the voter’s entire purpose must be reduced to a binary decision. Only if it happens to be reflected in one of the choices at hand would their view be expressible.

    If there are two major candidates and they differ only on a few issues that are of no consequence to a particular individual, then that person cannot express his purpose by voting. For example, if a voter feels very strongly about issue X, and both major candidates have the same opposing position on that issue, then he cannot make his will known in that election. It may be argued that the presence of small candidates serves exactly this purpose and that if sentiment is strong enough one could prevail. This is not borne out by history. In a two-party system, a voter is reduced to a binary choice between two bundled sets of positions. As a more extreme example, suppose there are several major issues and the candidates agree on one of them. Even if every single person in the country holds the opposite position on that issue, their will still cannot be effected through that election. If there were no other important issues, then one or the other candidate surely would take the popular position – or a third-party candidate would do so and prevail. However, in the presence of other issues, this need not be the case.

    Finally, there must be some reason to believe that the actions of a candidate once elected will reflect their proclaimed positions. Otherwise, it will be years before the voter can penalize them. Without such an assurance – and history certainly does not offer it – a nominal choice may not be a real one. The people then act the part of a general who cannot move his troops, however much he may threaten or cajole them.

    Information

    A well-intentioned individual must have a way of locating and obtaining information whose accuracy is not in question or, if uncertain in nature, is suitably qualified. Voters must have access to accurate and sufficient information. In order to translate their purpose into a vote, an individual must be able to determine the choices available and what they actually entail. Moreover, he must be able to determine the relative importance of different issues in effecting his purpose. Fear mongering, inaccurate statistics, and general misinformation could lead him to believe that a particular issue ‘X’ is of greater import than it truly is. Instead of focusing on other issues ‘Y’ and ‘Z’ which are more germane to his purpose, he may believe that dealing with issue ‘X’ is the most important step toward it. Similarly, if the views of candidates are obfuscated or misrepresented, or the significance of events is disproportionately represented, a person may be denied an accurate translation of his purpose into a vote. Even a perfectly rational and capable voter cannot make a suitable decision in the absence of information or in the presence of inaccurate information. This said, not every vehicle should be expected to provide such information. If a person prefers to listen to a news station that reports with a particular bias, that is not the fault of the information provider – unless it does so subtly and pretends otherwise.

    Aptitude

    A voter must have the intelligence, critical reasoning, motivation, and general wherewithal to seek out accurate information, detect propaganda or advertising, and make an informed decision. Their perceived interest must coincide with their true interest, and their purpose be accurately represented in the choice they make. It may seem that we are advocating the disenfranchisement of a segment of the population, individuals who – while failing to meet some high standard – have valid purposes of their own which they too have the right to express. This is not the case, nor is our standard artificial. We are merely identifying a necessary ingredient, not endorsing a particular path of action. Moreover, the argument that they would be deprived of a right is a specious one. Such individuals are disenfranchised whether or not they physically vote. They lack the ability to accurately express their purpose, and are easily misled, confused, or manipulated. At best they introduce noise; at worst their votes may systematically be exploited. A blind person may have a valid destination, but they cannot drive there.

    Access

    Voters must be willing and able to participate. They cannot be blocked by bureaucratic, economic, legal, or practical obstacles – especially in a way that introduces a selection bias. Their votes must be accurately tallied and their decision implemented.

    Structure

    Not only must the structure of the democratic process treat all voters equally; their de facto influence must also be equal. Depending on the nature of the voting system, certain participants may have no real influence even if the system as a whole treats them symmetrically. A simple example would be a nation consisting of four states with blocks of 3, 3, 2, and 1 votes, where each block must vote as a unit. Regardless of the pattern of voting, citizens in the state with a single vote can never affect the outcome. If that vote is flipped, the majority always remains unchanged. This particular topic is addressed in another paper.
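
    As a quick sanity check of the 3, 3, 2, 1 example (my own sketch in Python, not something taken from the paper mentioned above), the following brute force enumerates every voting pattern and confirms that flipping the single-vote block never changes the outcome.

        from itertools import product

        blocks = [3, 3, 2, 1]
        total = sum(blocks)

        def yes_wins(votes):
            # votes[i] is True if block i votes "yes"; a majority of the 9 votes decides.
            return sum(b for b, v in zip(blocks, votes) if v) > total / 2

        pivotal = False
        for votes in product([True, False], repeat=len(blocks)):
            flipped = votes[:3] + (not votes[3],)   # flip only the 1-vote block
            if yes_wins(votes) != yes_wins(flipped):
                pivotal = True

        print("single-vote block ever pivotal?", pivotal)   # prints False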

    There certainly are many other technical and procedural requirements. However those listed above are critical components that directly determine a voter’s ability to express their will through the democratic process. In their absence, voters could be thwarted, manipulated, misled, or confused. The purpose of democracy isn’t to tally votes, but to register the will of the people. Without the choice and tools to express this will, the people can have nothing meaningful to register.

    A System for Fairness in Sentencing

    We often hear of cases that offend our sense of fairness – excessive sentences, minor crimes that are punished more severely than serious crimes, or two equivalent crimes that are punished very differently. Rather than attempt to solve a politically and legally intractable problem, we ask a more theoretical question: whether an individual can assign sentences in a way that seems reasonable and consistent to him.  Our system is a means of doing so.  We offer a simple algorithmic method that could be used by an individual or review board to ensure that sentences meet a common-sense standard of consistency and proportionality.

    We intend to offer a less mathematical and more legally-oriented version of this article in the near future.

    Read the Paper (PDF)

    Why Voting Twice is a Good Thing

    We should require that every bill be ratified by a second vote, one year after its original passage. It goes into effect as normal, but automatically expires if not ratified at the appropriate time.

    Sometimes foolish legislation is passed in the heat of the moment or due to short-term pressures. Perhaps there is an approaching election, or the media has fanned popular hysteria over some issue, or there is a demand for immediate action with no time for proper deliberation, or an important bill is held hostage to factional concerns, or legislators are falling all over one another to respond with a knee-jerk reaction to some event. There are many reasons why thoughtful consideration may succumb to the influences of the moment. The consequences of such legislation can be real and long-lasting. Law enforcement resources may be diverted, rights suppressed, or onerous demands made on businesses. It is true that legislation may be repealed, but this requires an active effort. The same forces that induced the original legislation, though weakened by time, may threaten to damage anyone who takes the initiative to rectify it.

    Here is a simple proposal that could address this problem: Every piece of legislation should be voted on a second time, one year after its original passage. This vote would serve to ratify it. By making this mandatory, the burden of attempted repeal is not placed on any individual. Rather, legislators need simply change their vote. This is less likely to create a fresh political tempest, the issue’s emotional fury long spent. When an act is passed, it goes into effect as normal. However, one year from that date, it must be ratified or it will expire. Obviously this should only apply to bills for which such ratification is meaningful; there would be no point in revoting on the prior year’s budget after the money has been spent. By requiring a ratification vote, legislators are given time to breathe, sit back, and consider the ramifications of a particular piece of legislation. The intervening year also may provide some flavor of its real effect. A similar approach could be used at all levels of government.

    The Optics of Camera Lens Stacks (Program)

    In another post, I discussed the mathematical calculation of optical parameters for a configuration of stacked lenses and camera components. As is evident from the example worked out there, the procedure is somewhat tedious. Instead, it is better to spend twice the time writing a program to do it. Fortunately I already did this and offer it to you, gentle reader, to use and criticize. I expect no less than one rabid rant about some aspect that doesn’t pedantically conform to the IEEE standard. This is working code (and has been checked over and tested to some extent). I use it. However, it is not commercial grade and was not designed with either efficiency or robustness in mind. It is quick and dirty – but graciously so.

    Think of this as a mine-shaft. You enter at your own risk and by grace of the owner. And if you fall, there won’t be non-stop human interest coverage on 20 TV channels as rescue workers try to extract you. That’s because you’re not a telegenic little kid and this is a metaphor. Rather, you will end up covered in numeric slime of dubious origin. But I still won’t care.

    All this said, I do appreciate constructive criticism and suggestions. Please let me know about any bugs. I don’t plan to extensively maintain this program, but I will issue fixes for significant bugs.

    The program I provide is a command line unix (including MacOS) utility. It should be quite portable, as no funky libraries are involved. The program can analyze a single user-specified configuration or scan over all possible configurations from an inventory file. In the latter case, it can either restrict itself to configurations that can be assembled with the adapters listed in the inventory or consider every configuration regardless of adapters. It also may apply a filter to limit the output to “interesting” cases such as very high magnification, very wide angle, or high telephoto.

    The number of configurations can be quite large, particularly when many components are available, there are no constraints, and we account for the large number of focal/zoom choices for each given stack. For this reason, it is best to constrain scans to a few components in an inventory (by commenting out the components you don’t need). For example, if one has both 10mm and 25mm extension tubes, first try scanning with only one of them. If the results look promising, restrict the inventory to the components involved and uncomment the other tube as well.

    Either through the summary option or the use of a script to select out desirable configurations, the output may be analyzed and used for practical decisions. For example, if a 10x macro lens is needed and light isn’t an issue then a 1.4X telextender followed by a 200mm zoom followed by a reversed 28mm will do the trick. It will have a high f-stop, but if those components are already owned and we don’t need a low f-stop it may be a far more cost-effective option than a dedicated ultra-macro lens (there aren’t any at 10X, but a 5X one is available).
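
    As a rough sanity check on that 10x figure (a rule of thumb only, not the exact matrix calculation the program performs): a lens reversed in front of a primary yields a magnification of roughly the primary’s focal length divided by the reversed lens’s focal length, and a telextender multiplies it, so 1.4 × 200mm / 28mm comes to about 10x.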

    For simple viewing of the results, I recommend the use of my “tless” utility. This isn’t a shameless plug. I wrote tless for myself, and I use it extensively.

    Go to Google Code Archive for Project

    The Optics of Camera Lens Stacks (Analysis)

    This first appeared on my tech blog. I like to play around with various configurations of camera lenses.  This partly is because I prefer to save money by using existing lenses where possible, and partly because I have a neurological condition (no doubt with some fancy name in the DSM-IV) that compels me to try to figure things out. I spent 5 years at an institute because of this problem and eventually got dumped on the street with nothing but a PhD in my pocket.  So let this be a warning: keep your problem secret and don’t seek help.

    A typical DSLR (or SLR) owner has a variety of lenses.  Stacking these in various ways can achieve interesting effects, simulate expensive lenses (which may internally be similar to such a stack), or obtain very high magnifications.  Using 3 or 4 lenses, a telextender, a closeup lens, and maybe some extension rings (along with whatever inexpensive adapter rings are needed), a wide variety of combinations can be constructed.  In another entry, I’ll offer a companion piece of freeware that enumerates the possible configurations and computes their optical properties.

    In the present piece, I examine the theory behind the determination of those properties for any particular setup.  Given a set of components (possibly reversed) and some readily available information about them and the camera, we deduce appropriate optical matrices, construct an effective matrix for the system, and extract the overall optical properties – such as focal length, nearest object distance, and maximum magnification.  We account for focal play and zoom ranges as needed.
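
    To give a flavor of the matrix bookkeeping (the full treatment, including how to recover a matrix for a real lens from readily available information, is in the piece itself), here is a minimal sketch in Python using the standard thin-lens and translation matrices; the focal lengths and spacing below are placeholders rather than values from the worked example.

        import numpy as np

        def thin_lens(f_mm):
            # Ray-transfer (ABCD) matrix of an ideal thin lens of focal length f.
            return np.array([[1.0, 0.0], [-1.0 / f_mm, 1.0]])

        def gap(d_mm):
            # Free propagation over a distance d between elements.
            return np.array([[1.0, d_mm], [0.0, 1.0]])

        # Toy stack: a 50mm lens, a 10mm gap, then a reversed 28mm lens, all treated
        # as thin lenses (real camera lenses are thick systems, which is exactly why
        # deducing their matrices takes some work).
        system = thin_lens(28.0) @ gap(10.0) @ thin_lens(50.0)

        # For a system matrix [[A, B], [C, D]], the effective focal length is -1/C.
        f_eff = -1.0 / system[1, 0]
        print(f"effective focal length ~ {f_eff:.1f} mm")   # about 20.6mm for these numbers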

    The exposition is self-contained, although this is not a course on optics and I simply list basic results.  Rather, I focus on the application of matrix optics to real camera lenses.  I also include a detailed example of a calculation.

    As far as I am aware, this is the only treatment of its kind.  Many articles discuss matrix methods or the practical aspects of reversing lenses for macro photography.  However, I have yet to come across a discussion of how to deduce the matrix for a camera lens from readily available information, and vice versa.

    After reading the piece, you may wonder whether it is worth the effort to perform such a calculation.  Wouldn’t it be easier to simply try the configurations?  To modify the common adage, a month on the computer can often save an hour in the lab.  The short answer is yes and no.  No, I’m not an economist; why do you ask?

    If you have a specific configuration in mind, then trying it is easier.  However, if you have a set of components and want to determine which of the hundreds of possible configurations are candidates for a given use (just because the calculation works, doesn’t mean the optical quality is decent), or which additional components one could buy to make best use of each dollar, or which adapter rings are needed, or what end of the focal ranges to use, then the calculation is helpful.  Do I recommend doing it by hand?  No.  I even used a perl script to generate the results for the example.  As mentioned, a freeware program to accomplish this task in a more robust manner will be forthcoming.  Think of the present piece as the technical manual for it.

    Tless Table Viewer

    Over the years, I’ve found delimited text files to be an easy way to store or output small amounts of data. Unlike SQL databases, XML, or a variety of other formats, they are human readable. Many of my applications and scripts generate these text tables, as do countless other applications. Often there is a header row and a couple of columns that would best be kept fixed while scrolling. One way to view such files is to pull them into a spreadsheet, parse them, and then split the screen. This is slow and clumsy, and updates are inconvenient to process. Instead, I wanted an application like the unix utility ‘less’ but with an awareness of table columns. The main requirements were that it be lightweight (i.e. keep minimal content in memory and start quickly), parse a variety of text file formats, provide easy synchronized scrolling of columns and rows, and allow horizontal motion by columns. Strangely, no such utility existed. Even Emacs and vi don’t provide an easy solution. So I wrote my own unix terminal application. I tried to keep the key mappings as true to “less” (and hence vi) as possible. The code is based on ncurses and fairly portable. The project is hosted on Google Code and is open source.

    Go to Google Code Archive for this Project

    Influence in Voting

    Have you ever wondered what really is meant by a “deciding vote” on the Supreme Court or a “swing State” in a presidential election? These terms are bandied about by the media, but their meaning isn’t obvious. After all, every vote is equal, isn’t it? I decided to explore this question back in 2004 during the election year media bombardment. What started as a simple inquiry quickly grew into a substantial project. The result was an article on the subject, which I feel codifies the desired understanding. The paper contains a rigorous mathematical framework for block voting systems (such as the electoral college), a definition of “influence”, and a statistical analysis of the majority of elections through 2004. The work is original, but not necessarily novel. Most if not all has probably been accomplished in the existing literature on voting theory. This said, it may be of interest to a technical individual interested in the subject. It is self-contained, complete, and written from the standpoint of a non-expert in the field. For those who wish to go further, my definition of “influence” is related to the concept of “voting power” in the literature (though I am unaware of any analogue to my statistical definition).

    Ye Olde Physics Papers

    Once upon a time there was a physicist. He was productive and happy and dwelt in a land filled with improbably proportioned and overly cheerful forest creatures. Then a great famine of funding occurred and the dark forces of string theory took power and he was cast forth into the wild as a heretic. There he fought megalomaniacs and bureaucracies and had many grand adventures that appear strangely inconsistent on close inspection. The hero that emerged has the substance of legend.

    But back to me. I experienced a similar situation as a young physicist, but in modern English and without the hero bit.   However, once upon a time I DID write physics papers. This is their story…

    My research was in an area called Renormalization Group theory (for those familiar with the subject, that’s the “momentum-space” RG of Quantum Field Theory, rather than the position-space version commonly employed in Statistical Mechanics – although the two are closely related).

    In simple terms, one could describe the state of modern physics (then and now) as centering around two major theories: the Standard Model of particle physics, which describes the microscopic behavior of the electromagnetic, weak, and strong forces, and General Relativity, which describes the large scale behavior of gravity. These theories explain all applicable evidence to date, and no prediction they make has been excluded by observation (though almost all our effort has focused on a particular class of experiment, so this may not be as impressive as it seems). In this sense, they are complete and correct. However, they are unsatisfactory.  

    Their shortcomings are embodied in two of the major problems of modern physics (then and now): the origin of the Standard Model and a unification of Quantum Field Theory with General Relativity (Quantum Field Theory itself is the unification of Quantum Mechanics with Special Relativity). My focus was on the former problem.  

    The Standard Model is not philosophically satisfying. Besides the Higgs particle, which is a critical component but has yet to be discovered, there is a deeper issue. The Standard Model involves a large number of empirical inputs (about 21, depending on how you count them), such as the masses of leptons and quarks, various coupling constants, and so on. It also involves a specific non-trivial set of gauge groups, and doesn’t really unify the strong force and electro-weak force (which is a proper unification of the electromagnetic and weak forces). Instead, they’re just kind of slapped together. In this sense, it’s too arbitrary. We’d like to derive the entire thing from simple assumptions about the universe and maybe one energy scale.

    There have been various attempts at this. Our approach was to look for a “fixed point”. By studying which theories are consistent as we include higher and higher energies, we hoped to narrow the field from really really big to less really really big – where “less really really big” is 1. My thesis and papers were a first shot at this, using a simple version of Quantum Field Theory called scalar field theory (which coincidentally is useful in its own right, as the Higgs particle is a scalar particle). We came up with some interesting results before the aforementioned cataclysms led to my exile into finance.

    Unfortunately, because of the vagaries of copyright law I’m not allowed to include my actual papers. But I can include links. The papers were published in Physical Review D and Physical Review Letters. When you choose to build upon this Earth Shattering work, be sure to cite those. They also appeared on the LANL preprint server, which provides free access to their contents. Finally, my thesis itself is available. Anyone can view it, but only MIT community members can download or print it. Naturally, signed editions are worth well into 12 figures. So print and sign one right away.

    First Paper on LANL (free content)
    Second Paper on LANL (free content)
    Third Paper on LANL (free content)
    First Paper on Spires
    Second Paper on Spires
    Third Paper on Spires
    Link to my Thesis at MIT

    Writings and Ravings