In sports, some people believe that there is a Sports Illustrated cover jinx: teams or individual athletes featured on the cover for outstanding performance subsequently have a bad season. One explanation for this phenomenon is regression to the mean. The prior year’s performance, which earned the cover spot, is based on a combination of skill and luck. The skill persists into the next period but the luck does not (and in fact is just as likely to turn into “bad luck”). So being featured on the cover selects for having had extra luck, which is not replicable.
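A small simulation makes the mechanism concrete. This is a minimal sketch under stated assumptions, not a model of any real league: each hypothetical athlete’s season score is a fixed skill plus fresh, independent luck drawn from a normal distribution, and the “cover” goes to the top performers in year one. Their average drops in year two even though nobody’s skill changed.

```python
import random
import statistics


def cover_jinx_demo(n_athletes=10_000, top_k=100, skill_sd=1.0, luck_sd=1.0, seed=42):
    """Return the year-1 and year-2 average scores of the year-1 top performers."""
    rng = random.Random(seed)
    # Skill is fixed per athlete and carries over between seasons
    skills = [rng.gauss(0, skill_sd) for _ in range(n_athletes)]
    # Luck is drawn fresh (and independently) each season
    year1 = [s + rng.gauss(0, luck_sd) for s in skills]
    year2 = [s + rng.gauss(0, luck_sd) for s in skills]
    # The "cover" athletes: the top_k performers of year 1
    cover = sorted(range(n_athletes), key=lambda i: year1[i], reverse=True)[:top_k]
    before = statistics.fmean(year1[i] for i in cover)
    after = statistics.fmean(year2[i] for i in cover)
    return before, after


before, after = cover_jinx_demo()
print(f"cover athletes: year 1 average {before:.2f}, year 2 average {after:.2f}")
```

With skill and luck given equal spread (the defaults here), the cover athletes keep only about half of their year-one edge over the population average of 0; shrink `luck_sd` and the drop mostly disappears.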
In business, the equivalent is building an extravagant headquarters. Companies are especially likely to do that after they have had a number of great years. Those years, of course, tend to be the result of a combination of company capabilities and favorable circumstances. While the capabilities persist, the circumstances are likely to change. I don’t know if anyone has attempted a rigorous study of this, but here is a list from 2009 of companies that all built lavish new headquarters just before their performance declined dramatically.
Regression to the mean is incredibly widespread. Whenever you look at a performance record, especially one that’s an outlier, you should consider the possibility. That is of course also true for startup performance, and it is yet another reason why paying extended prices for companies that appear to be network-effect winners in their category can end poorly.
NOTE: This is part of a series of excerpts from my book World After Capital. Today’s post wraps up the prior discussion of why, as part of increasing informational freedom, we should embrace a post-privacy world.
So we can’t really protect privacy without handing control of technology to a few, and conversely, decentralized innovation requires reduced privacy. What should we do? The answer, I think, is to embrace a post-privacy world. We should work to protect people and their freedom, instead of protecting data and privacy. We should allow more information to become public, while strengthening individual freedom to stand against the potential consequences. Such an embrace can and will happen gradually. Much information is already no longer private because of hacks and data breaches that abruptly expose data on millions of people. And many individuals are voluntarily disclosing previously private information about themselves on blogs and social media. Economic freedom via a Universal Basic Income (UBI) will play a key role here. Much of the fear about private information being revealed results from potential economic consequences. For instance, if you are worried that you might lose your job and not be able to pay your rent if your employer finds out that you wrote a blog post about struggling with depression, you are much less likely to write it.
If you think that a post-privacy world is impossible or terrifying, it is worth remembering that privacy is really a modern and urban construct. Even the United States Constitution, while protecting certain specific rights, does not recognize a generalized right to privacy (the word privacy does not appear in it at all). For thousands of years prior to the 18th century, most people had no concept of privacy. Many of the functions of everyday life, including excretion and reproduction, took place much more openly than they do today. And privacy still varies greatly among cultures: many Westerners are shocked when they first experience the openness of Chinese public restrooms (although these appear to be disappearing). All over the world, people in small villages live with much less privacy than is common in big cities. You could regard the lack of privacy as oppressive, or you could see a close-knit community as a real benefit and source of strength. For instance, I remember growing up in a small village in Germany where, if a member of our community was sick and couldn’t leave the house, a neighbor would quickly check up on them and offer to do the shopping or provide food.
You might ask, what about my bank account? If my account number were public, wouldn’t it be much easier for bad actors to take my money? Yes, which is why we need to construct systems that don’t just require a number that you have already shared with others to authorize payments. Apple Pay and Android Pay are such systems: every transaction requires an additional form of authentication at the time of the transaction. Two-factor authentication systems will become much more common in the future for any action that you take in the digital world. In addition, we will rely more and more on systems such as Sift, another USV portfolio company, that assess in real time the likelihood that a particular transaction is fraudulent, taking into account hundreds of different factors. Finally, much of blockchain technology is built on the idea that addresses can be public because they are protected by private keys, making it possible even for transactions to be part of a public ledger.
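To make “an additional form of authentication at the time of transaction” concrete, here is a minimal sketch of one widely used second factor, the time-based one-time password (TOTP) defined in RFC 6238 and implemented by many authenticator apps. It is an illustration only, not how Apple Pay, Android Pay or Sift work internally, and the function names are hypothetical. The point is that the code depends on a secret that is never shared in the clear and on the current 30-second window, so knowing an old code (or your account number) does not let anyone authorize a new payment.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret: bytes, for_time: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238) using HMAC-SHA1."""
    if for_time is None:
        for_time = time.time()
    # Number of `step`-second windows since the Unix epoch, as an 8-byte big-endian counter
    counter = int(for_time // step)
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): pick 4 bytes based on the low nibble of the last byte
    offset = digest[-1] & 0x0F
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


def verify(secret: bytes, submitted: str, for_time: Optional[float] = None,
           step: int = 30, digits: int = 6) -> bool:
    """Accept the code for the current window only, comparing in constant time."""
    expected = totp(secret, for_time, step, digits)
    return hmac.compare_digest(expected, submitted)
```

As a sanity check, RFC 6238’s published test vector (ASCII secret `12345678901234567890`, time 59, eight digits) yields `94287082`. Real deployments usually also accept the adjacent window to tolerate clock drift.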
Another area where people are especially nervous about privacy is health information. We worry, for instance, about employers, insurers, or others in society discriminating against us because they’ve learned that we have a certain disease or condition. Here the economic freedom conferred by a Universal Basic Income would protect you from going destitute because of discrimination, and by tightening the labor market, it would also make it harder for employers to decide to systematically refuse to hire certain groups of people. Further, we could enact laws that require sufficient transparency on the part of organizations, so that we could better track how decisions have been made and detect more easily if it appears that discrimination is taking place.
Observers such as 4chan founder Chris Poole have worried that in the absence of privacy, individuals wouldn’t be able to engage as fully and as freely online as they do today. Privacy, they think, helps people feel comfortable taking on multiple identities online that may depart dramatically from one another and from their “real life” selves. But I hold a different view. By keeping our various online selves separate, we allow a lot of inner conflict to persist. We pay a price for this in the form of anxieties, neuroses, and other psychological ailments. It’s far better to be fully transparent about the many sides of our personality than to cloister ourselves behind veils of privacy. Emotional and psychological health derives not from a splintering or fragmentation of the self, but from the integration of different aspects into a unitary but multi-dimensional personality. [Look for psychological research backing this point]
Many who argue against embracing a post-privacy approach point out that oppressive governments can use information against citizens. People give examples such as the Nazis persecuting homosexuals or the Chinese government persecuting dissidents. Without a doubt, preserving democracy and the rule of law is essential if we want to achieve a high degree of informational freedom. But the analysis cannot simply hold the level of privacy constant and switch out the regime. One also needs to consider how likely a regime change is for given levels of privacy. And there I am convinced that more public information makes dictatorial takeovers considerably harder. For instance, with public tax records it is much clearer who is benefiting from political change. Conversely, history has taught us that it is entirely possible to build a totalitarian surveillance state with minimal technology by having citizens spy on each other.
No matter whether you think Amazon HQ2 in New York was a good or a bad idea, it is worth considering how terrible the execution of the whole affair has been. It all started with the way Amazon conducted the search for the new location. Making a bunch of cities compete in a secret bidding contest may strike some people as brilliant, but it was always a terrible idea. Didn’t anyone at Amazon think: “Hey, we are one of the world’s most valuable companies, run by the world’s richest person, so asking cities to give us massive concessions is not going to look good”? Apparently not, or at least not anyone who dared say it or had any real pull inside the company.
I should be quick to point out that this is not actually all that surprising. Success tends to beget this failure mode, in which everyone eventually becomes so convinced that everything the company does is obviously good and awesome that they assume everyone else in the world will surely see it the same way (just look at Facebook if you want another example of that). Still, it is a failure to “read the room” of epic proportions.
And what about New York City? Similar story. Nobody thought about the optics? At a time when riding the MTA is testing everyone’s patience (never mind the debacle around the L train) and when the city is experiencing its worst affordable housing crisis in decades, didn’t anyone go: we need to really sell how this deal will be broadly good for New York, and not just for people in tech whose salaries have been growing faster than anyone else’s? Any near-term HQ2 concession should have come with commitments to take a meaningful portion of long-term tax revenue gains and use them to address transportation infrastructure and affordable housing.
There are a lot of lessons here. And none of them will be learned by simply accusing the other side of not getting how economics works. This is all about misreading where public perception currently stands on tech companies and on their role and responsibility with regard to the polarization of society.
After an excursion for the last few Uncertainty Wednesdays into the topic of intelligence, I want to return to how to think about risk. In a post about a decade ago I wrote that it is possible for a startup to be too innovative by trying too many new things at once. Here is the key line from that post:
If you innovate along too many dimensions at once you are multiplying risk (as startup risk is multiplicative, not additive).
So what does it mean for startup risk to be multiplicative and not additive? The simplest way to think about it is that as a startup you need to succeed at, say, five different things, and if any one of them fails, the whole endeavor fails. What are those five things? Finding at least some degree of product-market fit (in a sufficiently large market), having enough capital, building an organization that can execute, actually delivering your product, and building a sustainable competitive advantage.
Another way to write this list is that startups take risk in at least five categories: market risk, financing risk, organizational risk, technology risk and competitive risk. Those risks are multiplicative. Imagine for a moment that perfect execution in any of the five categories earns you a 10 in that category and a total mess-up gives you a 0. That means the range of outcomes goes from 0 to 10^5 = 100,000 (should have made it six categories, oh well). The key thing is that if you get a 0 in one of the categories, it doesn’t matter if you get a 10 everywhere else: the combined outcome will still be 0 * 10 * 10 * 10 * 10 = 0. For example, if you run hard out of cash (e.g., in the middle of a financial crisis when everyone else is scrambling), you can wind up with a 0 even if everything else is going great.
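The scoring thought experiment is easy to write down. This toy sketch uses the same hypothetical 0-to-10 scale and contrasts the multiplicative combination with an additive one, in which strength elsewhere could make up for a single failure.

```python
from math import prod

CATEGORIES = ["market", "financing", "organization", "technology", "competition"]


def multiplicative_outcome(scores):
    # Startup risk: every category has to work, so a single zero wipes out the total
    return prod(scores)


def additive_outcome(scores):
    # Contrast: strong categories would compensate for a weak one
    return sum(scores)


flawless = [10, 10, 10, 10, 10]
cash_crunch = [10, 0, 10, 10, 10]  # "financing" scored 0, e.g. running out of cash in a crisis

for name, scores in [("flawless", flawless), ("cash crunch", cash_crunch)]:
    print(name, multiplicative_outcome(scores), additive_outcome(scores))
```

The additive version would still rate the cash-crunch startup at 40 out of 50; the multiplicative one rates it at 0.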
Obviously this scale is not meant to be taken literally. But it gets the idea of a multiplicative risk structure across. From an investor’s perspective it means that if you are considering a startup that has big question marks in more than one of the five categories you are taking a lot more risk of a bad outcome.
Are there investing risks that are additive instead of multiplicative? Yes: investing in multiple companies as you build a portfolio. The companies can succeed or fail independently of each other. If one company succeeds, you add that to the total outcome. Conversely, a company failing can be thought of as a subtraction from the total outcome. Of course, how independent they truly are depends on how well diversified a portfolio you are building.
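The same contrast works with probabilities. In this sketch the per-category success probabilities are made up for illustration: a single startup succeeds only if all five categories work out, so its probability is a product and each extra big question mark cuts it sharply, while the expected number of winners in a portfolio is a sum across companies.

```python
from math import prod


def startup_success_probability(category_probs):
    # Multiplicative: all categories must succeed (assumes they are independent)
    return prod(category_probs)


def expected_portfolio_winners(company_probs):
    # Additive: expected winners is the sum of each company's success probability.
    # Expectations add even if outcomes are correlated; independence (diversification)
    # is what keeps the realized result close to this expectation.
    return sum(company_probs)


# Hypothetical probabilities: 0.9 = solid, 0.5 = big question mark
one_question_mark = [0.9, 0.9, 0.9, 0.9, 0.5]
three_question_marks = [0.9, 0.9, 0.5, 0.5, 0.5]

p1 = startup_success_probability(one_question_mark)     # about 0.33
p3 = startup_success_probability(three_question_marks)  # about 0.10
print(round(p1, 3), round(p3, 3))
print(expected_portfolio_winners([p1] * 10))             # about 3.3 winners out of 10
```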
So whether it is startups or some other decision, always think about whether the risk structure is additive or multiplicative. How much care you take and whether you get benefits of diversification depends massively on that fundamental structure.
NOTE: This is part of a series of excerpts from my book World After Capital. Today’s post continues on the idea from last week that privacy is not compatible with technological progress (if you would rather watch a talk, you can find some of the same material in my Blockstack Berlin presentation).
So why do I keep asserting the impossibility of assuring privacy? Don’t we have encryption? Encryption is great for securing information in transit and at rest, but there are problems that encryption doesn’t and can’t solve.
The first problem is that encryption keys are also just digital information themselves, so keeping them secure confronts us with just another instance of the original problem. Transmitting your keys leaves them vulnerable to interception. Even generating a key on your own machine offers limited protection, unless you are willing to have that be the only copy of the key, with the risk that any data you’re protecting will be lost forever if you lose the device. As a result, most systems include some kind of cloud-based backup and a way of retrieving a key, making it possible that someone will access your data either through technical interception or through social engineering (tricking a human being into unwittingly participating in a security breach). If you want some sense of how hard this problem is, consider the millions of dollars in cryptocurrency that have been lost, both by people who lost their keys and by people who had their keys taken over through some form of attack. And the few cryptocurrency companies and exchanges that have a decent track record have invested huge sums in security procedures, screening of personnel and secrecy.
The second problem is so-called “endpoint security.” Consider, for example, the computer of the doctor to whom you are sending your x-ray for a second opinion. That machine may have a program running on it that can access anything that is displayed on the screen. In order to view your x-ray, the doctor of course has to decrypt it and display it, so this screen capture program will have access to the unencrypted image. Avoiding such a scenario would require us to lock down all computing devices. But that means preventing end-users from installing software on them and running all software through a rigorous centralized inspection process. Even a locked down endpoint is still subject to the so-called “analog hole,” in which someone simply takes a picture of what is displayed on a screen. That picture today can of course be taken with a digital camera and instantly shared again.
Locked-down computing devices reduce Informational Freedom and constrict innovation; they also pose a huge threat to democracy and the Knowledge Loop. Someone else would control what you can compute, who you can exchange information with, and so on, in what would essentially become a dictatorial system. Already today we are headed in this direction in mobile computation, in no small part due to the assertion of a need to protect privacy. Apple uses this argument to explain why the only way to install apps on an iPhone should be through the Apple App Store. Now imagine this type of regime extended to all computing devices, including your laptop and servers in the cloud. So here we have the first way in which privacy is incompatible with technological progress: we can either have really strong privacy assurance or we can have open general-purpose computing, but not both.
Many people contend that there must be some way to preserve privacy and keep innovating. I challenge anyone to create a coherent vision of the future where individuals, not governments or large corporations (such as Apple), control technology and where privacy is meaningfully protected. Any time you leave your house, you are probably being filmed by someone’s camera. Every smartphone has a camera these days, and in the future we’ll see tiny cameras on tiny drones. Your gait identifies you almost as uniquely as your fingerprint. Your face is probably somewhere on the Internet and your car’s license plate is readable by any camera. You leave your DNA almost everywhere you go, and soon individuals will be able to sequence DNA at home for about 100 dollars. Should the government control all of these technologies? Should it levy draconian punishments for using these technologies to analyze someone else’s presence or movement? And if so, how would those penalties be enforced?
But there is an even deeper and more profound reason why privacy is incompatible with technological progress. Entropy is the enemy of life and it is a fundamental property of the universe. There are many more arrangements of atoms that make absolutely nothing than there are arrangements that make a house or, for that matter, a human being. That means that it is always easier to destroy than it is to create. Anyone who has spent hours building a sand castle on the beach only to see it destroyed by a single wave as the tide comes in has a visceral sense of this asymmetry. What does this have to do with privacy, you may ask? As we make technological progress, our ability to destroy grows much faster than our ability to create. It still takes 20 years to grow an adult human being. Modern weapons can kill hundreds, thousands, sadly even millions of humans in an instant. So as we make technological progress, we must insist on less privacy to protect society. Imagine for a moment a future in which I can create a potent biological weapon in my basement laboratory (a future that is not far off). Ex post police enforcement is meaningless in such a world.
So we can’t really protect privacy without handing control of technology to a few, and conversely, decentralized innovation requires reduced privacy. So what should we do? The answer, I think, is to embrace a post-privacy world. We should work to protect people and their freedom, instead of protecting data and privacy. In other words, we should allow more information to become public while strengthening individual freedom to stand against the potential consequences. Such an embrace does not need to happen overnight. Rather, we can take small steps into it, starting with individuals who voluntarily disclose more information about themselves.
The question of “How Much is Enough?” isn’t just the title of an excellent book by Edward and Robert Skidelsky; it has also moved to the forefront of political debate. We are finally discussing such topics as a wealth tax and openly questioning whether there should be billionaires altogether. At the same time we are being treated to the spectacle of one billionaire flirting with a run as an independent candidate for the US presidency and another taking to Medium to fight extortion by a publication. All of that is a good debate to be having, including for billionaires themselves.
Why? Because this question has been and is at the heart of what it means to live a good life as a member of a civilized society. People aspiring to be billionaires (who actually does that?) would do well to think about Schultz and Bezos and ask themselves whether that appears all that appealing. A couple of years back I had an interesting conversation with a billionaire who quite openly admitted his frustration with not being able to buy certain things, such as the instant ability to speak a foreign language fluently. Now you might say, maybe one day money will be able to buy that or something even more precious, such as immortality. The moral of the story, though, is that there will always be something that money can’t buy, and hence beyond a relatively low limit (far below a billion), your mental state is far more determinative of whether or not you are leading a good life than how much money you have.
And from the perspective of society, it is an important question to ask what money should be able to buy. In particular, how much political power it should translate into is a crucial question if we want to live in a well functioning democracy. The question of just how much influence money buys in politics is hotly contested but it is eminently clear that if anyone who is not a billionaire or mega celebrity declared interest in an independent run nobody would bother to pay any attention. And very large foundations have long been shaping policy in crucial areas such as education and healthcare outside of the democratic process.
In World After Capital, I write extensively about how we have become confused about the difference between needs and wants. And how digital technologies are resulting in power law distributions which are amplifying luck.
The best time to be having this discussion would have been ten years ago, the second best time is now.
Today’s Uncertainty Wednesday will be the last post for a while about matters related to intelligence following my posts on the problems of sample correlation under fat tails and on dynamic versus static models. Now might be a good time to state that I firmly believe we should be researching human intelligence including its genetic component. This should not be off limits to science, especially at a time when we are actively building artificial intelligence and making major progress in neuroscience and in understanding and modifying the human genome. My point is simply that this is an area in which we must proceed with extreme caution and avoid strong claims. That is both because of the sordid history of using intelligence claims to justify all sorts of horrors and because we are at the beginning of our understanding of the extraordinary complexity of our genome and its expression as well as of our brain.
The point of all three prior posts was to highlight methodological problems that many people making or repeating strong claims today seem either to be unaware of or to dismiss. To close out this mini-series, I want to mention two more such issues. The first relates to so-called genome-wide association studies (GWAS) that are currently being done to come up with polygenic scores for all sorts of measures, including intelligence. Given the size of the genome, we are faced with tens of millions of potential features, as the difference between two humans is currently estimated at about 20 million base pairs. So this is very much a situation where any analysis faces the curse of dimensionality, and it is possible to wind up with random loadings that still appear to have explanatory power. Again, I am not saying we shouldn’t do such analyses, just that we need a ton of data, a lot of careful analysis and repeatable independent replication of results.
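A toy simulation shows why dimensionality is a problem. It is not how GWAS are actually run (real studies use far larger samples and strict multiple-testing thresholds); it simply generates an outcome and thousands of candidate features that are all pure noise, then reports the strongest correlation found. With many more features than samples, some feature will look predictive by chance alone. The sample sizes below are arbitrary assumptions.

```python
import math
import random
import statistics


def pearson(x, y):
    """Sample Pearson correlation of two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def strongest_noise_correlation(n_samples=100, n_features=5000, seed=0):
    rng = random.Random(seed)
    # The "trait": pure noise, unrelated to any feature below
    outcome = [rng.gauss(0, 1) for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_features):
        # Each candidate feature is also pure noise
        feature = [rng.gauss(0, 1) for _ in range(n_samples)]
        best = max(best, abs(pearson(feature, outcome)))
    return best


print(strongest_noise_correlation())                               # many features, few samples
print(strongest_noise_correlation(n_samples=2000, n_features=200))  # more data per feature
```

In the first call the best pure-noise feature typically correlates with the outcome at well above 0.3; with far more samples per feature, the spurious maximum shrinks sharply, which is one reason such studies need a ton of data.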
Second, intelligence, unlike say height, is something we have a hard time defining and measuring (if it were easy, we would be a lot further along with creating artificial intelligence). In fact, one of the central concepts of a lot of intelligence research is the so-called g factor, which is a measure of correlation among performance on a variety of cognitive tasks. In other words, the g factor is a computed value, which adds some interesting methodological challenges (e.g., what should be a Bayesian prior for the g factor?). Added to this is the challenge that the rise of intelligence testing is coincident with the industrial age, and so a lot of the “validation” of intelligence tests has been their ability to predict success in an industrial society, such as degree of educational attainment or lifetime income. For a species that has hundreds of thousands of years of history before the industrial age and will (hopefully) have hundreds of thousands of years of history after it, that is likely way too narrow a way of thinking about intelligence. Another problem arising from this historic coincidence is that we really don’t have much of an idea of how intelligence could be developed in a radically different educational system, since our current system was also developed during the same time (and in no small part influenced by strong claims about intelligence). The current system is heavily biased towards people who are fast learners out of the gate. By contrast, slow starters are usually cut off from most prolonged learning opportunities, and of course many people are excluded entirely. As a result, we have little to no data on how someone who is interested in math but is off to a slow start (or no start) could do if they were able to stick with it for a decade or more.
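To illustrate the sense in which g is “a computed value,” here is a sketch that extracts the first principal component of a correlation matrix among test scores via power iteration. This is an assumption-laden stand-in: researchers typically use factor analysis rather than plain principal components, and the correlation matrix below is invented. It shows only that g-style loadings are derived from the correlation structure rather than observed directly.

```python
import math


def first_component(corr, iterations=500):
    """Largest eigenvalue and loadings of a symmetric correlation matrix (power iteration)."""
    n = len(corr)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue for the converged unit vector
    av = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(a * b for a, b in zip(v, av))
    # Loadings: correlation of each task with the component
    loadings = [math.sqrt(eigenvalue) * x for x in v]
    return eigenvalue, loadings


# Hypothetical correlations among four cognitive tasks
corr = [
    [1.00, 0.60, 0.50, 0.40],
    [0.60, 1.00, 0.45, 0.35],
    [0.50, 0.45, 1.00, 0.30],
    [0.40, 0.35, 0.30, 1.00],
]
eigenvalue, loadings = first_component(corr)
# Share of total variance captured by the component, and each task's loading
print(round(eigenvalue / len(corr), 2), [round(x, 2) for x in loadings])
```

A different test battery produces a different correlation matrix and therefore a different computed component.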
So by all means let’s do research. But let’s approach this field with the extreme care and degree of skepticism that is called for. And until we really know much more than we do today, let’s avoid strong claims entirely. This is a responsibility not just for those working in the field but also for anyone covering it or simply retweeting it.
Last Uncertainty Wednesday, I wrote about how sample correlations are not meaningful under fat tails. Today I want to continue this line of argument in the specific context of the claimed relationship between country IQ and GDP. There is strong evidence that the distribution of GDP growth rates is in fact fat tailed.
Why look at growth rates instead of absolute numbers? Because the whole argument has to be a dynamic one and not a static one. We can best see this from the following illustration, which shows the extraordinary growth of China’s and India’s per capita GDP.
Over an 8-year period, China’s per capita GDP nearly doubles and India’s grows by 50%. At the same time, major developed economies such as the US and the EU are essentially flat.
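These period figures translate into annual rates with the compound-growth formula (end/start)^(1/years) - 1. This small sketch treats “nearly doubles” as exactly 2x, which slightly overstates China’s rate.

```python
def implied_annual_growth(total_multiple, years):
    """Constant annual growth rate that compounds to total_multiple over the given years."""
    return total_multiple ** (1 / years) - 1


china = implied_annual_growth(2.0, 8)  # "nearly doubles", rounded up to 2x
india = implied_annual_growth(1.5, 8)  # "grows by 50%"
print(f"China ~{china:.1%} per year, India ~{india:.1%} per year")
```

That works out to roughly 9% a year for China and 5% a year for India, against roughly flat growth in the US and the EU over the same window.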
When you then combine this with the large number of people in those countries, you can see the dynamic rise of China and India in the global total GDP rankings (ignore the projections into the future, but watch how China and India are not on the chart at first and then climb rapidly):
This level of dynamism is easily possible when growth rates are fat tailed. But it also means that any static sample correlation between country IQ and GDP is completely useless. You either have to conclude that country IQ can change quite rapidly (which makes it a useless measure) or that GDP growth isn’t related to it in a meaningful way after all. Personally, I believe both to be the case, i.e., country IQ is an ill-defined measure and GDP growth is determined by changes in government and economic systems.
It is easy to feel pessimistic at the end of 2018. CO2 emissions are still climbing rapidly and reached an all-time high in 2018, with severe weather events accelerating globally. Facebook and Twitter continue to be used for manipulation and their approaches to moderation are just as problematic. And the political response to all of this is largely one of chaos dominated by strongmen politicians, including the recent election of Bolsonaro in Brazil.
So how can one stay optimistic? One way is to look at things that happened in 2018 that can be seen as early signs of positive change, signs that we can and will do better over time. Here are just some examples:
Climate
We have evidence that when we get our act together on an environmental issue, then a recovery is possible. While on an admittedly smaller scale, the ozone layer is on track to a complete recovery. The aggregate growth in CO2 emissions hides real progress that’s being made, such as the UK going for 1,000 hours without coal or global deployment of solar and wind energy reaching 1 TW in capacity. In the US, the Tesla Model 3 became the fastest selling car (by revenue) and China is leading the world in electric vehicle sales with a commitment to going all electric by 2040.
Networks
It also turns out that we do not need to be slaves to online networks. In 2018 Apple released Screen Time as part of iOS 12 and Google released Digital Wellbeing for Android to help people track and limit their usage of apps like Instagram. 2018 was also the year when Facebook engagement in the United States started to decrease for the first time (important footnote: Instagram, which also belongs to Facebook, is still growing). Regulators globally started to take more serious interest in online networks in 2018, including a US congressional hearing and an EU hearing among many other inquiries and court cases.
Politics
The congressional class elected in 2018 is the most diverse ever elected, with a record number of women entering politics. Voter turnout in the midterm elections was much higher than in the last three decades, including many more young voters. In general, young people started to engage in politics more in 2018, including organizing the March for Our Lives. We are also starting to make long overdue improvements to the democratic process, starting at the state level. Maine carried out its first ranked-choice voting in the midterms and several states, including Colorado, Michigan and Missouri, adopted anti-gerrymandering amendments.
For 2019 let’s continue to build the momentum of these positive developments. And in that spirit: Happy New Year!
PS If you have developments from 2018 that give you reason for optimism please share them in the comments or on this Twitter thread.
NOTE: I have been posting excerpts from my book World After Capital. Currently we are on the Informational Freedom section and last week’s post was on being represented by a bot. Today’s post looks at rolling back copyright.
Once we have fought back geographical and prioritization limits and have bots in place so that all users can meaningfully control their own interactions with the global knowledge network, we still come up against limits that restrict which information you can share and what you can create based on how you obtained the information. We’ll first look at copyright and patent laws and suggest policies for reducing how much these limit the knowledge loop. Then we’ll turn to confidentiality and privacy laws.
Earlier I remarked how expensive it was to make a copy of a book when human beings literally had to copy it one letter at a time. Eventually we invented the printing press, and after that movable type. Together the two provided for much faster and cheaper reproduction of information. Even back then, governments and also the church saw this as a threat to their authority. In England, the Licensing of the Press Act of 1662 predated modern attempts to censor the web by more than 300 years: if you operated a printing press and wanted the right to make copies, you needed the government’s approval. You received it in exchange for agreeing to censor content critical of the government or that ran counter to church teachings. And that’s the origin of copyright. It is the right to make copies in return for agreeing to censorship.
Over time, as economies grew and publishing companies emerged as business enterprises, copyright became commercially meaningful, less as an instrument of government control and more as a source of profit. The logic runs like this: “If I have the copyright to a specific material, then you cannot make copies of it, which means that I essentially have a monopoly in providing this content. I am the only one allowed to produce and sell copies of it.”
Legitimating this shift was the idea that in order to get content produced in the first place, incentives needed to exist for the creators of content, just as incentives needed to exist for people to create tangible or material goods. If you own your factory, then you will invest in it because you get to keep the benefits from those improvements. Similarly, the thinking goes, if you are working on a book, you should own the book so that you have an incentive to write it in the first place and improve it over time through revisions.
Over time the holders of copyrights have worked to strengthen their claims and extend their reach. For instance, with the passing of the Copyright Act of 1976, the requirement to register a copyright was removed. Instead, if you created content, you automatically had copyright in it. Then in 1998, with passage of the Copyright Term Extension Act, the years for which you had a copyright were extended from 50 to 70 years beyond the life of the author. This became known as the “Mickey Mouse Protection Act,” because Disney had lobbied the hardest for it, having built a very large and profitable business based on protected content, and mindful that a number of its copyrights were slated to expire.
More recently, copyright lobbying has attempted to interfere with the publication of content on the Internet through legislation such as PIPA and SOPA, and most recently through the TPP. In these latest expansion attempts, the conflict between copyright and the digital knowledge loop becomes especially clear. Copyright severely limits what you can do with content, essentially down to consuming the content. It dramatically curtails your ability to share it and create other works that use some or all of the content. Some of the more extreme examples include takedowns of videos from YouTube that used the Happy Birthday song, which, yes, was copyrighted until recently.
From a societal standpoint, given digital technology, it is never optimal to prevent someone from listening to a song or watching a baseball game once the content exists. Since the marginal cost of accessing it is zero, the world is better off if that person gets just a little bit of enjoyment from that content. And if that person turns out to be inspired and writes an amazing poem that millions read, well then the world is a lot better off.
Now, you might say, it’s all well and good that the marginal cost for making a copy is zero, but what about all the fixed and variable cost that goes into making content? If all content were to be free, then where would the money come from for producing any of it? Don’t we need copyright to give people the incentive to produce content in the first place?
Some degree of copyright is probably needed, especially for large-scale projects such as movies. Society may have an interest in seeing $100 million blockbuster films being made, and it may be that nobody will make them if, in the absence of copyright protection, they aren’t economically viable. Yet here the protections should be fairly limited (for instance, you shouldn’t be able to take down an entire site or service just because it happens to contain a link to a pirated stream of your movie). More generally, I believe copyright can be dramatically reduced in its scope and made much more costly to obtain and maintain. The only automatic right accruing to content should be one of attribution. The reservation of additional rights should require a registration fee, because you are asking for content to be removed from the digital knowledge loop.
Let’s take music as an example. Musical instruments were made as far back as 30,000 years ago, pre-dating any kind of copyright by many millennia. Even the earliest known musical notation, which marks music’s transition from information to knowledge (again, defined as something that can be maintained and passed on by humans over time and distance), is around 3,400 years old. Clearly people made music, composed it, shared it long before copyright existed. In fact, the period during which someone could make a significant amount of money making and then selling recorded music is extraordinarily short, starting with the invention of the gramophone in the 1870s and reaching its heyday in 1999, the year that saw the biggest profits in the music industry.
During the thousands of years before this short period, musicians made a living either from live performances or through patronage. If copyrighted music ceased to exist tomorrow, people would still compose, perform, and record music. And musicians would make money from live performances and patronage, just as they did prior to the rise of copyright. Indeed, as Steven Johnson found when he recently examined this issue, that’s already what is happening to some degree: “the decline in recorded-music revenue has been accompanied by an increase in revenues from live music… Recorded music, then, becomes a kind of marketing expense for the main event of live shows.” Many musicians have voluntarily chosen to give away digital versions of their music. They release tracks for free on Soundcloud or YouTube and fund their music making by performing live and/or through crowdfunding platforms such as Kickstarter and Patreon.
Now imagine a situation where the only automatic right accruing to an intellectual work was one of attribution. Anyone wanting to copy or distribute your song in whole or in part has to credit you. Such attribution can happen digitally at zero marginal cost and does not inhibit any part of the knowledge loop. Attribution imposes no restrictions on learning (making, accessing, distributing copies), on creating derivative works, and on sharing those. Attribution can include reference to who wrote the lyrics, who composed the music, who played which instrument and so on. Attribution can also include where you found this particular piece of music (i.e., giving credit to people who discover music or curate playlists). This practice is already becoming more popular through tools such as the Creative Commons licenses, or the MIT License, which is often used for attribution in open source software development.
Now, what if you’re Taylor Swift and you don’t want others to be able to use your music without paying you? Well, then you are asking for your music to be removed from the knowledge loop, thus removing all the benefits that loop confers upon society. So you should be paying for that right: the removal not only represents a loss to society but will also be costly to enforce. I don’t know how big the registration fee should be (that’s something that will require further work), but it should be a monthly or annual fee, and when you stop paying it, your work should revert to attribution-only rights.
Importantly, in order to reserve rights, you should have to register your music with a registry, and some part of the copyright fee would go towards maintenance of these registries. Thanks to blockchain technology, competing registries can exist that all use the same global database. The registries themselves would be free for anyone to search, and registration would involve a prior search to ensure that you are not trying to register someone else’s work. The search could and should be built in a way so that anyone operating a music sharing service, such as Spotify or Soundcloud, can trivially implement compliance to make sure they are not freely sharing music that has reserved rights.
It would even be possible to make the registration fee dependent on how many rights you want to retain. All of this could be modeled after the wildly successful Creative Commons licenses. For instance, your fee might decrease if you allow non-commercial use of your music and also allow others to create derivative works. The fee might increase significantly if you want all your rights reserved. The same or similar systems could be used for all content types, including text, images and video.
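To make the registry idea a bit more concrete, here is a minimal sketch in Python. Everything in it is a hypothetical illustration, not a proposal from the text: the names `RightsRegistry`, `register` and `may_share_freely`, the three right categories, and the relative fee weights (the essay explicitly leaves fee levels open). It assumes exact-match fingerprinting via SHA-256, whereas a real music registry would need fuzzy audio matching, and it uses an in-memory dict where the essay imagines a shared, possibly blockchain-backed database.

```python
import hashlib
from dataclasses import dataclass, field

# Hypothetical relative fee weights per reserved right (not real amounts).
FEE_WEIGHTS = {"commercial_use": 3.0, "derivative_works": 2.0, "copying": 5.0}


@dataclass
class RightsRegistry:
    """Toy in-memory registry: fingerprint -> (owner, reserved rights)."""
    entries: dict = field(default_factory=dict)

    @staticmethod
    def fingerprint(content: bytes) -> str:
        # Exact-match fingerprint only; an assumption made to keep the sketch small.
        return hashlib.sha256(content).hexdigest()

    def search(self, content: bytes):
        # Free for anyone to call: returns None if the work has no reserved rights.
        return self.entries.get(self.fingerprint(content))

    def register(self, owner: str, content: bytes, reserved: set) -> float:
        # Prior search: refuse to register a work someone else already registered.
        existing = self.search(content)
        if existing is not None and existing[0] != owner:
            raise ValueError("work already registered by another owner")
        unknown = set(reserved) - set(FEE_WEIGHTS)
        if unknown:
            raise ValueError(f"unknown rights: {sorted(unknown)}")
        self.entries[self.fingerprint(content)] = (owner, frozenset(reserved))
        # Relative fee grows with the number and weight of rights reserved.
        return sum(FEE_WEIGHTS[right] for right in reserved)

    def may_share_freely(self, content: bytes) -> bool:
        # Compliance check a sharing service could run before serving a file.
        # Unregistered works carry only the attribution right.
        entry = self.search(content)
        return entry is None or "copying" not in entry[1]
```

Reserving copying and commercial use on a song would cost 8.0 relative units here, while reserving only derivative works would cost 2.0 and still leave the song freely shareable with attribution, mirroring the Creative Commons-style gradation described above.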
Critics might object that the registration I’m proposing imposes a financial burden on creators. It is important to remember the converse: Removing content from the knowledge loop imposes a cost on society. And enforcing this removal, for instance by finding people who are infringing and imposing penalties on them, imposes additional costs on society. For these reasons, asking creators to pay is fair, especially if creators’ economic freedom is already assured by a Universal Basic Income. We have generated so much economic prosperity that nobody needs to be a starving artist anymore!
Universal Basic Income also helps us dismantle another argument frequently wielded in support of excessive copyright: Employment at publishers. The major music labels combined currently employ roughly 17,000 people. When people propose limiting the extent of copyright, others point to the potential loss of these jobs. Never mind that the existence of this employment to some degree reflects the cost to society from having copyright. Owners, managers and employees of music labels are after all not the creators of the music.
Before turning to patents, let me point out one more reason why a return to a system of paid registration of rights makes sense. None of us creates intellectual works in a vacuum. Any author who writes a book has read lots of writing by other people. Any musician has listened to tons of music. Any filmmaker has watched lots of movies. Much of what makes art so enjoyable these days is the vast body of prior art that it draws upon and can explicitly or implicitly reference. There is no “great man” or woman who creates in a vacuum and from scratch. We are all part of the knowledge loop that has already existed for millennia.
In the last few days Apple suspended the enterprise certificates for first Facebook and then Google, rendering internal iOS apps instantly useless. Apple did so in response to revelations that both companies had used the enterprise certificates to distribute VPN apps to teenagers in order to better understand phone usage. This is apparently in violation of the enterprise certificate license.
Some people have cheered Apple’s actions as not only justified but appropriate sanctions on Facebook and Google. It would appear that Apple acted within its contractual rights and there are reasonable questions about these research efforts. In any case, though, Apple’s actions and their impact illustrate the extraordinary power Apple has over its devices.
I have written before that I believe this level of control is detrimental to innovation and is a source of excess rents. It has been interesting to see how the take rate in PC game app stores is being driven towards 10% as a result of competition. So instead of celebrating Apple’s actions here we should see them as a reminder of a lack of competition and a disempowerment of end users. An easy ability for consumers to directly load apps should be a legal requirement. This would allow competitive app stores to emerge.
NOTE: I have been posting excerpts from my book World After Capital. Currently we are on the Informational Freedom section and the previous excerpt was on Internet Access. Today looks at the right to be represented by a bot (code that works on your behalf).
Once you have access to the Internet, you need software to connect to its many information sources and services. When Sir Tim Berners-Lee first invented the World Wide Web in 1989 to make information sharing on the Internet easier, he did something very important. He specified an open protocol, the Hypertext Transfer Protocol or HTTP, that anyone could use to make information available and to access such information. By specifying the protocol, Berners-Lee opened the way for anyone to build software (so-called web servers and browsers) that would be compatible with this protocol. Many did, including, famously, Marc Andreessen with Netscape. Many of the web servers and browsers were available as open source and/or for free.
The combination of an open protocol and free software meant two things: Permissionless publishing and complete user control. If you wanted to add a page to the web, you didn’t have to ask anyone’s permission. You could just download a web server (e.g. the open source Apache), run it on a computer connected to the Internet, and add content in the HTML format. Voila, you had a website up and running that anyone from anywhere in the world could visit with a web browser running on his or her computer (at the time there were no smartphones yet). Not surprisingly, content available on the web proliferated rapidly. Want to post a picture of your cat? Upload it to your webserver. Want to write something about the latest progress on your research project? No need to convince an academic publisher of the merits. Just put up a web page.
People accessing the web benefited from their ability to completely control their own web browser. In fact, in the Hypertext Transfer Protocol, the web browser is referred to as a “user agent” that accesses the Web on behalf of the user. Want to see the raw HTML as delivered by the server? Right click on your screen and use “view source.” Want to see only text? Instruct your user agent to turn off all images. Want to fill out a web form but keep a copy of what you are submitting for yourself? Create a script to have your browser save all form submissions locally as well.
Over time, popular platforms on the web have interfered with some of the freedom and autonomy that early users of the web used to enjoy. I went on Facebook the other day to find a witty note I had written some time ago on a friend’s wall. It turns out that Facebook makes finding your own wall posts quite difficult. You can’t actually search all the wall posts you have written in one go; rather, you have to go friend by friend and scan manually backwards in time. Facebook has all the data, but for whatever reason, they’ve decided not to make it easily searchable. I’m not suggesting any misconduct on Facebook’s part—that’s just how they’ve set it up. The point, though, is that you experience Facebook the way Facebook wants you to experience it. You cannot really program Facebook differently for yourself. If you don’t like how Facebook’s algorithms prioritize your friends’ posts in your newsfeed, then tough luck, there is nothing you can do.
Or is there? Imagine what would happen if everything you did on Facebook was mediated by a software program—a “bot”—that you controlled. You could instruct this bot to go through and automate for you the cumbersome steps that Facebook lays out for finding past wall posts. Even better, if you had been using this bot all along, the bot could have kept your own archive of wall posts in your own data store (e.g., a Dropbox folder); then you could simply instruct the bot to search your own archive. Now imagine we all used bots to interact with Facebook. If we didn’t like how our newsfeed was prioritized, we could simply ask our friends to instruct their bots to send us status updates directly so that we can form our own feeds. With Facebook on the web this was entirely possible because of the open protocol, but it is no longer possible in a world of proprietary and closed apps on mobile phones.
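The archive-and-search part of this thought experiment needs surprisingly little code. Below is a minimal Python sketch, assuming the bot already sees each post it makes on your behalf; `PostArchive`, its methods and the JSON Lines file format are hypothetical choices for illustration and do not correspond to any Facebook API.

```python
import json
from pathlib import Path


class PostArchive:
    """Hypothetical bot-side archive of your own posts, stored in a local file
    (e.g. inside a Dropbox folder), one JSON object per line."""

    def __init__(self, path):
        self.path = Path(path)

    def record(self, friend: str, text: str) -> None:
        # Called by the bot every time it posts on a friend's wall for you.
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps({"friend": friend, "text": text}) + "\n")

    def search(self, term: str) -> list:
        # Case-insensitive search across all your posts, for every friend at once.
        if not self.path.exists():
            return []
        term = term.lower()
        with self.path.open(encoding="utf-8") as f:
            posts = [json.loads(line) for line in f if line.strip()]
        return [p for p in posts if term in p["text"].lower()]
```

Because the archive lives in your own data store, the search does not depend on whatever interface the platform chooses to offer.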
Although this Facebook example might sound trivial, bots have profound implications for power in a networked world. Consider on-demand car services provided by companies such as Uber and Lyft. If you are a driver today for these services, you know that each of these services provides a separate app for you to use. And yes, you could try to run both apps on one phone or even have two phones. But the closed nature of these apps means you cannot use the compute power of your phone to evaluate competing offers from the networks and optimize on your behalf. What would happen, though, if you had access to bots that could interact on your behalf with these networks? That would allow you to simultaneously participate in all of these marketplaces, and to automatically play one off against the other.
Using a bot, you could set your own criteria for which rides you want to accept. Those criteria could include whether a commission charged by a given network is below a certain threshold. The bot, then, would allow you to accept rides that maximize the net fare you receive. Ride sharing companies would no longer be able to charge excessive commissions, since new networks could easily arise to undercut those commissions. For instance, a network could arise that is cooperatively owned by drivers and that charges just enough commission to cover its costs. Likewise, as a passenger, using a bot would allow you to simultaneously evaluate the prices of different car services and choose the service with the lowest price for your current trip. The mere possibility that a network like this could exist would substantially reduce the power of the existing networks.
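The decision rules described here are simple enough to write down. The following Python sketch assumes the bot has already collected offers (fare and commission rate) from each network; that data access is exactly what today's closed apps prevent, and the network names, `RideOffer`, `pick_offer` and `cheapest` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RideOffer:
    network: str       # hypothetical network name, e.g. a driver-owned co-op
    fare: float        # what the passenger pays
    commission: float  # fraction the network keeps, e.g. 0.20 for 20%

    @property
    def net_fare(self) -> float:
        # What the driver actually receives.
        return self.fare * (1 - self.commission)


def pick_offer(offers, max_commission):
    """Driver-side bot: drop offers whose commission exceeds the driver's
    threshold, then accept the one with the highest net fare (None if none qualify)."""
    eligible = [o for o in offers if o.commission <= max_commission]
    return max(eligible, key=lambda o: o.net_fare, default=None)


def cheapest(quotes):
    """Passenger-side bot: quotes maps network name -> price for this trip."""
    return min(quotes, key=quotes.get) if quotes else None
```

With offers of 30.00 at 25% commission, 28.00 at 10%, and 27.00 at 5% from a co-op, a driver whose threshold is 20% would take the co-op ride: it has the lowest gross fare but the highest net fare (about 25.65). That is the mechanism by which low-commission networks could win drivers.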
We could also use bots as an alternative to anti-trust regulation to counter the overwhelming power of technology giants like Google or Facebook without forgoing the benefits of their large networks. These companies derive much of their revenue from advertising, and on mobile devices, consumers currently have no way of blocking the ads. But what if they did? What if users could change mobile apps to add Ad-Blocking functionality just as they can with web browsers?
Many people decry ad-blocking as an attack on journalism that dooms the independent web, but that’s an overly pessimistic view. In the early days, the web was full of ad-free content published by individuals. In fact, individuals first populated the web with content long before institutions joined in. When they did, they brought with them their offline business models, including paid subscriptions and of course advertising. Along with the emergence of platforms such as Facebook and Twitter with strong network effects, this resulted in a centralization of the web. More and more content was produced either on a platform or moved behind a paywall.
Ad-blocking is an assertion of power by the end-user, and that is a good thing in all respects. Just as a judge recently found that taxi companies have no special right to see their business model protected, neither do ad-supported publishers. And while in the short term this might prompt publishers to flee to apps, in the long run it will mean more growth for content that is paid for by end-users, for instance through a subscription, or even crowdfunded (possibly through a service such as Patreon).
To curtail the centralizing power of network effects more generally, we should shift power to the end-users by allowing them to have user agents for mobile apps, too. The reason users don’t wield the same power on mobile is that native apps relegate end-users once again to interacting with services just using our eyes, ears, brain and fingers. No code can execute on our behalf, while the centralized providers use hundreds of thousands of servers and millions of lines of code. Like a web browser, a mobile user-agent could do things such as strip ads, keep copies of my responses to services, let me participate simultaneously in multiple services (and bridge those services for me), and so on. The way to help end-users is not to have government smash big tech companies, but rather for government to empower individuals to have code that executes on their behalf.
What would it take to make bots a reality? One approach would be to require companies like Uber, Google, and Facebook to expose all of their functionality, not just through standard human usable interfaces such as apps and web sites, but also through so-called Application Programming Interfaces (APIs). An API is for a bot what an app is for a human. The bot can use it to carry out operations, such as posting a status update on a user’s behalf. In fact, companies such as Facebook and Twitter have APIs, but they tend to have limited capabilities. Also, companies presently have the right to control access so that they can shut down bots, even when a user has clearly authorized a bot to act on his or her behalf.
Why can’t I simply write code today that interfaces on my behalf with, say, Facebook? After all, Facebook’s own app uses an API to talk to their servers. Well, in order to do so I would have to “hack” the existing Facebook app to figure out what the API calls are and also how to authenticate myself to those calls. Unfortunately, there are three separate laws on the books that make those necessary steps illegal.
The first is the anti-circumvention provision of the DMCA. The second is the Computer Fraud and Abuse Act (CFAA). The third is the legal construction that by clicking “I accept” on a EULA (End User License Agreement) or a set of Terms of Service I am actually legally bound. The last one is a civil matter, but criminal convictions under the first two carry mandatory prison sentences.
So if we were willing to remove all three of these legal obstacles, then hacking an app to give you programmatic access to systems would be possible. Now, people might object, saying that those provisions were created in the first place to solve important problems. That’s not entirely clear though. The anti-circumvention provision of the DMCA was created specifically to allow the creation of DRM systems for copyright enforcement. So what you think of this depends on what you believe about the extent of copyright (a subject we will look at in the next section).
The CFAA too could be tightened up substantially without limiting its potential for prosecuting real fraud and abuse. The same goes for what kind of restriction on usage a company should be able to impose via a EULA or a TOS. In each case if I only take actions that are also available inside the company’s app but just happen to take these actions programmatically (as opposed to manually) why should that constitute a violation?
But, don’t companies need to protect their encryption keys? Aren’t “bot nets” the culprits behind all those so-called DDOS (distributed denial of service) attacks? Yes, there are a lot of compromised machines in the world, including set top boxes and home routers that some are using for nefarious purposes. Yet that only demonstrates how ineffective the existing laws are at stopping illegal bots. Because those laws don’t work, companies have already developed the technological infrastructure to deal with the traffic from bots.
How would we prevent people from adopting bots that turn out to be malicious code? Open source seems like the best answer here. Many people could inspect a piece of code to make sure it does what it claims. But that’s not the only answer. Once people can legally be represented by bots, many markets currently dominated by large companies will face competition from smaller startups.
Legalizing representation by a bot would eat into the revenues of large companies, and we might worry that they would respond by slowing their investment in infrastructure. I highly doubt this would happen. Uber, for instance, was recently valued at $50 billion. The company’s “take rate” (the percentage of the total amount paid for rides that they keep) is 20%. If competition forced that rate down to 5%, Uber’s value would fall to $12.5 billion as a first approximation. That is still a huge number, leaving Uber with ample room to grow. As even this bit of cursory math suggests, capital would still be available for investment, and those investments would still be made.
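The "first approximation" here amounts to scaling the valuation by the ratio of the take rates. This assumes that ride volume stays the same and that value is proportional to the revenue Uber keeps, both simplifications. In Python:

```python
def scaled_valuation(valuation, old_take_rate_pct, new_take_rate_pct):
    """First approximation: value is proportional to the share of fares kept,
    with total ride volume held constant."""
    return valuation * new_take_rate_pct / old_take_rate_pct


# $50 billion at a 20% take rate, if competition pushes the rate to 5%:
print(scaled_valuation(50, 20, 5))  # billions of dollars -> 12.5
```

Cutting the take rate to a quarter cuts the valuation to a quarter. In practice, lower commissions might also grow ride volume and partly offset the drop.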
That’s not to say that no limitations should exist on bots. A bot representing me should have access to any functionality that I can access through a company’s website or apps. It shouldn’t be able to do something that I can’t do, such as pretend to be another user or gain access to private posts by others. Companies can use technology to enforce such access limits for bots; there is no need to rely on regulation.
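One way to read "a bot can do what I can do, and no more" is that the bot acts under its user's identity, and the server applies exactly the same authorization check regardless of whether the request comes from the app or from a bot. The sketch below illustrates that idea with hypothetical types (`Post`, `Principal`) and rules. It is not how any particular company implements access control.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    author: str
    visible_to: frozenset  # users the author shared this post with


@dataclass(frozen=True)
class Principal:
    user: str             # the human on whose behalf the request is made
    via_bot: bool = False  # recorded for logging; deliberately ignored below


def can_read(principal: Principal, post: Post) -> bool:
    # Same rule for app and bot: a bot inherits its user's permissions only.
    return principal.user == post.author or principal.user in post.visible_to


def can_post_as(principal: Principal, author: str) -> bool:
    # Nobody, human or bot, may impersonate another user.
    return principal.user == author
```

Because the check keys only on the represented user, a bot gains no extra reach: Alice's bot can read a post Bob shared with Alice, but Carol's bot cannot, and neither bot can post as Bob.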
Even if I have convinced you of the merits of bots, you might still wonder how we might ever get there from here. The answer is that we can start very small. We could run an experiment with the right to be represented by a bot in a city like New York. New York’s municipal authorities control how on demand transportation services operate. The city could say, “If you want to operate here, you have to let drivers interact with your service programmatically.” And I’m pretty sure, given how big a market New York City is, these services would agree.
A couple of weeks ago I first tweeted about what gives people reasons for optimism and then wrote a post. Both of these were preparations for giving the opening keynote for the 2019 DLD conference. I will eventually put the slides online, but in the meantime, here is the video from the talk.
NOTE: Today’s excerpt from World After Capital continues the topic of Informational Freedom, discussing overreach in the patent system and offering prizes as an alternative mechanism. This is timely as the patent office has unfortunately issued new rules that will make it easier to obtain software patents, undoing the tightening from prior Supreme Court decisions.
While copyright limits our ability to share knowledge, patents limit our ability to use knowledge to create something. Much like having a copyright confers a monopoly on reproduction, a patent confers a monopoly on use. And the rationale for the existence of patents is similar to the argument for copyright. The monopoly that is granted results in economic rents (i.e., profits) that are supposed to provide an incentive for people to invest in research and development.
As with copyright, the incentive argument here should be suspect. People invented long before patents existed, and some people have continued to invent without seeking patents. We can trace early uses of patents to Venice in the mid 1400s; Britain had a fairly well established system by the 1600s. That leaves thousands of years of invention, a time that saw such critical breakthroughs as the alphabet, movable type, the wheel, and gears. This is to say nothing of those inventors who more recently chose not to patent their inventions because they saw how that would interrupt the knowledge loop and impose a loss on society. These inventors include Jonas Salk, who created the polio vaccine; other unpatented breakthroughs include X-rays, penicillin, ether as an anesthetic, and many more. Since we know that limits on knowledge use impose a cost, we should therefore ask what alternatives exist to patents to stimulate innovation.
Many people are motivated simply by wanting to solve a problem. This could be a problem they are having themselves or something that impacts family or friends or the world at large. With a Universal Basic Income more of these people will be able to spend their time inventing, driven by intrinsic motivation.
We will also see more invention because digital technologies are reducing the cost of inventing. One example of this is the USV portfolio company Science Exchange, which has created a marketplace for laboratory experiments. Let’s say you have an idea that requires you to sequence a bunch of genes. The fastest gene sequencing available to date comes from a company called Illumina, whose machines cost from $850K to $1M to buy. Via Science Exchange, however, you can access such a machine on a per-use basis for less than $1,000. Furthermore, the next generation of sequencing machines is already on the way, and these machines will further reduce the cost. Here too we see the phenomenon of technological deflation at work.
A lot of recent legislation has needlessly inflated the cost of innovation. In particular, rules around drug testing have made drug discovery prohibitively expensive. We have gone too far in the direction of protecting patients during the research process and also of allowing for large medical damage claims. As a result, many drugs are either not developed at all or are withdrawn from the market despite their efficacy (for example the vaccine against Lyme disease, which is no longer available for humans).
Patents (i.e., granting a temporary monopoly) are not the only way to provide incentives for innovation. Another historically successful strategy has been the offering of public prizes. Britain famously offered the Longitude rewards starting in 1714 to induce solutions to the problem of determining a ship’s longitude at sea (latitude can be determined easily from the position of the sun). Several people were awarded prizes for their designs of chronometers, lunar distance tables and other methods for determining longitude (including improvements to existing methods). As quid pro quo for receiving the prize money, inventors generally had to make their innovations available to others to use as well.
At a time when we wish to accelerate the Knowledge Loop, we must shift the balance towards knowledge that can be used freely and that is not encumbered by patents. It is promising to see successful recent prize programs, such as the X Prizes, DARPA Grand Challenges, and NIST competitions. There is also potential for crowdfunding future prizes. Medical research in particular should be a target for prizes to help bring down the cost of healthcare.
Going forward, we can achieve this by using prizes more frequently. And yet, that leaves a lot of existing patents in place. Here I believe a lot can be done to reform the existing system and make it more functional, in particular by reducing the impact of so-called Non Practicing Entities (NPEs, commonly referred to as “patent trolls”). These are companies that have no operating business of their own, and exist solely for the purpose of litigating patents.
In recent years, many NPEs have been litigating patents of dubious validity. They tend to sue not just a company but also that company’s customers. This forces a lot of companies into a quick settlement. The NPE then turns around and uses the early settlement money to finance further lawsuits. A few dollars go a long way for them because their attorneys do much of the legal work on a contingency basis, expecting further settlements. Fortunately, a recent Supreme Court ruling placed limits on where patent lawsuits can be filed, which should help limit the activity of these NPEs going forward.
As a central step in patent reform, we thus must make it easier and faster to invalidate existing patents while at the same time making it more difficult to obtain new patents. Thankfully, we have seen some progress on both counts in the U.S., but we still have a long way to go. Large parts of what is currently patentable should be excluded from patentability in the first place, including designs and utility patents. University research that has received even small amounts of public funding should not be eligible for patents at all. Universities have frequently delayed the publication of research in areas where they have hoped for patents that they could subsequently license out. This practice has constituted one of the worst consequences of the patent system for the Knowledge Loop.
We have also gone astray by starting to celebrate patents as a measure of technological progress and prowess instead of treating them as a necessary evil (and maybe not even necessary). Ideally, we would succeed in rolling back the reach of existing patents and raising the bar for new patents while also inducing as much unencumbered innovation as possible through the bestowing of prizes and social recognition.
We just returned from a wonderful week skiing. One of our family traditions is for all of us to work together on completing a puzzle. It’s a fun activity that involves a surprising amount of teamwork, such as finding pieces that belong in a part of the puzzle someone else is working on and trading off working on different areas.
We had always picked 1,000 piece puzzles. This time we wound up buying a 3,000 piece puzzle. After all, how much harder could it be? Well, a lot harder as it turns out. As a rough approximation nearly 10x as hard!
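One rough way to see where "nearly 10x" comes from is to assume that the work scales with the number of piece pairs you might have to compare, n(n-1)/2. That is a simplifying assumption, since edges and colors narrow the search in practice. Under it, tripling the pieces multiplies the work by about nine, while splitting the same 3,000 pieces into three independent 1,000 piece puzzles only triples it:

```python
def candidate_pairs(n: int) -> int:
    """Unordered pairs of pieces that could, in the worst case, be tried together."""
    return n * (n - 1) // 2


one_big = candidate_pairs(3000) / candidate_pairs(1000)
three_small = 3 * candidate_pairs(1000) / candidate_pairs(1000)

print(round(one_big, 2))  # 9.01
print(three_small)        # 3.0
```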
Here is the puzzle in a semi-finished state. That’s as far as we got before the vacation ended, despite spending quite a bit of collective time on it.
There is a good business and life lesson in this. As you add more parts to a problem, the complexity tends to grow explosively. Here are some strategies to combat this problem in business:
Whenever you add a feature, consider removing one that’s used by only a small fraction of users (important caveat: if your service has 100x as many consumers as producers, you need to do this analysis separately for each group!)
As you hire employees, make sure they are organized into logical units that can work as independently from each other as possible (doing three 1,000 piece puzzles is only 3x the work of one 1,000 piece puzzle).
Architect your software so that component services are loosely coupled via APIs.
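As an illustration of the last strategy, here is a small Python sketch of loose coupling. A billing component depends only on a narrow interface (a `typing.Protocol`), so the payment side can be replaced or changed without touching billing. All names (`PaymentGateway`, `FakeGateway`, `Billing`) are hypothetical.

```python
from typing import Protocol


class PaymentGateway(Protocol):
    """The only thing Billing knows about payments: one narrow call."""
    def charge(self, customer_id: str, cents: int) -> bool: ...


class FakeGateway:
    """Stand-in implementation; any class with a matching charge() works."""
    def __init__(self):
        self.charges = []

    def charge(self, customer_id: str, cents: int) -> bool:
        self.charges.append((customer_id, cents))
        return True


class Billing:
    def __init__(self, gateway: PaymentGateway):
        # Depends on the interface, not on any concrete payment provider.
        self.gateway = gateway

    def bill(self, customer_id: str, items: dict) -> bool:
        total = sum(items.values())  # prices in cents
        # Nothing to charge means nothing to ask the gateway for.
        return total == 0 or self.gateway.charge(customer_id, total)
```

Swapping `FakeGateway` for a real provider changes nothing inside `Billing`. The team owning each side can work independently, like the separate 1,000 piece puzzles.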
What’s your favorite strategy for avoiding growth in complexity in business and/or in life?
A startup founder I know likes to say that their leadership style is “frequently wrong, but never in doubt.” Often that expression is applied as a critique, such as in Cheryl Wheeler’s song Driving Home, but the founder meant it as a positive model along the lines of the idea that even a bad decision is better than no decision. Given the high degree of uncertainty inherent in startups, how to lead in its presence is one of the crucial founder/CEO challenges. So should a leader share their doubts about a course of action with the team?
That framing of the question has an implicit assumption: that the leader has doubts to begin with and hence needs to decide whether to share them or not. To some this may seem like a preposterous question; after all, who doesn’t have doubts? Only an overly sure fool would seem not to. But the word doubt has a lot of connotations, including lack of confidence and even distrust. So what do we even mean by asking about doubt and sharing it?
To help narrow this down, I therefore want to use other words and distinguish between “second guessing” and “re-evaluating.” The former is questioning a decision without material new information. The latter is revisiting a decision after material new information has been obtained. It is second guessing that is destructive for morale, because it calls into question not just the decision but also the legitimacy of the decision-making process itself. As a leader you should keep any second guessing strictly to yourself.
Re-evaluating, on the other hand, is healthy but requires a good decision-making process. In particular, there has to be a relatively clear way of assessing whether something is in fact material new information. There is a famous quote, often attributed to Keynes: “When the facts change, I change my opinion. What do you do?” If you have a good process for making decisions, then it will be quite clear whether something is a material new fact, and the team will be able to re-evaluate the decision dispassionately.
So as a good exercise, next time you feel doubt about a decision, ask yourself if you are second guessing or if you are re-evaluating. And if you find yourself second guessing a lot, then it likely says something about problems with the decision making process (and potentially about your own fears).
I was overly optimistic when I thought I could resume excerpts from World After Capital today. Turns out the next section is on privacy and it needs major editing. So while I am working on that, here is the talk I gave at Blockstack Berlin that summarizes my admittedly controversial view that privacy is incompatible with technological progress. We need strategies other than privacy to remain free in the digital age.
In investing there is uncertainty about returns. Some investments do well, others do poorly. But that is not the only risk that investors are concerned about when they are investing professionally on behalf of others. There is also the issue of perception: it is one thing not to make money in a sector, it is another not to make money in that sector when everyone else appears to be making money in it. Similarly, it is one thing to lose money on a trade, it is another to lose money on a trade that people have tried many times before and is now widely “known” to be a money-losing trade.
In each case the investor is not just taking return risk but also perception risk. If the others are right, then not only will returns be below the benchmarks but there is also the question: why did you think you were smarter than everyone else? And, well, nobody really wants to hear that. Beyond a bruised ego, the perception risk will eventually also impact one’s ability to raise money for a fund. Why? Because most of the money put into funds is put there by people who are also professional investors and hence face similar perception risk!
I believe perception risk explains why there is so much herding into popular sectors and why, conversely, some sectors go underfunded for long periods of time after big losses have been incurred. For example, in 2001, Brad and I, together with a mutual friend, tried to raise a fund on an investment program roughly similar to what eventually became USV. Nobody wanted to give us money. I remember one meeting with someone at Goldman Sachs particularly well. After we had explained how the next wave of value creation on the Internet would be at the application level (because so much had gone into infrastructure during the dotcom bubble), the person we were pitching looked at us and said “So you are saying you will invest in shitty little companies?”
It will be interesting to see how this plays out in crypto now. Longtime crypto investors like to point out that Bitcoin has had multiple previous big corrections. While that is correct from a return risk perspective, it fails to account for perception risk. None of the prior corrections had remotely the same level of public visibility. So to think that institutional investors will be piling in right now is to ignore perception risk. To invest now means taking both return risk and perception risk. That’s why climbing out of the winter after the dotcom bubble burst took time, and that’s why the same is likely to be true for crypto.
Senator Elizabeth Warren yesterday proposed a wealth tax. It is a way of dealing with the rising inequality issue that I also discuss in World After Capital. I propose a longer term way of getting to a similar place via demurrage, but I am supportive of a wealth tax. I believe it will have to be broad-based, though, and include all large accumulations of wealth, including in trusts, foundations, endowments and all other pools that don’t have clear existing offsetting obligations (such as pension funds). The reason to go broad on a wealth tax is that all of these accumulations of wealth contribute not just to inequality but also to a large power imbalance, including large foundations influencing education and healthcare policy outside of the democratic process. Independent of where this proposal ultimately goes, I am glad that we finally have candidates willing to propose bold new ideas (this also includes Andrew Yang’s platform), instead of rehashing old ones.
Today’s Uncertainty Wednesday revisits a favorite topic of mine: correlation. I first wrote about the importance of thinking about correlation in modeling over 10 years ago, long before starting the Uncertainty Wednesday series (the reference to Excel is a giveaway). I then had a three part series about spurious correlation which you can find here: part 1, part 2 and part 3. Here is the key introductory paragraph:
Well, as you have seen from the posts on sample mean and sample variance, whenever you are dealing with a sample, the observed values of a statistical measure have their own distribution. The same is of course true for correlation. So two random variables may be completely independent, but when you draw a sample, that sample may still happen to show correlation. That is known as spurious correlation.
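Spurious correlation is easy to see in a quick simulation. This minimal Python sketch draws two independent standard normal variables (so the true correlation is exactly zero), computes the sample correlation on many small samples of 10 points, and counts how often the sample shows a fairly strong correlation anyway. The sample size, trial count, and 0.5 threshold are arbitrary choices for illustration.

```python
import random
import statistics


def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(0)  # fixed seed so reruns give the same numbers
SAMPLE_SIZE = 10
TRIALS = 10_000

# X and Y are generated independently, so their true correlation is 0.
correlations = []
for _ in range(TRIALS):
    xs = [rng.gauss(0, 1) for _ in range(SAMPLE_SIZE)]
    ys = [rng.gauss(0, 1) for _ in range(SAMPLE_SIZE)]
    correlations.append(pearson(xs, ys))

share_strong = sum(abs(r) > 0.5 for r in correlations) / TRIALS
print(f"samples with |r| > 0.5 despite independence: {share_strong:.1%}")
```

For samples of 10 points from normal distributions, theory puts this share at roughly 14%, so about one sample in seven shows a “strong” correlation that isn’t there.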
But the situation is way worse than that when one of the random variables involved has a fat tailed distribution, in which extremes occur with higher probability than in, say, a normal distribution. Why? Because many fat tailed distributions do not have a well-defined variance: their variance is infinite (a Pareto distribution with tail exponent α ≤ 2 is a standard example). Yet any sample from a fat tailed distribution will have a finite sample variance (by construction). The sample variance in this situation is not an estimate of the actual variance, since the latter does not exist. By extension, a correlation in which at least one variable is fat tailed has the same problem.
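A minimal Python sketch of the point about variance, assuming a Pareto distribution with tail exponent 1.5 as the fat tailed example (its mean exists, its variance is infinite). It grows one sample from each distribution and computes the sample variance of the first n draws.

```python
import random


def sample_variance(xs):
    """Unbiased sample variance (two-pass)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


rng = random.Random(1)  # fixed seed for reproducibility
ALPHA = 1.5  # Pareto tail exponent: for ALPHA <= 2 the true variance is infinite
N_MAX = 100_000

pareto = [rng.paretovariate(ALPHA) for _ in range(N_MAX)]
normal = [rng.gauss(0, 1) for _ in range(N_MAX)]

results = {}
for n in (100, 1_000, 10_000, 100_000):
    # Use the first n draws, i.e. one sample that keeps growing.
    results[n] = (sample_variance(normal[:n]), sample_variance(pareto[:n]))
    v_norm, v_par = results[n]
    print(f"n={n:>7,}  normal: {v_norm:6.3f}  pareto: {v_par:14,.1f}")
```

The normal column settles near its true variance of 1 as n grows. The Pareto column is always a finite number, but there is no true value for it to converge to; it tends to be dominated by whichever extreme draw happens to be largest so far.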
This should be a complete “Duh” moment, and yet people cannot help but use sample correlation all the time in settings where fat tails are extremely likely. Again, the sample correlation will always exist (by construction), but it doesn’t have to mean a thing! This is incredibly hard for us to accept: we follow a recipe (how to calculate correlation), we get a number (the sample correlation), and yet we are supposed to ignore it?
Here is a way of understanding why. Ask yourself what would happen to the correlation if you had more data points. This is an important mental exercise because it is a counterfactual (you only have your existing data points). Sometimes a deeper truth can only be arrived at by realizing that your sample is misleading you.
Let’s look at a concrete example: outcomes in venture investing are generally seen to be fat tailed. You have a sample of venture fund sizes and returns. The sample shows negative correlation: larger funds appear to have lower returns. Can you actually conclude that this is the case? You have to ask yourself what would happen if just one large fund completely hit it out of the park. Or maybe two funds did. Would the sign on your correlation flip to positive?
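Here is that thought experiment with made-up toy numbers (purely illustrative, not real fund data): six hypothetical funds where larger funds have lower return multiples, plus one hypothetical large fund that hits it out of the park.

```python
import statistics


def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Made-up toy numbers for illustration only, not real fund data.
fund_sizes = [50, 100, 150, 200, 300, 400]   # fund size in $M
multiples = [3.0, 2.5, 2.8, 2.0, 1.8, 1.5]   # return multiple

before = pearson(fund_sizes, multiples)

# Add one hypothetical large fund with an outsized return.
after = pearson(fund_sizes + [500], multiples + [15.0])

print(f"correlation before: {before:+.2f}")  # negative
print(f"correlation after:  {after:+.2f}")   # positive
```

A single extra point is enough to flip the sign, which is exactly the kind of fragility to expect when the underlying outcomes are fat tailed.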
If outcomes really are fat tailed, you will find that your sample correlation is not really robust (you could actually simulate this by “drawing” from the distribution, adding new hypothetical points to your sample, and then recalculating the sample correlation). This also turns out to be a central argument that Nassim Taleb made in his recent criticism of IQ as a valid concept. As usual, his criticism leaves lots of room for further debate, but I have yet to see a response that at least attempts to address this problem of sample correlation under fat tails.
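The simulation described in the parenthesis above can be sketched in Python as follows. Everything about the distribution is an assumption for illustration: fund sizes uniform between $50M and $1,000M, return multiples Pareto with tail exponent 1.2, and the two independent (so the true correlation is zero). The sketch draws a base sample of 30 funds, then repeatedly adds 5 new hypothetical funds and checks how often the sign of the sample correlation flips.

```python
import random
import statistics


def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(7)  # fixed seed for reproducibility
ALPHA = 1.2  # assumed Pareto tail exponent for return multiples (fat tailed)


def draw_fund():
    """Draw one hypothetical fund as (size in $M, return multiple).

    Assumptions for illustration only: size is uniform on [50, 1000],
    the multiple is Pareto distributed, and the two are independent.
    """
    return rng.uniform(50, 1_000), rng.paretovariate(ALPHA)


base = [draw_fund() for _ in range(30)]
base_r = pearson(*zip(*base))

TRIALS = 2_000
NEW_POINTS = 5
flips = 0
for _ in range(TRIALS):
    # Add a handful of new hypothetical funds and recompute the correlation.
    extended = base + [draw_fund() for _ in range(NEW_POINTS)]
    r = pearson(*zip(*extended))
    if (r > 0) != (base_r > 0):
        flips += 1

print(f"base sample correlation: {base_r:+.3f}")
print(f"sign flipped after adding {NEW_POINTS} funds: {flips / TRIALS:.1%} of trials")
```

If a handful of extra draws regularly changes the sign, the original sample correlation was not telling you much about the underlying relationship.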