I was mostly offline last week skiing in the Alps. I love skiing for many reasons: I find it a great way to experience flow state and it combines being outdoors with vigorous exercise (especially when skinning or bootpacking up the mountain carrying a bunch of gear). The mountains are also a great reminder of how inhospitable and formidable nature can be.
Here is a picture I took from the top of Mont Fort in Verbier.
Next Monday I will resume posting excerpts from World After Capital, but I want to point out briefly just how transformative a basic income could be to what people can experience in the world. Skiing today is mostly a hobby for wealthy people. But with a basic income anyone could decide to live in the mountains for some time, which is cheap outside of the big ski resorts. How does one ski then? The same way we did on parts of this vacation: by hiking up the mountain and then skiing down. And there are plenty of people in the world who can teach others how to do that.
So the last two Uncertainty Wednesday posts have been about spurious correlation. Today, I want to give an example of how easy it is to observe spurious correlation. To that end I wrote a little Python program, which I will show below, that rolls two independent dice. Each die is rolled 10 times to give us two data series of 10 points each. This mimics the 10 data point series from last week’s example.
The program runs some number of these simulations and each time calculates and outputs the coefficient of correlation. I then use Google Sheets to produce a histogram.
Here is the result for 1,000 runs
And here are 10,000 runs
What we see is the distribution of the sample correlation. As we add more runs we once again see a normal distribution emerge (isn’t that fascinating?).
Looking at the charts we see that the center of the distribution is 0, which is reassuring as it suggests that the two sets of dice rolls were in fact independent. But we also see that the bulk of the histogram is for values of the correlation that are, well, not zero. Put differently, you are much more likely to observe a non-zero correlation than a zero correlation in these samples.
Now you might object that even with 10,000 runs we don’t see correlations above 0.75 or below -0.8, whereas the divorce rate example had a correlation of 0.99. So isn’t that extremely unlikely?
Keep in mind that the correlation in the example was between two basically arbitrary data series of 10 points each. There are, well, billions (actually infinitely many) such series. So 10,000 runs is really nothing. So I changed the program to 1 million runs. The maximum observed correlation on that run was 0.9727, much closer to the 0.99 in the example. In fact the minimum observed correlation on that run was -0.9902!
But this direction of analysis is the wrong direction. We are assuming independence (remember this is a strong assumption) and then checking how likely it is to observe a specific correlation. What we really want to figure out is what our updated belief about correlation should be depending on a less restrictive prior.
Interestingly, this takes me to the edge of my own knowledge and so I have asked an expert in Bayesian estimation to help me. Stay tuned!
PS Here is the Python code:

from scipy.stats import randint
from numpy import corrcoef

max_c = 0
min_c = 0
runs = 10000

for run in range(runs):
    # roll two independent dice 10 times each
    r1 = randint.rvs(1, 7, size=10)
    r2 = randint.rvs(1, 7, size=10)
    # corrcoef returns a 2x2 matrix; the correlation is the off-diagonal entry
    c = corrcoef(r1, r2)[0, 1]
    # keep track of the largest and smallest correlations seen so far
    max_c = max(max_c, c)
    min_c = min(min_c, c)
    print(c)

print(max_c, min_c)
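If you would rather skip the Google Sheets step, here is a quick sketch (not what I actually used, just an alternative) that collects the correlations in a list and produces the histogram directly with matplotlib:

from scipy.stats import randint
from numpy import corrcoef
import matplotlib.pyplot as plt

runs = 10000
# one sample correlation per run of two independent 10-roll dice series
correlations = [corrcoef(randint.rvs(1, 7, size=10),
                         randint.rvs(1, 7, size=10))[0, 1]
                for _ in range(runs)]

plt.hist(correlations, bins=40)
plt.xlabel("sample correlation")
plt.ylabel("frequency")
plt.show()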
NOTE: Last week’s blog post was the current introduction to my book World After Capital. Today’s post covers the first half of the chapter on Digital Technology, which discusses how zero marginal cost is unlike anything found in the analog world.
The invention of agriculture expanded the space of the possible by dramatically increasing the food density of land. This allowed humanity to have surplus food, which provided the basis for increased population density and hierarchical societies that developed standing armies, specialization of labor and writing.
The Enlightenment and subsequent Industrial Revolution further expanded the space of the possible by substituting machine power for human power and increasing our understanding of, and control over, chemical and physical transformations of matter. This allowed humanity to make extraordinary material progress on the basis of innovations in energy, manufacturing, transportation and communication.
Digital technologies provide the third expansion of the space of the possible. This seems like a bold claim, and many have derided digital technologies such as Twitter, arguing that they are inconsequential compared to, say, the invention of vaccines.
Yet we can already see the disruptiveness of digital technologies. For instance, many previously well established businesses, such as newspapers and retailers, are struggling, while companies that deal only in information, such as Google and Facebook, are among the world’s most highly valued.
There are two characteristics of digital technology that expand the space of the possible, and both are important: the first is zero marginal cost and the second is the universality of digital computation.
Zero Marginal Cost
Once a piece of information is on the Internet, it can be accessed from anywhere on the network for no additional cost. As more and more people around the world are connected to the Internet, “anywhere on the network” increasingly means anywhere in the world. The servers are already running. The network connections and end user devices are already in place and powered up. Making one extra digital copy of the information and delivering it across the network is therefore free. In the language of economics: the “marginal cost” of a digital copy is zero. That doesn’t mean there aren’t people trying to charge you; in many cases there are. Zero marginal cost is a statement about cost, not about prices.
Zero marginal cost is radically different from anything that has come before it in the analog world, and it makes possible some pretty amazing things. To illustrate, imagine you own a pizzeria. You pay rent for your store, you pay for your equipment, and you pay salaries for your staff (and yourself). All of these are so-called “fixed costs.” They don’t change at all with the number of pizzas you bake. “Variable costs,” on the other hand, depend on the number of pizzas you make. For a pizzeria, these include the cost of the water, flour, and other ingredients used in making pizzas. Variable cost also includes the energy you need to heat your oven. If you make more pizzas, your variable cost goes up. If you make fewer, your variable cost goes down.
So what is marginal cost? Well, let’s say you are up and running making 100 pizzas every day. The marginal cost is the additional cost to make the 101st pizza. Assuming the oven is already hot and has room in it for one more pizza, the additional cost for that 101st pizza is just the cost of the ingredients, which is likely relatively low. If instead the oven has already cooled off, then the marginal cost of the 101st pizza would include the energy cost required for re-heating the oven. In that case the marginal cost could be quite high.
From a business perspective, you would want to make that 101st pizza as long as you can sell it for more than its marginal cost. Every cent above marginal cost makes a contribution towards fixed cost, helping to pay for rent and salaries. If you have already covered all your fixed cost from the previous pizzas sold, then every cent above marginal cost for the 101st pizza is profit.
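To make the arithmetic concrete, here is a tiny sketch with made-up numbers:

# hypothetical numbers for the pizzeria example
price = 18.00                  # what the customer pays for the 101st pizza
marginal_cost = 3.50           # ingredients, with the oven already hot
fixed_cost_per_day = 1200.00   # rent, salaries, equipment

contribution = price - marginal_cost
print(contribution)   # 14.50 goes towards fixed cost; once the 1200.00 is covered, it is profit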
Marginal cost also matters from a social perspective. As long as a customer is willing to pay more than the marginal cost for that pizza, then everyone is better off. You’re better off because you get extra contribution towards your fixed cost or your profit. Your customer is better off because, well, they just ate a pizza they wanted! Even if the customer paid exactly the marginal cost you wouldn’t be any worse off and the customer would still be better off.
Let’s consider what happens as marginal cost falls from an initially high level. Imagine for a moment that your key ingredient is an exceedingly rare and expensive truffle and therefore the marginal cost of your pizzas is $1,000 per pizza. Clearly you won’t be selling a lot of pizzas. You decide to switch to cheaper ingredients and start to bring down your marginal cost to where a larger number of customers are willing to pay more than your marginal cost. In New York City, where I live, that seems to be around $25 per pizza. So you start selling quite a few pizzas. As you bring down the marginal cost of your pizza even further through additional process and product improvements (e.g., a thinner crust, economies of scale, etc.), you can start selling even more pizzas.
Now imagine that through a magical new invention you can make additional pizzas at close to zero marginal cost (say one cent per additional pizza), including nearly instantaneous (say one second) shipment to anywhere in the world. What would happen then? Well, for starters you would be able to sell an exceedingly large number of pizzas. And if you charged even just two cents per pizza you would be making one cent of contribution or profit for every additional pizza you sell.
At such low marginal cost you would probably be the only pizza seller in the world (a monopoly—more on that later). From a social welfare standpoint, anyone in the world who was hungry and who wanted pizza and could afford at least one cent would ideally be getting one of your pizzas. This means that the best price of your pizza from a social point of view would be one cent (your marginal cost). Why not two cents? Because if someone was hungry but could only afford one cent and you sold them a pizza at that price, then the world as a whole would still be better off. The hungry person was fed and you covered the marginal cost of making the pizza.
Let’s recap: When your marginal cost was extremely high, you had very few customers. As your marginal cost dropped you started to be able to sell more. And as your marginal cost approached zero, you eventually started to feed the world! This is exactly where we are with digital technology. We can now feed the world with information. That additional YouTube video view? Marginal cost of zero. Additional access to Wikipedia? Marginal cost of zero. Additional traffic report delivered by Waze? Marginal cost of zero.
This means we should expect certain digital “pizza-making operations” to be huge and span the globe in near monopoly positions (i.e., they are much larger than anyone else, having nearly the entire market to themselves). This is exactly what we are seeing with companies such as Google and Facebook. But—and this is critical to the idea of the Knowledge Age—it also means, from a social perspective, that the price for marginal usage should be zero.
Why prevent someone from accessing YouTube, Wikipedia or Waze, either by cutting them off from the system altogether or charging a price they can’t afford? Doing so would always constitute a loss to society. With zero marginal cost, any benefit an individual receives, however small, is greater than the marginal cost. And best of all, they might use what they learn to create something that they share and that in turn winds up delivering extraordinary enjoyment or a scientific breakthrough to the world.
We are not used to zero marginal cost. Most of economics assumes non-zero marginal cost. You can think of zero marginal cost as an economic singularity: dividing by zero is undefined, and as you approach zero marginal cost, strange things happen. We are already observing these strange things in the world today, including digital near monopolies and a power law distribution of income and wealth. We are now rapidly approaching this zero marginal cost singularity in many industries, including finance and education.
So the first characteristic of digital technology that expands the space of the possible is zero marginal cost. This space includes digital monopolies, but it also includes access for all of humanity to all the world’s knowledge (a term I will define more precisely later).
Last Uncertainty Wednesday, I introduced the topic of spurious correlation. Since then I have discovered a site that gives some fantastic examples of (potentially) spurious correlations. Here is one:
The coefficient of correlation is 0.9926, i.e. almost 1 (which would be perfectly correlated).
Let’s remind ourselves what finding correlation in a sample of data means. It is simply a numerical measure that can be computed for any paired data. The formula produces a result that has nothing to do with the labels on the data. This may seem like stating the obvious, but it is really important to keep in mind. The numerical result for correlation here is the same whether the labels read "Divorce rate in Maine" and "Per capita consumption of margarine" or simply "Series 1" and "Series 2."
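As a quick illustration (with made-up numbers, not the actual data from the chart), the computation never sees the labels:

from numpy import corrcoef

# hypothetical 10-point series; the labels are irrelevant to the calculation
divorce_rate = [5.0, 4.7, 4.6, 4.4, 4.3, 4.1, 4.2, 4.2, 4.2, 4.1]
margarine    = [8.2, 7.0, 6.5, 5.3, 5.2, 4.0, 4.6, 4.5, 4.2, 3.7]

series_1, series_2 = divorce_rate, margarine   # relabeling changes nothing
print(corrcoef(divorce_rate, margarine)[0, 1])
print(corrcoef(series_1, series_2)[0, 1])      # identical result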
Why am I emphasizing this? Because whether or not we think sample correlation is indicative of real correlation is something we need to decide based on our explanations about the relationship between the two random variables. I don’t have an explanation relating “Per capita consumption of margarine” to the “Divorce rate in Maine.” Importantly, though, saying that I don’t have an explanation is not the same as saying that they are definitely independent (you may recall that independence is actually quite a strong assumption). Margarine consumption and divorce rates are both household behaviors and so it is quite possible for them to be dependent!
Next Uncertainty Wednesday we will take a deeper look at how the strength of the signal of real correlation we get from a sample depends on our prior beliefs (based on explanations) about the actual correlation.
NOTE: This is the current state of the introduction for World After Capital. It provides an overview of the key ideas from the book. In case you missed it, last week’s blog post has the work in progress note and the preface.
Humanity is unique, at least for now, in having developed knowledge. Knowledge in turn has enabled us to create increasingly powerful technology. The effect of technological advances is to broaden the “space of the possible.”
- With the Internet we can give everyone free access to education, but we can also share hate speech globally
- With artificial intelligence we can build self-driving cars, but we can also automate censorship and manipulation
A broader space of the possible contains both good and bad capabilities. There is nothing fundamentally new about this duality of technology.
- With fire we were able to warm ourselves and cook, but we were also able to burn down forests and enemy villages
- With steel we were able to construct more effective plows, but we were also able to forge more deadly swords
And yet there is something special about our moment in time.
We are experiencing a technological non-linearity, which renders many of the existing predictions about society based on extrapolation useless. The space of the possible for humanity is expanding rapidly due to the extraordinary power of digital technologies, which deliver universality of computation at zero marginal cost.
Humanity has encountered two similar non-linearities previously. The first was the invention of agriculture, which ended the Forager Age and brought us into the Agrarian Age. The second was the Enlightenment, which took us out of our state of ignorance about nature and helped usher in the Industrial Age.
Imagine foragers trying to predict what society would look like in the Agrarian Age. Cities, rulers and armies all would have come as a surprise. Similarly, much of what we have today—from modern medicine to computer technology—would look like magic to most people from as recently as the mid-1900s. Not just the existence of smartphones would have been hard to foresee, but even more so their widespread availability and affordability.
World After Capital has two goals. The first goal is to establish that we are, in fact, experiencing a third such non-linearity. The key argument is that each prior time the space of the possible expanded dramatically, the binding scarcity constraint for humanity shifted. Specifically, the invention of agriculture shifted scarcity from food to land. Industrialization, in turn, shifted scarcity from land to capital. Now digital technologies are shifting scarcity from capital to attention. Scarcity, here, refers to humanity’s ability to meet everyone’s basic needs.
Capital is already no longer scarce in some parts of the world and is rapidly becoming less scarce everywhere. We should consider this to be the great success of capitalism. But capitalism, in its present form, will not and can not solve the scarcity of attention. We are bad, individually and collectively, at allocating attention. For example, how much attention are you paying to your friends and family, or to the existential question of the meaning and purpose of your life? How much attention are we paying, as humanity, to the great challenges and opportunities of our time, such as climate change and space travel? Capitalism cannot address these attention allocation problems because prices do not, and cannot, exist for many of the activities that we should be paying attention to.
The second goal for World After Capital is to propose an approach for overcoming the limits of existing capitalism and facilitating a smooth transition from the Industrial Age (scarce capital) to the Knowledge Age (scarce attention). Getting this right is critical for humanity, as the two previous transitions were marked by massive turmoil and upheaval—including two World Wars to get from the Agrarian Age to the Industrial Age. Already, we are seeing signs of increasing conflict within societies and among belief systems across the world.
How should we enter this third transition? What actions should society take now, when—facing a non-linearity—we can’t make good predictions about the future?
We need to enact policies that allow for social and economic changes to occur gradually, instead of artificially suppressing these changes only to have them explode eventually. In particular, I will argue for smoothing the transition to the Knowledge Age by expanding three powerful individual freedoms.
- Economic freedom: instituting a basic income
- Informational freedom: investing in Internet access, rolling back intellectual property rights, and rethinking personal privacy
- Psychological freedom: practicing and encouraging self-regulation
Increasing these three freedoms will make attention less scarce. Economic freedom unlocks time currently spent in jobs that can and should be automated. Informational freedom broadens access to information and computation. Psychological freedom enables rationality in a world of information overload. Each of these freedoms is important by itself but they are also mutually reinforcing.
One crucial goal in reducing the scarcity of attention is to improve the functioning of the “Knowledge Loop.” The Knowledge Loop, which consists of learning, creating and sharing, is the source of all knowledge. Producing more knowledge is essential to human progress. The history of humanity is filled with prior civilizations that failed to produce the knowledge required to overcome the challenges they faced.
To achieve this goal through increased individual freedoms, we also need to firmly establish a set of values, including critical inquiry, democracy and responsibility. These values provide the social underpinning for the Knowledge Loop. They follow directly from a renewed Humanism, which in turn has an objective basis in the existence and power of human knowledge. Reasserting Humanism is especially critical at a time when we are standing at the threshold of creating transhumans, through genetic engineering and augmentation, as well as neohumans, in the form of artificial intelligence.
World After Capital argues for increased freedoms, rooted in humanism, as the way to transition from the Industrial Age to the Knowledge Age. I am profoundly optimistic about the ultimate potential for human progress. I am, however, pessimistic about how we will get there. We seem intent on clinging to the Industrial Age at all cost, increasing the likelihood of violent change. My hope, then, is that in writing World After Capital I can help in some small way to move us forward peacefully.
In the last Uncertainty Wednesday post on Sample Variance, I wrote that “Inference from data without explanations is how people go deeply wrong about reality.” It occurred to me that the best way to illustrate this is by writing about spurious correlation. To do that I first have to introduce the concept of correlation though. It may seem surprising that I have gotten this far into the series without doing so, but we spent a fair bit of time on a related concept, namely independence.
If you don’t recall, you should go back and read the posts on independence. The opposite of independence of two (or more) random variables is dependence. Now this is where it gets confusing. Sometimes the word “correlation” is used as a synonym for “dependence.” But more commonly “correlation” refers to a measure of a specific type of dependence, namely linear dependence.
The top row shows how the correlation coefficient ranges between +1 (perfect positive linear correlation) and -1 (perfect negative correlation) and decreases as the two random variables become less dependent. It becomes 0 in the middle when they are independent.
The second row deals with a common misconception: the correlation coefficient does not in fact measure the slope of the relationship; it measures only the strength of the linear relationship. So data with different slopes but a perfect linear relationship all result in a coefficient of +1 or -1.
The third row in turn shows that there can be very clear cases of dependence, which are immediately visually evident, and yet the correlation coefficient, as a measure of linear dependence, is 0.
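Here is a quick sketch of that third case: a variable that is completely determined by another (so clearly dependent) and yet has a correlation coefficient of essentially 0:

from numpy import corrcoef, linspace

x = linspace(-1, 1, 101)       # symmetric around zero
y = x ** 2                     # fully determined by x, but not linearly
print(corrcoef(x, y)[0, 1])    # prints a value very close to 0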
All of this is to say that correlation, as commonly used, is a highly specific measure of dependence. And yet correlation turns out to be widely used. As we will see, much of that use is in fact abuse.
Now you might have heard the expression “correlation does not mean causation.” We will get to that also, but what we are after first is “correlation does not even mean correlation.”
Huh? What do I mean? Well, as you have seen from the posts on sample mean and sample variance, whenever you are dealing with a sample, the observed value of a statistical measure has its own distribution. The same is of course true for correlation. So two random variables may be completely independent, but when you draw a sample, the sample may nonetheless show correlation. That is known as spurious correlation.
NOTE: One of my goals for 2018 is to bring my book World After Capital to the point where I feel it is good enough to publish in paper form. I have come to realize that if I just keep blogging about other topics at the same time I won’t get there. So going forward every Monday I am planning to post revised pages or whole sections of the book starting today with the sections titled “Work in Progress” and “Preface.”
Work in Progress
This book is a work in progress. What you are reading now is a draft with known problems and placeholders. It does, however, include all the major ideas and what remains is a process of gradual improvement.
The process of writing in this way is an example of what I call the “knowledge loop” in the book. The knowledge loop consists of learning, creating and sharing. My writing is based on what I have learned. By sharing early, others can learn from my ideas and I, in turn, can learn from their feedback.
I know how powerful this approach is from my experience with blogging for nearly a decade. I have learned a great deal from reader comments. The same will be true here.
As a Venture Capitalist (“VC”), I often get asked “What’s next?” People want to know what I think the next big technology will be. They are looking for an answer like “robotics” or “machine learning.” But that’s not the question that I am interested in answering. Instead, what I believe matters much more is what we as humanity decide to do with all the new technologies available to us.
In particular, I am convinced that we are in the middle of a transition that’s as profound as when we went from the Agrarian Age to the Industrial Age. This transition is being driven by the advent of digital technologies, and we must now collectively decide what comes after the Industrial Age. In World After Capital, I am arguing that the proper next age is the Knowledge Age—and that in order to get there we need to focus on the allocation of attention (rather than capital).
Why write a book as a VC? Or more pointedly: isn’t this a distraction from finding and managing investments in startups? Working with startups gives me a window into the future. I get to see certain trends and developments before they become more widely understood. That puts me in a good position to write about the future. At the same time, there is a feedback loop with investing: writing about the future that I would like to see helps me find and invest in companies that can help bring that future about. I am writing World After Capital because I feel compelled to do so by what I see, but writing the book has also made me a better investor.
Why write this specific book now? A big transition means lots of uncertainty. Many people fear change and they start to support populists who tend to have a simple message: Go back to the past. This is happening all over the world. We saw it with Brexit and with the election of Donald Trump as president of the United States. I started writing World After Capital considerably before both of those events occurred, but they serve to underline the importance of a future-oriented narrative. Going back is not a viable option. It never has been. We did not remain foragers after inventing agriculture. We did not remain farmers having invented industrial machines. We will not remain laborers having invented digital technologies.
One of the messages in World After Capital is that we all need to have a purpose in life. As we leave the Industrial Age behind, our purpose can no longer be derived from having a job (or from consuming). Instead, we need to find a purpose that is compatible with a Knowledge Age. I feel incredibly fortunate to have found my purpose in investing in Knowledge Age startups, writing and speaking about why this transition is happening now, and suggesting how we might go about it.
I deliberately use the term Knowledge Age, instead of Information Age. We are drowning in information, which spews forth endlessly from our computers and phones. Knowledge, by contrast, consists of the scientific explanations and the works of art and literature that have withstood the test of time and have been refined through the process of critical inquiry. Knowledge is what makes human life possible and worthwhile.
In a strange and wonderful way, much of what I have done in the past has brought me to this point. As a teenager in my native Germany, I fell in love with computers early in the 1980s. I got to work, even before going to college, writing software for companies. I studied both economics and computer science as an undergraduate student at Harvard and wrote my senior thesis about the impact of computerized trading on stock prices. As a consultant, I saw the impact of information systems on the automotive, airline and electric utility industries. As a graduate student at MIT, I once again studied both economics and computer science and wrote my dissertation about the impact of information technology on the organization of companies. As an entrepreneur, I co-founded an early and ultimately unsuccessful Internet healthcare company. And finally as an investor, I have had the good fortune of being able to invest in companies that are providing transformative digital technologies and services, including Etsy, MongoDB and Twilio.
I am grateful for all the people who have helped me along the way: my parents who wholeheartedly supported my interest in computers at a time when it was quite unusual and expensive to do so; my wife Susan Danziger and our children Michael, Katie and Peter who made me a better person; my many teachers, including Erik Brynjolfsson and Bengt Holmström, from whom I learned so much; my partners at Union Square Ventures, starting with Fred Wilson and Brad Burnham who invited me to join the firm they had started; the many entrepreneurs I have had the opportunity to work with; the philosophers and scientists, such as David Deutsch, who have demonstrated the power of human knowledge; the friends who have been there through good and bad times; and the many people who have taken the time to comment, who have invited me to speak, who have contributed in ways small and large, with special mentions for Seth Schulman for work on an early draft, Basil Vetas for capable research assistance, and Max Roser for extensive data collection and visualization.
Late in 2016 I wrote a blog post titled “Voice Platforms: Open Alternative is an Opportunity.” At the time I learned about a project called Mycroft. This project has just launched its second version, the Mark II, on Kickstarter. If you want voice in your home but are wary of Amazon and Google (and the other tech giants), this looks like an interesting alternative. Here is their intro video, which is super funny in a self deprecating way:
I just went ahead and backed the project and look forward to trying out the Mark II at home. I don’t know how many users such a platform will need to be viable but we won’t know unless we try!
Last Uncertainty Wednesday, I used meteorite impact data to make the point that sample variance may be much smaller than actual variance. Following the post, I was asked a great question on Twitter: “Is there such thing as estimation error on sample variance?” The answer is yes. Just as we saw earlier that the sample mean has a distribution, so does the sample variance. If you have different samples, you will get different variances and those will form a distribution. We are thus faced with exactly the same inference question as we were with the sample mean. How do we go about using the sample variance to estimate the actual variance?
I will write a lot more about inference in the future, but for now suffice it to say: the biggest mistake being made (and it is being made all the time), is to mistake the sample mean/variance for the actual mean/variance. And today I will give more examples of real life situations where the sample variance is highly likely to grossly underestimate the actual variance.
The first example is natural disasters, such as floods or earthquakes. These are caused by physical processes in the earth and its atmosphere. Both of these contain ridiculous amounts of energy (with the energy in the atmosphere currently increasing rapidly due to climate change). As a result it is extremely unlikely that any past sample includes the maximal possible event. In fact, if the maximal possible event had occurred, we might not even be here to read and write about it. So whenever you look at disaster event data and variance analysis based upon them, it is safe to assume that the sample variance underestimates the true variance.
The second example is economies and financial markets. Both are systems of human activity with massive human interventions aimed (explicitly or implicitly) at keeping volatility low. For instance, in the economy we have governments and central banks engaging in anti-cyclical policies (at least that’s generally what they attempt), such as fiscal or monetary stimulus during a downturn. In financial markets, there are many trading strategies that have the effect of reducing volatility, such as trading assets against each other based on their historical correlations. Such a strategy will, at least temporarily, reinforce those correlations, even when they are no longer warranted. So economic and financial markets data is another example where the sample variance will underestimate the true variance.
Now as it turns out, my language here isn’t entirely precise. What we are really dealing with in all of these examples, going back to my “suppressed volatility” posts, are situations in which the variance itself has a variance. Come again? Simply put: variance can be low at times and high at other times. Most sample periods will be periods of lower variance (volatility). Even if your sample includes some of the higher variance occurrences, as long as you average everything out your variance estimate will be too low. And as I argued above in the case of floods and earthquakes (also true for meteorites), even if you go with the largest observed variances only, you will still be underestimating the actual variance.
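Here is a small simulation sketch of that point (all numbers made up): a process that is calm most of the time and turbulent occasionally, where the typical modest-sized sample understates the overall variance:

from numpy.random import normal, rand
from numpy import var

# hypothetical regime-switching process: calm 95% of the time, turbulent 5%
def draw(n):
    calm = rand(n) < 0.95
    return normal(0, 1, n) * calm + normal(0, 10, n) * (~calm)

population = draw(1_000_000)
pop_var = var(population)
print(pop_var)   # roughly 0.95 * 1 + 0.05 * 100, i.e. close to 6

# fraction of 50-point samples whose sample variance falls below the overall variance
frac_below = sum(var(draw(50)) < pop_var for _ in range(1000)) / 1000
print(frac_below)   # well above one half: most samples understate the variance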
It appears that some people genetically need only 4 hours of sleep per night. Well, I am not one of them. I also don’t seem to need less sleep as I get older. Instead, I have come to appreciate how critically important sleep is for me. Looking back, many of my worst decisions, as well as my worst behavior, correlate with periods of insufficient sleep (or excessive jet lag for that matter).
The particular effects of not sleeping enough for me are: I become short tempered and easily irritable. That’s not good as an investor, colleague, friend, father and husband. I have spent a fair bit of time over the last couple of years working on my equanimity. I have made good progress on that generally, but take away my sleep and I will easily revert.
I fully realize that sleeping enough has become a privilege, as many people, especially in the United States, have to hold down multiple jobs and/or are subject to the whims of automated scheduling systems. Others feel that they can only be successful if they work crazy hours (if Michael Moritz’s Financial Times Op Ed was meant as a wake up call about competition, it struck all the wrong notes — and there is also a piece to be written about tech entitlement in the US but that would also be very different).
So bottom line: whatever you can do to get the sleep you need, do it. I used to reply to emails until I was done. Now I go to bed when I need to sleep, emails be damned. Collectively as we work on alternatives to the Industrial Age, let’s make sure we can all get enough sleep. The world will be a better and kinder place.
if we simply estimate the volatility of a process from the observed sample variance, we may be wildly underestimating potential future variance
This turns out to be true not just for cases of “suppressed volatility” but much more broadly. For any fat tailed distribution, the sample variance will underestimate the true variance. Mistaking the sample variance for the actual variance is the same error as mistaking the sample mean for the actual mean. The sample mean has a distribution and the sample variance has a distribution. Whether or not they are an unbiased estimator for the true values depends on the characteristics of the process.
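A minimal sketch of this claim, using a Pareto distribution purely as an illustrative fat-tailed example (the tail parameter is an arbitrary choice):

from scipy.stats import pareto
from numpy import var, median

b = 2.5                      # tail exponent: the variance exists but the tail is fat
true_var = pareto.var(b)     # analytical variance, roughly 2.22
sample_vars = [var(pareto.rvs(b, size=100), ddof=1) for _ in range(10000)]
print(true_var, median(sample_vars))   # the typical sample variance falls well short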
The new data could help scientists better refine estimates of the distribution of the sizes of NEOs [Near Earth Objects] including larger ones that could pose a danger to Earth
That will only work well if we take into account that we know that over longer time periods there have been much more massive impacts although these are often millions of years apart. This is the hallmark of a fat tail distribution: rare large outlier events. Naively using a sample that does not include these large strikes would give us a dramatic under-estimate of the true danger for humanity.
Next week we will look more at what this means (including other examples) and what we can do about coming up with estimates in these situations.
We have the USV office closed today in honor of Martin Luther King Jr. day. In his book “Where Do We Go From Here: Chaos or Community” he wrote:
In addition to the absence of coordination and sufficiency, the [social] programs of the past all have another common failing — they are indirect. Each seeks to solve poverty by first solving something else.
I’m now convinced that the simplest approach will prove to be the most effective — the solution to poverty is to abolish it directly by a now widely discussed measure: the guaranteed income.
I strongly recommend reading a longer excerpt from the book, which shows just how visionary MLK was. He wrote the book in 1967, the year that I was born. Since then we have made extraordinary progress in the productive capacity of the economy. Put differently, we can afford a basic income more easily than ever before.
So on this MLK day in 2018, if you are looking for things to do, read up on basic income. Here are some places to get started:
And there is a section on basic income in my book World After Capital. If you want to watch a video instead, you can check out my GEL talk or Rutger’s TED talk.
After a short break of a few weeks, Uncertainty Wednesday is back! My last post had been the third part in a series on “Spurious Correlation” which ended with “Interestingly, this takes me to the edge of my own knowledge and so I have asked an expert in Bayesian estimation to help me. Stay tuned!” Well, that expert is Eric Novik from Generable. Generable is a company that uses Stan, the world’s leading Bayesian estimation tool, to help companies make better decisions.
Eric took on my challenge of representing a prior belief about correlation and seeing how observed correlation in a small sample would change that belief. You can find Eric’s complete post, titled “Correlation or no correlation, that is the question” on the Generable blog. The post is quite technical and I will not reproduce it here. Instead, I want to show the key findings in the form of density charts that Eric kindly prepared for me.
Here is the first chart
On the right hand side it shows the prior probability distribution over different correlation values in a so-called violin plot. Relative to some of the pictures I have shown in the past this simply has the axes switched, so on the vertical axis you have the possible correlation values from -1.0 to +1.0 and on the horizontal axis you have the probabilities. Now the picture combines the prior and posterior distributions into one, so you have to imagine on the horizontal axis there is a 0 where the respective words are. The graph is then mirrored around the 0 probability axis to make a nice looking solid shape. The human eye and brain can compare those solid shapes more easily with each other.
What then do we see? Well the green colored prior distribution here has all possible correlation values from -1.0 to +1.0 with roughly the same probability. This corresponds to having no prior belief about a specific correlation being more likely than another correlation. As I pointed out, this is a much more relaxed assumption than what is often assumed instead, namely that correlation = 0, i.e. the variables are uncorrelated.
The blue dotted line shows the correlation of -0.38 that was observed in the specific sample. The red colored distribution is the posterior distribution over possible correlation values. With our very relaxed prior we now see that a lot more probability mass resides close to the observed correlation in the sample, but we also see that lots of other correlation values are still included in the distribution, including positive correlation values up to greater than +0.5.
Now the super cool thing about the approach that Eric took is that we can easily try a different prior (in his code this requires changing a single parameter). Here is a second example:
Before reading on, ask yourself if you can interpret this chart compared to the first chart. What is different about the prior distribution and how does that impact the posterior?
So the prior now has much more probability around correlation = 0. Meaning we believe that the variables are more likely to be uncorrelated, but we are not ruling out either extreme from -1.0 to +1.0 (you can see there is some probability mass on both ends). With this somewhat tighter prior, we find that the posterior moves a lot less! Much more mass remains above the sample correlation, and the mean correlation in the posterior (the slightly darker red horizontal line) is about halfway between uncorrelated (0 correlation) and the observed -0.38 correlation from the sample.
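If you want to play with this effect without Stan, here is a crude grid-approximation sketch of my own (a simplification, not Eric’s model): it treats the standardized data as approximately bivariate normal and puts a symmetric, Beta-shaped prior on the correlation, whose tightness is controlled by a single parameter:

import numpy as np
from scipy.stats import multivariate_normal, beta, randint

# two independent 10-roll dice series, standardized (illustrative data only)
r1 = randint.rvs(1, 7, size=10)
r2 = randint.rvs(1, 7, size=10)
data = np.column_stack([(r1 - r1.mean()) / r1.std(),
                        (r2 - r2.mean()) / r2.std()])

rhos = np.linspace(-0.99, 0.99, 199)

def posterior(prior_concentration):
    # prior_concentration = 1 gives a flat prior on the correlation;
    # larger values concentrate the prior around 0
    prior = beta.pdf((rhos + 1) / 2, prior_concentration, prior_concentration)
    loglik = np.array([
        multivariate_normal(mean=[0, 0], cov=[[1, r], [r, 1]]).logpdf(data).sum()
        for r in rhos])
    post = prior * np.exp(loglik - loglik.max())
    return post / post.sum()

flat = posterior(1)    # analogous to the first chart
tight = posterior(5)   # analogous to the second chart
# posterior means: the tighter prior pulls the estimate towards 0
print((rhos * flat).sum(), (rhos * tight).sum())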
What should you take away from all of this? Correlation, like mean, is just a single point statistic. As such it has a distribution of its own. Most people make the mistake of ignoring the existence of that distribution which results in all sorts of errors of inference. They do so either because they never really understood this, or maliciously in what has become known as “p-hacking.” In upcoming posts I will write about p-values and why they are so problematic.
It happened on the second to last day of a wonderful family trip to Southeast Asia. I looked at my mobile phone in the morning and it had an alert that said something like “SIM not recognized.” I should have probably figured out something was remiss right then, but instead I simply assumed that my phone had tried to register on an unsupported network. As I was sitting down for breakfast suddenly 3 emails arrived in rapid succession that made me realize I was being hacked (time order is from bottom to top):
Argh! Clearly someone had gotten my SMS messages to go to them instead and used it to hack my old yahoo email account. They quickly changed the password to lock me out and removed my alternate email.
From there I figured their next stop would be Twitter. That’s one of the few services where I used that email address. I ran back to my hotel room and tried to change my email address on my logged in Twitter account. Alas I was too late. The attacker already had reset the password and I was logged out.
The attacker then made a single tweet (as I later discovered also one rude reply) and pinned it:
Immediately people started remarking that this didn’t sound at all like me and that I had probably been hacked. Several people also texted me, but obviously those texts went to the attacker’s phone!
Thankfully the team at USV immediately jumped into action. They replied to the tweet and others who were quoting it that my account had been hacked. They helped me contact Twitter and have my account suspended and the tweet removed (which happened quite quickly but seemed like an eternity to me). In the meantime I got on the phone with T-Mobile to regain control of my phone.
I let T-Mobile know that someone had gotten into my account. They pretty quickly established that there had been a transfer of the SIM to a different SIM. I asked somewhat irately how that was possible given that I had a password on the account. I was told that someone had shown up at a T-Mobile store as me and presented a valid ID. I was able to convince the rep that this had not been me. Thankfully they could see that I was calling from Thailand and I was able to answer all the security questions and able to produce the number off the SIM card actually in my phone. From there it only took a few minutes to have the SIM switched back.
With the phone number once again in my control what remained was getting my Twitter and Yahoo accounts back. Thankfully I was able to get great connections to support at both companies and they got this done in record time.
What are the takeaways? First, my accounts that were protected with Google Authenticator were safe (the attacker did try to go after these but without success). Second, someone went to fairly great lengths to get the SIM on my phone switched. This is all the more surprising given the fairly obvious tweet they sent.
So: SMS based 2FA is vulnerable (which is well known) if someone either ports your number outright or, more likely, can switch your SIM. I am pretty sure that T-Mobile will not switch my SIM again. Nonetheless, wherever possible I will now make sure to use a different second factor.
NOTE: Today I am continuing with publishing revisions to my book World After Capital. This is the second half of the chapter on the incredible properties of digital technology. If you missed it, the first half of the chapter is on zero marginal cost.

Universality of Computation
Zero marginal cost is only the first property of digital technology that dramatically expands the space of the possible. The second property is in some ways even more amazing.
Computers are universal machines. I mean this in a rather precise sense: anything that can be computed in the universe at all can be computed by the kind of machine that we already have, given enough memory and enough time. We have known this since the groundbreaking work by Alan Turing on computation. Turing invented an abstract computer, which we now call a Turing machine. He then came up with an ingenious proof to show that this machine, which turns out to be extremely simple, can compute anything.
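To make this concrete, here is a tiny sketch of a Turing machine simulator (the example machine simply increments a binary number; the point is only how simple the control structure is):

# minimal Turing machine: a tape, a head, a state, and a transition table
def run(tape, transitions, state="start", blank="_"):
    tape = list(tape)
    head = 0
    while state != "halt":
        symbol = tape[head] if 0 <= head < len(tape) else blank
        state, write, move = transitions[(state, symbol)]
        if head < 0:
            tape.insert(0, blank)
            head = 0
        elif head >= len(tape):
            tape.append(blank)
        tape[head] = write
        head += 1 if move == "R" else -1
    return "".join(tape).strip(blank)

# example program: move to the rightmost digit, then carry 1s into a 0
increment = {
    ("start", "0"): ("start", "0", "R"),
    ("start", "1"): ("start", "1", "R"),
    ("start", "_"): ("carry", "_", "L"),
    ("carry", "1"): ("carry", "0", "L"),
    ("carry", "0"): ("halt",  "1", "L"),
    ("carry", "_"): ("halt",  "1", "L"),
}

print(run("1011", increment))   # prints 1100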
What do I mean here by computation? I mean any process that takes some information inputs, executes a series of processing steps and produces an information output. That is—for better or worse—all that a human brain does either. The brain receives inputs via nerves, carries out some internal processing, and produces outputs (also via nerves). In principle, there is nothing a human brain can do that a digital machine cannot do.
The “in principle” limitation will turn out to be significant only if quantum effects matter in the brain. This is a hotly debated topic [NEED REFERENCE]. Quantum effects do not change what can be computed per se, because even a Turing machine can simulate a quantum effect, but it would take an impractically long time to do so, potentially millions of years or more. If quantum effects were to matter in the brain then we would need to wait for further progress in quantum computing to simulate a brain. Personally, I believe that quantum effects are unlikely to matter and that we will be able to simulate an entire human brain in a digital computer with sufficient detail. We can’t do it quite yet, as our present digital hardware is too slow and has insufficient memory (we also do not yet have a complete map of a human brain).
Unless you want to believe in something beyond what physics has determined to date, there is nothing that a human brain can do that a computer cannot do also. Likely a digital computer will suffice, but it is possible that we have to get to quantum computers to cover everything. Now there is always some wiggle room in the future. We may discover something new about physical reality that we don’t yet know, and that changes our view of what is computable. But not so far.
For a long time this universality property didn’t seem to matter all that much. Computers were pretty dumb compared to humans. This was frustrating to computer scientists who, going back as far as Turing himself, had the belief that it should be possible to build a machine that does, well, smart things. But they couldn’t get it to work. Even something that is really simple for most humans, such as recognizing objects, had computers completely stumped. Until now that is, when we suddenly find ourselves with computers that can do all sorts of smart things.
An analogy here is heavier than air flight. We knew for a long time that it must be possible—we knew that birds were heavier than air and yet they could fly. But it took until 1903, when the Wright Brothers built the first successful airplane, for us to figure out how to do it. Once they and several others around the same time had figured it out, though, progress was rapid. We went from not knowing how to fly for thousands of years to passenger jet planes crossing the Atlantic in 55 years (BOAC’s first transatlantic jet passenger flight was in 1958). If you graph this, you see a perfect example of a non-linearity. We didn’t get gradually better at flying. We couldn’t do it at all and then suddenly we did, and quickly did it very well.
Similarly, with digital technology, we have finally made a series of breakthroughs, which have taken us from essentially no machine intelligence to machines outperforming humans on many different tasks, including reading handwriting and recognizing faces. More impressive, maybe, is that machines have learned how to drive cars. The rate of progress in driving is a great example of the non-linearity of improvement. DARPA, the Defense Advanced Research Projects Agency, held its first so-called Grand Challenge for self driving cars in 2004. At the time they picked a 150 mile closed course in the Mojave Desert region, and yet no car got further than 7 miles before getting stuck (less than 5% of the course). By 2012, less than a decade later, Google’s self-driving cars had successfully driven over 300,000 miles on public roads with traffic.
Some people will object that reading handwriting, recognizing faces, or driving a car is not what we mean by intelligence. This just points out, though, that we don’t really have a good definition of “intelligence.” For instance, if you had a dog that could perform any of these tasks, let alone all three, you would likely call it an “intelligent” dog.
Other people will say that humans also have creativity and these machines, even if we grant them some form of intelligence, won’t ever be creative. This amounts to arguing that creativity is something other than computation. The word “creativity” suggests the idea of “something from nothing,” of outputs without inputs. But that is not the nature of human creativity: musicians create new music after having heard lots of music, engineers create new machines after having seen many existing ones, and so on. There is no evidence that creativity is more than computation.
Recently, Google achieved a relevant breakthrough in machine intelligence. The AlphaGo program beat Korean Go grandmaster Lee Sedol 4-1. Previously, progress with software that could play Go had been comparatively slow and even the best programs could not beat strong club players, let alone masters. The search space in Go is extremely large, which means a search approach, which works for Chess, cannot be used to find moves. Instead, candidate moves need to be conjectured. Put differently, playing Go involves creativity.
The approach used to train the AlphaGo program, so-called adversarial training of neural networks, can also be applied to other domains that require creativity. There is already progress in applying these techniques to composing music and creating designs. Maybe even more surprisingly, machines can learn to be creative not just from studying prior human games or designs, but from creating their own based on rules. A newer version of AlphaGo called AlphaZero, starts out just knowing the rules of a game such as Go or chess, and learns from games it plays against itself [NEED REFERENCE]. This approach allows machines to be creative in areas that have limited or no prior human work to go on.
With digital technologies, the space of the possible has thus expanded to include machines that can most likely do anything that a human can do.

Universality at Zero Marginal Cost
Now, impressive as these two properties of zero marginal cost and universality are on their own, their combination is truly magical. I will just give one example: we are well on our way to a computer program that will be able to diagnose any disease from a patient’s symptoms in a series of steps, including ordering new tests and interpreting their results. We have expected this based on universality, but now we are making tangible progress and accomplishing this is a matter of decades at best. Once we can do it, then thanks to zero marginal cost we can, and should, provide free diagnosis to anyone, anywhere in the world. (Okay—the actual lab tests, to the extent they are required, will still cost something). Still, one needs to let that sink in slowly to really grasp its extent. The realm of possibility for mankind will soon include free medical diagnosis for all humans.
Universality of computation at zero marginal cost is unlike anything we have had with prior technologies. Being able to give all of humanity access to all the world’s information and knowledge was never before possible. Intelligent machines were not previously possible. Now we have both. This is as profound an increase in what is possible for humanity as agriculture and industry were before. Each of those ushered in an entirely different age.
To help us think better about the next age made possible by digital technologies, we now need to put some foundations in place.
So I am still away on family vacation and following a self-imposed online diet, but even then it has been impossible to ignore the monster sized vulnerabilities disclosed today known as Meltdown and Spectre. And just to make sure nobody misreads my post title, these are bad. Downright ugly. They are pervasive, exploitable and a real long-term fix will likely require new hardware (one or more extra hardware bits in the CPU). So how can I possibly claim they are good? Here are four different ways I think these vulnerabilities can give an important boost to innovation.
1. Faster Adoption of (Real) Cloud Computing
One might think that Meltdown and Spectre are terrible for cloud computing as they break through all memory isolation, so that an attacker can see everything that’s in memory on a physical machine (across all the virtual machines). But I believe the opposite will be true if you think of cloud computing as true utility computing for small code chunks as in AWS Lambda, Google Cloud Functions or MongoDB Stitch. This to me has always been the true promise of the cloud, not having to muck with virtual machines or installing libraries. I just want to pass some code to the cloud and have that run. Interesting historic tidbit. The term “utility computing” goes back to a speech given by John McCarthy in, wait for it, 1961. Well, we now finally have it and, properly implemented, it will be secure.
2. Improved Secure Code Execution in the Web Browser
3. More Secure Operating Systems
Not just the web browser, but operating systems as a whole will benefit from a renewed focus on security. Efforts such as Qubes and Copperhead, while – as far as I know – not immune from the current vulnerabilities, deserve more attention and funding. It may also be time for completely new approaches, although I would prefer something a little less abstruse than Urbit.
4. New Machine Architectures
The fundamental enabling elements behind both Meltdown and Spectre are the extraordinary steps we have taken to optimize for speed in the Von Neumann / Harvard architecture. Out of order execution and speculative execution increase utilization but turn out to give code access to memory that it shouldn’t be able to reach. Caches speed up memory access but also allow for side channels. Now we can make software and hardware changes to prevent this, but nonetheless one way to read these vulnerabilities is that we should stop pushing the existing architectures and work more on alternatives. This has of course to a degree already happened with the rise of GPUs and some neuromorphic hardware. And there has been a lot of recent investment into quantum computing. But there are lots of other interesting possibilities out there, such as Memristor based machines.
All in all then while Meltdown and Spectre are a huge bother in the short term, I believe that they will turn out to be good for computing innovation.
PS I highly recommend at least skimming the papers on Meltdown and Spectre, which are very accessible to anyone with a basic understanding of computer architecture.
On Friday afternoon I met with a class of architecture students who are researching the relationship between architecture and basic income. This strikes me as a super important issue to work on. One of my contentions in World After Capital is that with basic income the current trend towards everyone living in cities will be broken. As we are starting to need less land for agriculture (through improved yields and vertical farming), land outside of cities can be quite affordable. The trick will be to have housing that is also affordable.
One of the big goals of basic income is to increase individual freedom, so the idea here is not for the government to construct homes, but rather for people to either build or buy their own in the places they want to live. Part of this can be renovation of previously abandoned villages, but a significant part of this will likely be new construction. The challenge for architecture then would seem to be to create plans and sample buildings that can be shared widely and reproduced without royalties.
I have done some basic math, and if the providers of capital are looking for low single digit percentage returns (e.g., in Europe there is capital that is looking for 2% annually), then you can afford to build quite a bit even if basic income is in the $800-1000 range per month (I sketch this arithmetic in the PS below). I am new to this particular aspect but excited to learn more. So I am looking for interesting projects around affordable housing, ideally with open source plans and support for off-grid or micro-grid implementation.
If you are aware of, or involved with, such a project, please let me know.
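PS Here is the kind of back-of-the-envelope calculation I mean (all numbers are illustrative assumptions, not a proposal):

# how much construction can a slice of basic income service, if patient
# capital is happy with a roughly 2% annual return?
basic_income_per_month = 1000    # assumed, at the top of the $800-1000 range
share_spent_on_housing = 0.3     # assumed share of basic income going to housing
annual_return = 0.02             # return sought by the capital provider

annual_housing_budget = basic_income_per_month * 12 * share_spent_on_housing
supportable_capital = annual_housing_budget / annual_return   # perpetuity approximation
print(supportable_capital)   # 180000.0, enough to build quite a bit outside of the big cities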
As 2017 draws to a close I will be taking a break from blogging, twitter and (for the most part) email until well into January. I have a great set of book recommendations – far too many to get through for quite some time – and I will be traveling with Susan and our children. Given all the craziness of this year in politics, climate change, technology (crypto currencies) and more, I look forward to getting some distance from it all and spending time with family. I feel incredibly fortunate to be able to do this and wish everyone all the best for 2018.
During the initial growth of the Internet, the United States enacted a variety of Safe Harbors, such as the DMCA, that allowed new innovation to unfold. We now need a similar approach to crypto tokens in order to help build the blockchain infrastructure.
Blockchains allow the creation of data stores and applications that have network effects but are not controlled by a central player. This helps address the key issue of centralization and monopoly like positions currently observed in the digital economy, with companies such as Facebook, Google and Amazon.
Crypto tokens play a central role in the consensus protocols which keep all the participants agreeing on the state of the system. They do so by providing incentives to expend the necessary compute and network resources required by the protocol.
What kinds of Safe Harbor are required to allow this innovation to flourish?
1. Companies and Projects Must be Able to Issue Tokens and Distribute them Widely
Developing advanced blockchain systems requires serious engineering work, which has to be financed. The ultimate value of these systems does not reside in the companies that start them because they are open protocols. That means teams and investors must be rewarded in tokens instead. Furthermore, tokens need to be distributed widely to build an actually decentralized system.
A Safe Harbor for initial token sales and issuance could impose requirements on investors that are along the lines of Rule 144 governing Restricted Securities, such as imposing holding periods and information requirements prior to sales. Token distributions via faucets or crowd sales with small dollar amounts per buyer should be exempted from this requirement.
2. Tokens Must be Easy to Buy and Sell on an Ongoing Basis
Entities that supply compute and network resources to maintain a blockchain earn tokens. Those who want to use the service need to spend tokens. Since supply and demand are separate from each other, an efficient mechanism for buying and selling tokens is required. These cannot be securities with all the complications that would entail (imagine for a moment if concert tickets were securities).
A Safe Harbor for secondary token markets could impose utility token requirements. That means the tokens that qualify for this safe harbor cannot represent ownership interests in an entity (which would clearly make them securities).
The SEC and CFTC have indicated that they are looking to regulate ICOs and the trading of crypto currencies. I believe that a Safe Harbor approach along the lines described above would be best for fostering this innovation, while also protecting investors.
In 2015 I wrote a blog post titled “Uber’s Greatest Trick Revealed” in which I argued that Uber’s success was the result of providing a transportation service. This was and is exactly what consumers wanted, but it means Uber is not a neutral platform or marketplace. It would appear that regulators have finally caught up with this, with the European Union’s top court ruling yesterday that Uber is in fact a transportation service.
Now we will hopefully enter a new phase in which regulators figure out how to get consumers the benefits of on demand, app based dispatch, including the massive expansion of capacity, while still dealing with issues such as safety, congestion, drivers’ rights, etc. And yes, some local regulators were and are captive to incumbent taxi companies, but that doesn’t mean there are not enlightened ones to be found who will come up with the right rules that can then over time be emulated everywhere.
This has been the pattern of regulation for lots of innovation. Early in the history of cars, for instance, there were red flag laws aimed at preventing cars from going faster than horse drawn carriages. Ultimately though cars did not succeed against regulation but because of regulation. We only got to the benefits of individual transportation by having rules of the road and through government investment in roads.
The same will be true for autonomous vehicles and on demand hailing services. They will ultimately be successful *because* of regulation.