A tech evangelist with soul

A review of “Exponential: How Accelerating Technology Is Leaving Us Behind and What to Do About It” by Azeem Azhar

Azeem Azhar’s “Exponential” starts out like many a techno-boosting tome: a smart, well-informed and optimistic account of current trends in technology and their impacts, a thesis about how we could better track this progress using Wright’s Law rather than the more commonly cited Moore’s Law, and a catchy concept - that we’re living in the “Exponential Age” - to hang it all on, perhaps in the hope of coining a term that posterity will deem definitive.

As the book progresses, however, it soon becomes apparent that there is more to Azhar than this first impression suggests. As consumers of his excellent weekly “Exponential View” newsletter (and its accompanying podcast) will know, Azhar is not your average tech evangelist. While he has extensive hands-on experience as both tech journalist and tech entrepreneur - and thus a deep command of his subject - he also has a genuine concern for, and interest in, the real impact that all this tech is having on society. Democratic society, in particular.

He’s not afraid to say it, either. As early as Chapter 1 we find him castigating digital engineers for their culturally dominant view that “technology is neutral”. “Technologies are not just neutral tools to be applied (or misapplied) by their users,” Azhar insists. “They are artefacts built by people. And these people direct and design their inventions according to their own preferences [...] And that means that our technologies often recreate the systems of power that exist in the rest of society.” After several decades’ worth of Silicon Valley hubris, this is a breath of fresh air.

Azhar proceeds to walk a neat line between his admiration for, and excitement about, the incredible gains that technology is bringing and his awareness of its potential downsides and its impact on the structures of power. He allows himself to be enthralled by the idea that “between these four key areas - computing, energy, biology and manufacturing - it is possible to make out the contours of a wholly new era of human society”, while remaining continually alert to the very real possibility that such an era could, if we are not proactive and careful, turn out to be one in which we wouldn’t want to live.

As the book goes on, Azhar becomes increasingly critical of the monopolistic behaviour of the current tech giants, the manner in which the homophily encouraged by social media is atomising our culture, the fact that AI systems can just as easily embody society's prejudices and offer new and insidious means of social control as they can transform business and science, and the way in which drone warfare is driving extreme asymmetries in military conflict.

All these points are made well and loudly elsewhere, for sure. But what’s so refreshing about “Exponential” is that Azhar makes them without losing his enthusiasm for the positive transformations that technology promises, and he uses them to build a sustained and determined case for governments and other social institutions to develop policies and mechanisms of governance that are equal to the task this new technological era presents. He even advances a case for more, and more effective, collective action by workers, a view guaranteed to raise hackles at Amazon, Google and Tesla. The point is that if we don’t haul our political structures out of the twentieth century and reform them so they are equal to the challenge of controlling technology, it’s all but a foregone conclusion that technology and its overlords are going to control us.

Whether or not the era in which this battle rages will end up being known as “the exponential age” remains to be seen. What’s certain is that we need more writers and entrepreneurs of Azhar’s calibre around if we’re all going to share in its benefits.

A quick sketch of consciousness

Last weekend I was flicking through some TWIML AI podcasts, something I haven’t done since the Covid-19 pandemic kicked off and I got busy with other things. Spoiled for choice as ever, I opted to listen to Sam Charrington’s interview with Yoshua Bengio.

You don’t need me to tell you that Bengio is something of a legend, one of the world’s leading AI researchers. What’s great about this interview is the succinctness with which he brings together some of the major strands in contemporary thought about consciousness and the mind, in a way that I have been trying and failing to do myself, or at least failing to do with any clarity I could communicate.

The strands I’m talking about are those characterised by the work of Geoffrey Hinton (deep learning), Daniel Kahneman (System 1 and System 2 thinking), and Judea Pearl (causal inference).

I’m not going to give a potted summary of the work in question: the principles of deep learning as developed by Hinton and colleagues are well known enough by now; Kahneman’s work with Amos Tversky won him the Nobel prize in economics, and his book “Thinking, Fast and Slow” was an international bestseller; and I wrote about Pearl’s work in this blog last year.

What’s so cool about the TWIML interview with Bengio is that after listening to it I suddenly felt able to reconcile all these insights into different mental mechanisms into what at least seems like a coherent whole, bound together by Bengio’s ideas about working memory and attention, which I already knew was something that he and others (Hinton too, I think) have been working on.

I sat right down and sketched out this diagram to explain it to myself, and I thought I’d share it here and ask for comments:

My quick sketch of consciousness, pace Yoshua Bengio, Geoffrey Hinton, Judea Pearl & Daniel Kahneman

No doubt some of this confuses more than it simplifies, but once I’ve had more time to consider it and had some feedback I’ll see if I can improve it, and maybe write something a little more detailed about what I’m trying to get at here, in case it isn’t actually as obvious as I’m intending it to be!

Why AI is not AI until it wonders why

Reflections on Judea Pearl’s science of causal reasoning

Dr. Mark Freestone lecturing on the Alan Turing stage at CogX19.

Back in June I was lucky enough to attend (indeed, exhibit at) this year’s CogX Festival of AI and Emerging Technology in London. It’s a fantastic event stuffed full of fascinating presentations — I urge you to come next year if you can — but of all the great talks I saw and encounters I had, one in particular stood out enough to make me want to sit down and write a blog post about it.

The presentation in question was by Dr. Mark Freestone. Freestone is a Senior Lecturer in the Centre for Psychiatry at the Wolfson Institute of Preventive Medicine; he’s also a fellow of the Alan Turing Institute. What he had to say chimed powerfully with the contents of a book I’d read just a few weeks before, and which for my money sits alongside Daniel Kahneman’s “Thinking, Fast and Slow” as one of the books of the decade.

This was Judea Pearl’s “The Book of Why” (co-written with Dana Mackenzie), which is about cause and effect in statistics. As a student of philosophy and psychology and a part-time data scientist I’ve spent a fair chunk of my intellectual life pondering these things, which is why I picked it up and read it in the first place. And what a revelation it proved to be.

A historical timeline info board about the importance of Judea Pearl and Bayesian networks in the history of AI, from the “AI: More than Human” exhibition at London’s Barbican Centre, August 2019

Statistics tells us an enormous amount about the world, and now — thanks to analytical techniques of various flavours, from logistic regression to neural nets — we’re baking statistical analysis at scale into the extraordinary data structures we’ve been building since the invention of the microprocessor and, more recently, the Internet.

We’ve now, with our usual hubris (and usual slavish adherence to the dictates of marketing), decided to call this development artificial intelligence, despite the fact that it’s not really intelligence at all but is, rather, pattern recognition and statistical analysis.

I don’t mean to demean the achievements that have been made in these sectors. But when compared to the processes at work in human or animal brains, they are akin to those at the “automated function” end — object persistence or facial recognition in vision, for example. Closer, therefore, to sensory perception than to the abstract cortical processing and decision-making that we generally refer to as “intelligence” (unless you work in marketing).

As every statistician knows, you see, “correlation is not causation.” But as Pearl points out in “The Book of Why”:

Unfortunately, statistics has fetishized this commonsense observation. It tells us that correlation is not causation, but it does not tell us what causation is…. Students [of statistics] are not allowed to say that X is the cause of Y — only that X and Y are “related” or “associated”.
— Judea Pearl, "The Book of Why"

Even more than that, statisticians have maintained for decades that correlation was enough, that causation was either unfathomable or not required. So perhaps it’s no surprise that correlation is what the current crop of AI technology, the crowning achievement of the discipline that produced it, does, and does very well indeed.

This is fine if we want an AI to tell an image of a dog from an image of a cat; to recognise a face or a voice or a word or a cancer cell in the midst of healthy tissue; to calculate routes and identify cars and pedestrians; even to work out how to win at video games. Iterated pattern recognition over labelled data, with backpropagation for error correction, bolstered by a range of other techniques that simulate the contributions of human memory or the layering of the mammalian visual cortex, can handle all of this admirably. Ally these techniques with more traditional AI methods like decision trees and other higher-order logics, and you can start beating grandmasters at chess or Go and start building (or attempting to build) self-driving cars.
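Just to make concrete what I mean by that basic loop, here is a deliberately tiny sketch of it in plain Python with NumPy (my own toy example, nothing to do with any real system mentioned here): labelled examples go in, the network makes a prediction, and the error is propagated backwards to nudge the weights.

import numpy as np

rng = np.random.default_rng(0)

# Labelled training data: four input patterns and the class each belongs to (XOR).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Randomly initialised weights for a tiny two-layer network.
W1, b1 = rng.normal(0, 1, (2, 8)), np.zeros(8)
W2, b2 = rng.normal(0, 1, (8, 1)), np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for step in range(5000):
    # Forward pass: turn the current weights into a prediction.
    h = sigmoid(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)

    # Backward pass: propagate the prediction error back through the network...
    grad_out = p - y                                   # error at the output layer
    grad_W2, grad_b2 = h.T @ grad_out, grad_out.sum(axis=0)
    grad_hid = grad_out @ W2.T * h * (1 - h)
    grad_W1, grad_b1 = X.T @ grad_hid, grad_hid.sum(axis=0)

    # ...and nudge every weight a little in the direction that reduces that error.
    W2 -= lr * grad_W2; b2 -= lr * grad_b2
    W1 -= lr * grad_W1; b1 -= lr * grad_b1

print(np.round(p, 2))  # after training: close to the labels 0, 1, 1, 0

Scaled up by many orders of magnitude, and with plenty of extra machinery, that little loop is the engine behind the systems described above.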

The trouble starts when we want to ask why something happened, or to predict what might happen if, in systems as unconfined and messy as the untrammelled physical world rather than in closed and rule-bound environments such as a Go board or the neatly laid-out traffic grid of the average Midwestern US town. Observational data sets of the kind used to train neural networks in pattern recognition do not contain the answers to these kinds of questions, you see (q.v. “correlation is not causation”). When it comes to causal or predictive questions (predictive in the sense of predicting the future, rather than predicting the likelihood of a classification), “data are profoundly dumb”.

In other words, in the realm of actual thinking, rather than the processing that our visual cortices perform on the patterns of light that play across our retinae, these processes do not replicate what is going on in our heads. We do not use correlation to work out what might happen next. It’s part of the toolkit we might deploy, but it isn’t by any stretch the core mechanic of how we think.

When it comes to figuring out causation, we instead use scenarios and counterfactuals. We use fictions, not facts (a point that appeals to the novelist in me, as you might well guess). These fictions have their basis in fact (well, most of the time), but even so they are built on relatively few immediate data points. They are largely constructed from multiple reconstituted examples drawn from our experience — what we call “common sense”. They are also inherently probabilistic — something they have in common with correlation. What they don’t share with correlation, however, is the ability “to predict the effects of an intervention without actually enacting it,” as Pearl puts it in his book.

Well, I hear you say, doesn’t AlphaGo do exactly that? And the answer is, no, it does not. AlphaGo enacts millions of virtual scenarios along multiple forking paths of action to produce highly complex statistical analyses of possible outcomes, which are then encoded into the weights of its deep neural nets. This is incredibly effective and even capable of producing previously unappreciated insight into Go’s game mechanic (AlphaGo’s now famous move 37 and, subsequently, Sedol’s move 78). And it may even be, in the broadest sense, akin to what Lee Sedol himself is doing when he’s playing Go. But it’s not what Lee Sedol is doing when he’s trying to work out what he should buy his daughter for her birthday.

When Lee Sedol does that, he is spinning up various counterfactual scenarios involving various versions of his daughter and himself, various gift options, and a whole range of family scenarios possibly stretching well into the future, scenarios that “reflect the very structure of [his] world model.” None of these scenarios will have happened in the past, and none of them will happen in the future, but he will make a choice dependent on whichever of them conforms most closely with his world model. And then, when he sees his daughter’s (and his wife’s) reaction to the gift, he’ll perhaps embellish his world model according to the difference between his prediction and the perceived reality, thus deploying a training set, not of AlphaGo’s millions of examples, but of just one.

What’s strange about this is that the empirical observation can never fully confirm or refute the counterfactual. And yet counterfactuals are the primary tools we have for guiding our journey through the world in a cybernetic fashion, and thus are “the building blocks of moral behaviour as well as scientific thought.”

Current AI does not benefit from this mode of human thought. The importance of Pearl’s work, as encapsulated in “The Book of Why”, is that over the last three decades he has developed a method, a “causal calculus”, that enables the “algorithmization of counterfactuals” and thus makes them available for use by thinking machines.

What is this causal calculus? In essence, it’s a way of modelling the probability (P) of an event (L) happening if an action (X) takes place, while taking into account both mediating variables (enabling the calculus to model indirect as well as direct relationships between action and outcome) and influencing variables (enabling it to quantify and/or isolate other factors that may confuse, complexify or obscure the key relationship being interrogated, much as the paragon of this form, the randomised controlled trial, seeks to do).
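To give a flavour of what that looks like on paper (my own minimal illustration, not an example taken from the book): suppose there is a single observed influencing variable Z that affects both the action X and the event L. The so-called backdoor adjustment, one of the simplest results of the calculus, says that the probability of L under an intervention that sets X to x is

P(L | do(X = x)) = Σ_z P(L | X = x, Z = z) · P(Z = z)

which differs from the purely observational P(L | X = x) precisely because the intervention severs Z’s influence over whether X happens: the influencing variable is averaged over its natural distribution, rather than over the distribution it happens to have among the cases where X occurred of its own accord.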

Pearl and his collaborators have developed a visual vernacular for mapping the causal relationships between these various elements in any given situation. These causal graphs in turn allow the construction of counterfactuals: what the mapped causal structure would imply if an influencing variable impinged on this node instead of that node, for example, or if a mediating relationship should turn out to be reciprocal instead of one-way. Once the pathways have been mapped, it then becomes possible to take data generated in one scenario and test its validity or plausibility in another, apparently comparable, scenario.
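To make the counterfactual step a little more concrete, here is a deliberately tiny sketch (my own invented example, with made-up equations and numbers) of Pearl’s three-step recipe for computing one once the causal structure has been written down as equations: infer the background conditions from what was observed, force the variable of interest to the value it didn’t take, and replay the same world.

# A toy structural causal model with one cause (X), one outcome (Y) and
# unobserved background conditions (U). Equations and numbers are invented
# purely for illustration.
#
# Pearl's recipe for a counterfactual has three steps:
#   1. abduction  - infer the background conditions U from what we observed;
#   2. action     - force X to the value it didn't actually take (the "do");
#   3. prediction - run the equations forward again with U held fixed.

def f_y(x, u):
    # Structural equation for Y: the outcome depends on the action and on U.
    return 2.0 * x + u

# Step 0: what actually happened.
x_obs, y_obs = 1.0, 3.5

# Step 1 (abduction): the observation pins down the background conditions.
u = y_obs - 2.0 * x_obs          # u = 1.5

# Step 2 (action): intervene, setting X to the value it did not take.
x_cf = 0.0

# Step 3 (prediction): replay the same world, with U unchanged.
y_cf = f_y(x_cf, u)

print(f"Had X been {x_cf} instead of {x_obs}, Y would have been {y_cf}")
# -> Had X been 0.0 instead of 1.0, Y would have been 1.5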

This is much more than a Bayesian prior, though priors play an important role in estimating the initial conditions of any given causal model. Pearl’s “do-calculus” goes far beyond Bayesian techniques in its power and implications, because it compartmentalises and tracks the conditions that comprise the system under study, rather than taking a global probability snapshot at a given stage and feeding it back into an evolving prediction. (For a nice summary of the technique — and a second recommendation of the book — check out this Medium post by data scientist Ken Tsui.)

As you’d expect, in “The Book of Why” Pearl gives plenty of good toy examples of the do-calculus; what’s particularly interesting about these is the way that even a very simple causal graph with only five or six nodes can help unpick incredibly thorny issues, like the demonstration of a causal relationship between smoking and lung cancer, or the comparative impacts of nature and nurture on personality.

It’s in these test cases, too, that we are able to see the profound impact of this approach on the entire discipline of statistics. It means no less than:

the mantra “Correlation does not imply causation” should give way to “Some correlations do imply causation.”
— Judea Pearl, "The Book of Why"

The do-calculus is the technique that allows us — or our computers — to interrogate situations and their counterfactuals to work out which correlations those are.
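As a toy illustration of what that means in practice (a synthetic example of my own, not one of Pearl’s), here is a small simulation in which a hidden common cause Z produces a strong correlation between X and Y even though X has no effect on Y whatsoever. The naive comparison reports a large “effect”; adjusting for Z, as the backdoor formula sketched earlier prescribes, makes it vanish.

import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Z is a common cause of both X and Y; X has NO causal effect on Y at all.
z = rng.integers(0, 2, n)                         # the confounder
x = rng.random(n) < np.where(z == 1, 0.8, 0.2)    # Z makes X much more likely
y = rng.random(n) < np.where(z == 1, 0.7, 0.1)    # Z makes Y much more likely; X is irrelevant

# Naive, purely correlational estimate: P(Y | X=1) - P(Y | X=0)
naive = y[x].mean() - y[~x].mean()

# Backdoor adjustment: sum over z of P(Y | X=x, Z=z) * P(Z=z)
def p_y_do_x(x_val):
    return sum(y[(x == x_val) & (z == z_val)].mean() * (z == z_val).mean()
               for z_val in (0, 1))

adjusted = p_y_do_x(True) - p_y_do_x(False)

print(f"naive estimate of X's effect on Y:   {naive:+.3f}")     # large and spurious (~ +0.36)
print(f"adjusted (interventional) estimate:  {adjusted:+.3f}")  # approximately zero

The raw correlation between X and Y is perfectly real; it just doesn’t imply causation. The adjusted figure is the one that answers the do-question.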

Dr. Freestone continues his lecture at CogX19, with an example of a full causal graph.

So now, I hope, it should be apparent why Mark Freestone’s talk at CogX 2019 excited me so much. It was the first example I’d come across since reading Pearl’s book of someone applying the do-calculus in the wild. As you can see from the photograph above, the causal graph of actions, outcomes, influences and mediators gets pretty crazy pretty fast when you’re trying to understand cause and effect in a situation as complex as the development of risk models for prediction and management of violence in mental health services (the focus of Freestone’s study).

The approach is also already beginning to make an impact on robotics. Start-up Realtime Robotics is making great progress on enabling interactive movement in machines by using counterfactual causal models, creating a specialised processor and scripting language (Indigolog) specifically to enable it. DeepMind has been mucking about in this area too, as you’d expect. Check out some of their research findings here.

I don’t pretend to be an expert in the do-calculus by any stretch. I’m writing this post partly to celebrate Pearl’s work, partly to tell you that it’s worthy of your attention if you haven’t come across it before, and partly to help me explain it to myself. To really grasp it I need to reread the whole book and then start working through some trial examples; if I manage to get round to this while trying to close Hospify’s seed round (which is keeping me pretty busy), I’ll let you know.

In the meantime, do go and read “The Book of Why” for yourself. I promise you it’s worth it.