Intelligent listening for beginners

Andy Cormack takes a less-than-deferential look at the sweeping powers that the UK government is considering in its ongoing war against terrorism and other threats on the home front.

In this article, he raises some very valid points about the way these proposals would affect everyone using the internet in the UK, and looks at the implications of such legislation for the internet and, to a greater extent, for our personal freedoms.


Theresa May, the UK's Prime Minister, well known for her laughably dated views on how to combat terrorist threats online, is at it again, this time setting her sights directly on many of today's internet giants: Google, Facebook, Twitter, Microsoft, and many more.

May arranged a meeting with representatives from these companies, held in New York at around the same time as her appearance at the UN General Assembly, to discuss the current state of the internet, and urged them to "develop new technological solutions to prevent such content being uploaded in the first place." She was, of course, referring to messages posted by terrorist groups such as ISIS which remain online for "too long" before being taken down.

As mentioned previously, this is far from May's first attempt at bringing the internet under political control. A statement made just a few months prior signalled her intention, in no uncertain terms, to regulate the internet: "Some people say that it is not for government to regulate when it comes to technology and the internet; we disagree". This comes hot on the heels of the controversial Investigatory Powers Act, inexplicably passed through parliament and made law at the end of 2016.

The Investigatory Powers Act is already, in some people's view, a power play that violates citizens' privacy in a disturbing number of ways, with internet service providers mandated to retain browsing records for every customer and to give the authorities access to read them on a whim. The list of authorised government departments goes far beyond GCHQ and the police, too, taking in the Food Standards Agency and the Department for Work and Pensions for starters.

At this point, the core parts of the act are in place and enforced, including the gathering and retention of citizens' internet data, as well as new legal mechanisms that can force companies to hand over customer data to the intelligence agencies.

Now May is taking a different approach, one that many could perceive as further impeding any kind of speech on the internet that the government even remotely disagrees with. Should we be surprised if May and her government have underestimated the sheer volume and complexity of such a task? Her public-facing motive, at least, appears to be to build on people's fears and insecurities about potential terrorist attacks and their planning, consistently trotting out the same old line about leaving "no safe space for terrorists to be able to communicate online."

Major technology and internet companies are now actively encouraged to build backdoors into their encrypted messaging services in order to give this and successive governments access in the name of 'public safety'. Contradictorily, doing so would also open up new attack vectors for hackers into what is already a popular target arena for cyber criminals.

To understand why such a thing is, in itself, an incredibly bad idea, you only have to look at the pushback and resistance from all sorts of companies and security experts over these kinds of mandates. You don't have to look very far to come across a plethora of articles on the subject, most of them written by experts in the field of cyber security, or at least referencing them.

This actually dates back further than Theresa May's administration. Under David Cameron's leadership back in 2015, it was proposed that Britain would essentially ban encryption, a move even more laughable than the attempts May is making now. It's tantamount to removing the fences around chicken coops and putting cameras in their place, all under the pretext of "protecting the chickens from the wolves", and achieving the exact opposite: you might get a slightly better view of the wolves trotting off with their prey, but you protect nothing and in fact make the wolves' job easier. Again, all of this comes from people who have absolutely no clue about the real implications of their actions and propositions, and who clearly haven't taken the advice of those who do have half a clue; if they had, they would never have opened their mouths publicly, having been laughed out of the room by security experts. One example of experts explaining just a few of the multitude of reasons that backdoors are a bad idea was made into a video on TechRepublic, which is worth watching.

Drifting back to the topic at hand after that backstory about the prolonged attempts at weakening the privacy of the general public, and the security of the companies who provide services to them, this latest ask seems perhaps a little more grounded and reasonable. That said, the software required to do even half the job May is asking for would be hugely complex at best, and even at that level of sophistication it would probably end up producing more false positives than bona fide results.

The gap between what looks easy and what is incredibly difficult is hard to explain to those who do not understand the technology, which is the situation here, even if the companies tendering for the job were inclined to pour an incredible amount of time and resources into meeting May's plans.

On top of this, she plans to go a step further, not just requesting that these internet giants work to prevent terrorists from posting material online that poses a potential security threat to ours and other countries: May intends to fundamentally turn social media into a filter between what people post online and what actually gets shown to the rest of the world.

The manifesto references laws already put in place by the Investigatory Powers Act and adds new proposals that would impede both searching for and accessing pornographic material online, stating: "We will put a responsibility on industry not to direct users – even unintentionally – to hate speech, pornography, or other sources of harm". This in itself implies that the government regards pornography as a vicious and oppressive industry on the same level as terrorism. What's next? Are they going to dictate which breakfast cereals we eat every day? Or whether nudity on TV shows like Game of Thrones should be banned for the same 'crime'?

This kind of hysteria is not centred on the UK alone, either. Germany recently passed a law that will fine major social media companies up to €50 million if they fail to remove 'criminal content' within 24 hours of it being posted on their services. And this latest push is coming not just from May and the UK, but from France and Italy as well, all striving to bring that figure down from the already laughable 24 hours to a quite literally impossible one to two hours after the content is posted.

To put some of this into perspective, consider the sheer volume of messages posted on these sites every day. Here's an example: take a look at the Internet Live Stats page for Twitter, a page dedicated to daily statistics on Twitter and its usage (http://www.internetlivestats.com/twitter-statistics/). And yes, those are real figures, taken from Twitter's API.

To further compound this already laughably high volume of messages to filter, expand your thoughts beyond Twitter for a small picture of just how much happens on the internet every second of every day; and this doesn't even cover some of the other major social media sites like Facebook. At the time of writing, the averages per second are roughly 7,700 tweets, 1,000 Instagram photos and Tumblr posts, nearly 3,000 Skype calls, 48 terabytes of data transferred, and over 2.6 million spam emails sent.

Then factor in the statistics for other popular social media sites such as Facebook; Zephoria Digital Marketing maintains a regularly updated page of the latest figures. At the time of writing, Facebook has over 2 billion monthly active users, around 1.3 billion of them active daily, and an average of five new profiles created on the site every second. There are also roughly 500,000 comments, 290,000 status updates and 136,000 photos posted every single minute.

If these numbers don't tell you something about the sheer insurmountability of enforcing such ridiculous laws, then nothing will. Take just the average figure of 7,700 tweets per second and try to work out how many people would be needed to police all of it manually. At a generous average of 3 seconds per tweet, per person, to determine whether it was "allowed" to be posted online, that alone would require 23,100 people reading and moderating tweets every single hour of every day and night, in perpetuity, just to keep up. And the number of people required would only grow as traffic grows.

Let's stretch that to Facebook next. We can't reuse the already generous, albeit incredibly rough, average we used for Twitter, since Facebook has no 140-character limit; we would also have to moderate photos, videos, status updates, and long-winded posts that far exceed that 3 seconds. While a percentage of posts are really long, ranging from a paragraph or two to a full-length blog or news piece, a large percentage are much shorter and might take only a few seconds to parse. So let's take a compromise in the middle of, say, 30 seconds per comment, status update, photo, or video, to average out the huge and the tiny posts at either end.

Taking the rough figures stated above, that leaves us with somewhere in the ballpark of 926,000 pieces of content to moderate every single minute, which, at our agreed average of 30 seconds per post, would mean you would need somewhere around 463,000 people to moderate all of that manually as well, an even more hilariously impossible figure. Again, that figure would have to grow with the volume of traffic over time.
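
For anyone who wants to sanity-check those numbers, here is a minimal back-of-the-envelope sketch of the same calculation. It uses only the rough averages quoted above and assumes a perfectly steady stream of content and moderators who never pause, so the concurrent headcount is simply the arrival rate multiplied by the time spent per item:

```python
# Back-of-the-envelope staffing estimate using the rough averages quoted above.
# Assumes a constant arrival rate and moderators working non-stop, so the
# concurrent headcount is simply: items per second x seconds per item.

def moderators_needed(items_per_second: float, seconds_per_item: float) -> float:
    """Concurrent moderators required to keep pace with a steady content stream."""
    return items_per_second * seconds_per_item

# Twitter: ~7,700 tweets per second, ~3 seconds to judge each one.
twitter_staff = moderators_needed(7_700, 3)

# Facebook: ~926,000 comments, status updates and photos per minute,
# ~30 seconds to judge each one.
facebook_staff = moderators_needed(926_000 / 60, 30)

print(f"Twitter:  ~{twitter_staff:,.0f} people moderating around the clock")
print(f"Facebook: ~{facebook_staff:,.0f} people moderating around the clock")
# Prints roughly 23,100 for Twitter and 463,000 for Facebook.
```

Even with these cartoonishly optimistic assumptions (no breaks, no shifts, no appeals, no growth in traffic), the headcount comes out in the tens of thousands for Twitter alone and the hundreds of thousands for Facebook.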

"But most of this can be automated," you say? While yes, a certain degree of it can be automated, no current system has the sophistication or competency to do this with any real degree of accuracy. Take YouTube, for example, an incredibly popular video upload site with some 70,000 views per second at the time of writing. If you haven't read about the controversies surrounding the automated content management systems put in place on the site over the last year or so, let's quickly get you up to speed.

The site employs a set of technologies and algorithms designed to automate the process of finding copyrighted content in video uploads. Copyright owners, or companies representing them, provide reference material against which uploads are compared; matching videos are subject to automated takedowns, and the parts of those videos where the owner's content is used can be monetised on the owner's behalf. This system, even when provided with the exact content it needs to look for in the videos uploaded across the site, still rings up constant false positives, and fails to take into account other reasons that the content may not need to be flagged at all, such as fair use for the purposes of review or criticism.

So even if you provided an automated system as sophisticated as the one in place on YouTube to match so-called "red flags", it would generate so many false positives and miss so many real threats that it might as well not exist at all. These aren't copyrighted songs or videos embedded inside other videos, playing out identically or similarly enough to be matched against existing content; these are posts hand-crafted by individuals, which may never be structured the same way twice. You would certainly narrow the field of potential red-flag events for people to moderate, but manual moderation would still have to be undertaken, and plenty of real threats would still slip through the cracks by wording themselves too cryptically or subversively for the system to flag accurately.
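
To make that false-positive problem concrete, here is a deliberately crude toy sketch of keyword-based flagging. The watch list and example posts are invented purely for illustration, and no real platform's filter is assumed to work this simply, but it shows how pattern matching against hand-written text cuts both ways:

```python
import re

# Toy illustration only: a naive keyword filter of the sort a fully automated
# moderation pipeline might reduce to. The watch list and posts below are
# invented for demonstration purposes.
FLAGGED_TERMS = {"attack", "bomb", "explosive"}

def is_flagged(post: str) -> bool:
    """Flag a post if it contains any watched term, ignoring case and punctuation."""
    words = set(re.findall(r"[a-z]+", post.lower()))
    return bool(words & FLAGGED_TERMS)

posts = [
    "Thoughts with everyone affected by today's attack in London",  # innocent sympathy -> flagged
    "This new album is the bomb, absolutely love it",                # harmless slang -> flagged
    "Meet at the usual place and bring the party favours",           # coded language -> missed
]

for post in posts:
    label = "FLAGGED" if is_flagged(post) else "ok"
    print(f"{label:7} | {post}")
```

The first two posts are entirely innocent yet get flagged, while the third, written in coded language, sails straight through; that is precisely the pattern human moderators would be left to clean up at scale.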

Experts the world over have tried time and again to explain the sheer scale of the undertaking involved in this kind of technology to governments that simply have no understanding of just how big an ask it is.

One pertinent example is GCHQ's former Deputy Director of Intelligence, Brian Lord, who was interviewed on Radio 4's Today programme (starting at 2:44:22); the interview is well worth a listen.

The interview starts with a very leading question from the interviewer, Justin Webb, who asks "Whose side are you on?". Lord responds, quite rightly, that "I don't think it's really a question of taking sides on this, I have sympathy from both perspectives". Which is completely true: while this article has mostly focused on the laughable statistics involved in the kind of moderation governments are asking for, it's also not entirely unreasonable to want to restrict material that could aid terrorist activities or radicalise the susceptible or the uninitiated.

Lord summarises the wording used by multiple governments about the unacceptable ease of access to this kind of material, how it's "just not on" and how "in a modern day world that cannot be acceptable", then summarises the opposing argument from the perspective of the tech giants, who "have to manage a very difficult balance". They are receptive to the idea that they need to do more to help prevent this kind of thing: "Yes, we can do more, we should do more, we are doing more". However, they are also balancing "…a very different set of relationships". He then makes the very obvious point that "helping support government's counter-terrorism, to one audience, to a different audience is helping governments spy on people and we've been there before and they have to walk a very difficult line between them".

Webb at this point steps in to add, "that's of course even before we get to the technical difficulties of doing it, it's actually what material to ban and how closely to monitor those who might put it up". Lord agrees with the statement and expands on it, saying "Artificial Intelligence is a word that is bandied around very, very freely, 'Oh Artificial Intelligence, get technology to do it, it's very easy isn't it', well it's a lot more difficult than that because Artificial Intelligence in one way… You can use a sledgehammer to crack a nut, and so actually one can say well just take a whole swathe of information off the internet because somewhere in there will be the bad stuff we don't want people to see, but then that counters the availability of information and I'm sure the government's researchers, even the researchers here at the BBC, would scream if a whole swathe of data wasn't available to them, to be able to provide the services that they do. So the more refined you want your Artificial Intelligence to be, and focused, and remove the data that you really want removing, that should be, but leave behind everything that is legitimate, starts to become far more challenging."

The interviewer then follows up on Lord’s point, stating “This is a war that can never be won isn’t it? Because as long as people want to put this stuff up and copy it, and put it on a disc or something, or a memory stick, and then just re-copy it, it’s always going to be a constant battle and there will always be this material online.”

Open Rights Group's Executive Director Jim Killock has also urged caution over the use of automated takedowns. "Internet companies have a role to play in removing illegal content from their platforms, but we need to recognise the limitations of relying on automated takedowns. Mistakes will inevitably be made – by removing the wrong content and by missing extremist material."

“Given the global reach of these companies, automated takedowns will have a wide-reaching effect on the content we see, although not necessarily on the spread of extremist ideas as terrorists will switch to using other platforms.”

He also states that the move could have "wider implications" and be used to… "justify the actions of authoritarian regimes, such as China, Saudi Arabia, and Iran, who want companies to remove content that they find disagreeable."

That being said, internet companies also need to be seen to be receptive to these concerns and to the demands placed on them to "do better", especially with Theresa May riding on the coat-tails of the homemade bucket bomb set off on a London Underground train a week earlier. In fact, Google used the opportunity to announce the launch of a fund to address hate and extremism, described as "Data-driven, human-focused philanthropy", with an initial backing of $5 million and a first grant of $1.3 million going to the UK's Institute for Strategic Dialogue. Others are tasked with coming up with "innovative, effective and data-driven solutions that can undermine and overcome radicalisation propaganda" in order to receive funding. In that respect, maybe there is hope for the internet and free speech after all.
