Fighting Fake News with AI

We have all seen them: “news” stories, ads, chain messages. Our feeds are scattered with content supposedly selected for us: unreliable news, rumors, and malicious texts that try to divide communities, cities, nations, and the world. Fake news is content created maliciously with the intention of misinforming people, driving traffic to sites with scandalous headlines, or pushing extremely politically biased content. In general, this type of news tries to deceive the consumer by imitating the appearance of reputable sources such as newspapers, blogs, and even videos, damaging democracy and societies.

The most prominent example in recent years is the 2016 US election, where content created by 470 Russian-backed accounts reached more than 146 million people[1]; these attacks, together with the Cambridge Analytica scandal, led Facebook’s founder to testify before Congress and explain these vulnerabilities[2]. More recently, in the Brazilian election, disinformation spread through different social networks, generating confusion among the followers of different candidates, deepening the divide between opposing groups, and producing unnecessary violence in a democratic process[3]. Even worse, at least twelve people have died in India after being falsely accused, through WhatsApp messages, of being child kidnappers[4].

The average reader, it seems, is not in the habit of checking the origin or quality of the information they consume. Tracing the origin of content on social networks is even harder, since a single click can share any type of content with thousands of people. Chat platforms pose a similar problem: it is almost impossible to know where information comes from, as it is often copied and pasted between groups of friends.

A study published in the journal Science in 2018[5] shows how false news diffuses faster than truthful news. An analysis of hundreds of thousands of Twitter posts found roughly 126,000 rumor cascades spread by about 3 million people. False stories diffused significantly farther and faster than real news, with the most viral ones reaching audiences up to 100 times larger. The researchers’ conclusion is that false news reached more people around the world than truthful news.

Social networks have created a lot of value for both consumers and advertisers, building platforms that connect friends, families, groups with common interests, and diverse communities; at the same time, these platforms have become massive distribution channels for malicious groups. It is true that Facebook, Twitter, and Google are working on improving their platforms[6], but this effort alone is not enough to solve the problem. The question now is: can these companies regulate their own content and platforms by themselves? Can technologies like AI help us solve this?


Factmata, a London-based AI startup, hopes to solve this. Backed by big names like Mark Cuban and Biz Stone[7], it has raised $1.6 million in funding[8]. Since 2017, the company has been working on cutting-edge technology to detect false news. Using natural language processing and machine learning, it identifies characteristics of a text such as credibility and quality. Its semantic analysis scores text along four dimensions[9]:

  1. Hate speech and abusive content
  2. Propaganda and extremely politically biased content
  3. Spoof websites and content spread by known fake news networks
  4. Extreme clickbait content

These dimensions are then combined into a content quality score, which lets users see clearly which sources and pieces of content are more reliable. The goal is not to define what information is correct or incorrect, but to give users the tools to decide for themselves, strengthening each user’s awareness and critical thinking. A toy sketch of this kind of multi-dimensional scoring follows.
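To make the idea concrete, here is a minimal Python sketch of scoring a text along several dimensions and collapsing them into a single quality score. Only the four dimension names come from Factmata’s public description; everything else (the keyword cues, the `score_dimension` and `quality_score` helpers, the aggregation rule) is a hypothetical stand-in. Factmata’s real system uses trained NLP models, not keyword matching.

```python
# Toy illustration of multi-dimensional content scoring.
# Hypothetical keyword cues per dimension; a real system would use
# trained classifiers rather than phrase matching.
CUES = {
    "hate_speech": {"vermin", "subhuman", "get rid of them"},
    "political_bias": {"radical", "traitor", "regime"},
    "spoof_network": {"sources say", "doctors hate", "banned report"},
    "clickbait": {"shocking", "you won't believe", "number 7 will"},
}

def score_dimension(text: str, cues: set) -> float:
    """Return the fraction of cue phrases found in the text (0.0 = clean)."""
    lowered = text.lower()
    return sum(cue in lowered for cue in cues) / len(cues)

def quality_score(text: str) -> float:
    """Combine the four dimension scores into one quality score,
    where 1.0 means no risk signals were detected."""
    worst_risk = max(score_dimension(text, cues) for cues in CUES.values())
    return 1.0 - worst_risk

if __name__ == "__main__":
    article = "SHOCKING: you won't believe what this regime did next"
    for name, cues in CUES.items():
        print(f"{name}: {score_dimension(article, cues):.2f}")
    print(f"quality: {quality_score(article):.2f}")
```

Taking the worst dimension as the overall risk is just one plausible aggregation choice; a weighted combination of the four scores would work equally well in a sketch like this.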

Factmata aims to help industries such as media, advertising, finance, trading, and public relations. Two platforms with different objectives will launch this year. The business platform[10], designed for advertisers and advertising platforms, analyzes the content of multiple URLs to measure the risk of publishing ads on those pages. The main difference from current solutions is that it does not rely on a simple black/white list but analyzes the context of the entire content, sentence by sentence (the sketch below contrasts the two approaches). A platform focused on the news industry will follow[11]: reporters and researchers will be able to use it to verify the quality of different sources of information through credibility-scoring artificial intelligence.
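Here is a minimal sketch of why sentence-level analysis differs from a blacklist: the blacklist gives one binary verdict per domain, while per-sentence scoring can flag a single risky passage on an otherwise benign page. The `BLACKLIST` set, the `score_sentence` cues, and the worst-sentence aggregation are all invented for illustration and are not Factmata’s API.

```python
import re

# Hypothetical domain blacklist (old approach: one verdict per site).
BLACKLIST = {"known-fake-news.example"}

def blacklist_check(domain: str) -> bool:
    """Binary verdict based only on the domain, ignoring page content."""
    return domain in BLACKLIST

def score_sentence(sentence: str) -> float:
    """Hypothetical per-sentence risk score in [0, 1]; a real system
    would run a trained classifier here."""
    cues = ("shocking", "they don't want you to know", "miracle cure")
    hits = sum(cue in sentence.lower() for cue in cues)
    return min(1.0, hits / 2)

def page_risk(text: str) -> float:
    """Context-aware approach: score every sentence and report the
    worst one, so a single bad passage can flag the whole page."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return max(score_sentence(s) for s in sentences if s.strip())

if __name__ == "__main__":
    page = ("Local council approves new park. "
            "Shocking miracle cure they don't want you to know about!")
    print(blacklist_check("respectable-paper.example"))  # False: ads allowed
    print(f"page risk: {page_risk(page):.2f}")           # high: page flagged
```

The point is the granularity: a page on a reputable domain sails through a blacklist, but sentence-level analysis still catches the one high-risk passage it contains.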


Technological platforms generated the problem, and technological platforms will have to correct it. Even so, algorithms alone are not enough to eradicate the underlying problem; technology companies, media conglomerates, social networks, and even the state must make a greater effort to educate their users and citizens. That education should go beyond teaching and promoting the reading of informative content and focus on critical thinking and the analysis of sources. These platforms must also improve their content algorithms, avoiding the echo chambers in which users see only one side of the story, unable to weigh other points of view, their existing beliefs reinforced without easily visible counterarguments. Finally, the technologies used to analyze and classify news should be audited by regulatory and perhaps public entities, to ensure unbiased analysis where possible.

Many questions remain extremely hard to answer and will stay under discussion in the coming decades: can, or should, algorithms help us select what content to consume? Are these algorithms transparent enough to avoid the same biases as a human being? Who decides what content is reliable, and how does that clash with our freedom of speech?


References

[1] Wells, D. (2018). Tech Giants Disclose Russian Activity on Eve of Congressional Appearance. [online] WSJ. Available at: https://www.wsj.com/articles/facebook-estimates-126-million-people-saw-russian-backed-content-1509401546?mod=article_inline [Accessed 14 Nov. 2018].

[2] Alpert, G. (2018). In Facebook’s Effort to Fight Fake News, Human Fact-Checkers Struggle to Keep Up. [online] WSJ. Available at: https://www.wsj.com/articles/in-facebooks-effort-to-fight-fake-news-human-fact-checkers-play-a-supporting-role-1539856800?mod=searchresults&page=1&pos=16 [Accessed 13 Nov. 2018].

[3] Seetharaman, P. (2018). In Brazil Vote, Misinformation Spreads on Social Media Despite Efforts to Stop It. [online] WSJ. Available at: https://www.wsj.com/articles/in-brazil-vote-fake-news-spreads-on-social-media-despite-efforts-to-stop-it-1540566295?mod=searchresults&page=1&pos=9 [Accessed 13 Nov. 2018].

[4] The Economist. (2018). WhatsApp: Mark Zuckerberg’s other headache. [online] Available at: https://www.economist.com/business/2018/01/27/whatsapp-mark-zuckerbergs-other-headache [Accessed 13 Nov. 2018].

[5] Vosoughi, S., Roy, D. and Aral, S. (2018). The spread of true and false news online. Science, [online] 359(6380), pp.1146-1151. Available at: http://science.sciencemag.org/content/359/6380/1146 [Accessed 14 Nov. 2018].

[6] The Economist. (2018). WhatsApp suggests a cure for virality. [online] Available at: https://www.economist.com/leaders/2018/07/26/whatsapp-suggests-a-cure-for-virality [Accessed 13 Nov. 2018].

[7] TechCrunch. (2018). Factmata closes $1M seed round as it seeks to build an ‘anti fake news’ media platform. [online] Available at: https://techcrunch.com/2018/02/01/factmata-closes-1m-seed-round-as-it-seeks-to-build-an-anti-fake-news-media-platform/ [Accessed 13 Nov. 2018].

[8] CrunchBase. (2018). CrunchBase: Factmata. [online] Available at: https://www.crunchbase.com/organization/factmata#section-overview [Accessed 13 Nov. 2018].

[9] Factmata.com. (2018). Factmata. [online] Available at: https://factmata.com/technology.html [Accessed 14 Nov. 2018].

[10] Factmata.com. (2018). Factmata. [online] Available at: https://factmata.com/business.html [Accessed 14 Nov. 2018].

[11] Factmata.com. (2018). Factmata. [online] Available at: https://factmata.com/news-platform.html [Accessed 14 Nov. 2018].



Student comments on Fighting Fake News with AI

  1. Great article, and a very interesting use case for AI. I think you pose a number of worthwhile questions to consider, especially regarding biases introduced by those creating the boundaries/parameters within which the AI operates to detect questionable content. I don’t think there are any easy answers to those questions, and the only way I can imagine addressing the point of bias in these AIs is to crowdsource the definition of questionable content to as many people as possible, over as many learning iterations as possible.

  2. This is quite a relevant use case for AI. I am optimistic there is a lot of technology that can spot manipulated images or false stories; however, I agree with your concerns above. Practically, to help the system learn, Factmata will need to create a very robust fact-checking and validation algorithm. With the sheer volume of everyday media and the biased views in it (even from reputable sources), I think a challenge will be understanding the subtle differences between fraud and bias, which even a human has difficulty deciphering.

  3. Thank you for your insights on the topic of AI. I agree that a large part of what we have to figure out about AI is who controls it. One can easily imagine the mechanism for cleaning out fake news being used in a context where a leader, e.g. a dictator, eradicates comments that might actually be true but do not serve the narrative the leader is pushing. I’d be interested to hear solutions to this, but I assume in the end we will again have to trust a governing body, whether from the private or government sector.

  4. Thanks for the great content. I think it’s incredibly important for technology to help society fight fake news and the influence it has today. On the flip side, I’m worried about who controls the “training” of the machines that identify fake news. Facebook, for example, has come under fire for what challengers have called a liberal bias (https://www.nytimes.com/2018/08/28/technology/inside-facebook-employees-political-bias.html), where they claim employees police opinions that are not aligned with their own. If such companies are also in control of the machine learning algorithm itself, it could give naysayers further reason to doubt the legitimacy of news organizations.

  5. Thank you for such an interesting post on a very real problem in the world of content! Most of us are already rattled by the amount of content fed to us, and trying to evaluate the ‘truthfulness’ of each piece of information is just impossible. I personally feel this project has a lot of potential. The lower-hanging fruit for me is content related to hate speech and abuse, spoof websites, and extreme clickbait. However, when it comes to analyzing opinion pieces such as political content, more caution should be exercised. It would be challenging to develop algorithms that are completely free of biases, as we view the world through our own lenses, shaped by cultural background, national and personal interests, education level, and many other factors. This might limit the audience who find this app useful.

  6. Very informative. As some of the other commenters have pointed out, governance structures may have to be developed to ensure the impartiality of these technologies as they are increasingly applied to more nuanced pieces of work.

  7. This is a fascinating topic and a compelling argument for using AI to mitigate the infectiousness of “fake news”. I believe you hit the risks spot-on in detailing how this entire process rests on the assumption that the AI truly knows what is fake and what is real. Unfortunately, in our current world the line between the two seems increasingly smudged, and you are correct to point out that it is difficult to teach computers to do something that we humans cannot define clearly ourselves. I appreciate your contextualization of this company in the broader AI landscape and hope that this type of initiative will give us hope for the future!!

  8. Impartiality of technology is indeed an important topic, but it does not seem to me to be the main issue here. By definition, fake news is inherently incorrect and misleading. As such, filtering it out does not necessarily make the selection of articles partial. In fact, the articles that do get through the sieve will be a diverse collection of opposing opinions based on true facts.

  9. I think this is a very important topic, especially in light of the tremendous and growing influence social media has on our society nowadays. I definitely think AI and machine learning have a lot of potential for identifying “bad” content, making the process more efficient and hence saving costs for technology companies. However, I have some doubts about the efficacy of AI when it comes to language, especially long-form content. Language is arguably one of the hardest tasks for a machine to master, because it needs to not only understand the vocabulary but also put it into context and get the nuances, the puns, and the emotions conveyed through text, things that are hard to build into an algorithm. Hate speech nowadays can be extremely subtle. Also, as social media today is extremely global, the technology built to tackle this issue also needs to be global, i.e. it must master the subtleties of many different languages, which to me seems a very daunting task.

  10. This is an extremely thought-provoking article! Fake news is a big threat to society today, as it very quickly and effectively sows discord among people over the internet. The work that Factmata is doing will be critical to rebuilding the public’s trust in the content they consume. The danger in Factmata’s work, however, is that its algorithm must be kept free of any kind of bias. I agree with your suggestion that the algorithm itself needs to be validated by either a public body or an established organization to ensure that it doesn’t limit free speech. The line between fake news and opinion can sometimes be blurred, and how we draw that line is a critical question society needs to answer.

  11. Very thought-provoking article. My main concern is what data will be used to train the AI. For example, if one were to read the websites of Fox News, Al Jazeera, and MSNBC, it would be possible to come away with very different conclusions about what reality is. Who gets to decide what is a fact or what is reality? The AI might well suggest 2/3 are fake! None of these sites would be considered fake by the constituents who read them, but they may well be filtered out by an AI trained in a specific manner. It may once have been the case that the news reported facts, but these days so many things are driven by the news cycle and a soundbite. I believe that more sources of news are better, as they allow readers to make up their own minds, but filtering out “obvious” fake news seems quite troublesome.

  12. Tailored content recommendations are already a big part of the digital media and information ecosystem. We are constantly bombarded with recommendations of news articles, TV shows, and movies. When algorithms recommend certain content to us, they are by definition inhibiting our ability to consume other content through omission. Society more or less feels OK about this today. Given that, why shouldn’t algorithms also censor fake news that spreads misinformation and lies? It seems to be a similar logical construct to personalization in tech.

  13. Thanks for sharing your thoughts. This is a very important and fascinating topic (as well as the subject of my post).

    The solution proposed by the startup Factmata is interesting but seems unlikely to solve the issue. The four dimensions you listed use semantic information to flag hate speech and abusive content, propaganda and extremely politically biased content, spoof websites and content spread by known fake news networks, and extreme clickbait; however, this can do little to identify incorrect information. So if an article is written by a human in the tone of a typical news article, but with incorrect facts or a reasonable political bias, the system seems unlikely to catch it.

    Facebook has a couple of awesome AI initiatives to fight fake news, but it’s still a real issue. Maybe the most unfortunate side effect is that the existence of fake news undermines perceptions of real news.

  14. A very interesting read, and thank you so much for your time!
    I personally believe that algorithms should help us screen this news, and even more, that they should play an integral part in human life. However, current technology is not developed enough to push for this. It might take another 10-20 years to make that leap, but it will help make the daily life of the human race better.

  15. Very interesting read on applying an AI solution to a very challenging systemic problem we face today. I appreciate the thought-provoking reasoning exhibited in this blog. In particular, the author questions whether the fact-checking process should be based solely on machine learning algorithms. I am personally interested in the role humans play in fact-checking and whether it needs to be prioritized first, especially when Factmata’s analysis of a text detects up to four dimensions:
    Hate speech and abusive content
    Propaganda and extremely politically biased content
    Spoof websites and content spread by known fake news networks
    Extreme clickbait content

    How does the text classification process work when detecting these four dimensions: is it predefined by the algorithm, or is it also a self-training process? I would imagine the process itself matters more than the result.
