Facebook deploys ‘Red Team’ to hack its own AI


Social media is used by some as a platform to showcase their talents and by others to promote their businesses. It is one of the fastest-growing digital networks and has brought people around the world together under a single roof. For many, posting an update has become part of the daily routine, and an account feels like a necessity, not only for the friendships we make but for connections that can lead to opportunities such as job or internship offers. Users, in turn, are responsible for posting content that is appropriate and does not violate the platform's guidelines.

Recently, filters and effects were added to Instagram and Facebook to help users capture real-time photos and make their content more shareable. In February 2019, however, it emerged that some Instagram users were exploiting these features to push explicit content past Facebook's automated porn filters: they edited their photos so that the sensitive parts were hidden behind filters or effects, while the content itself reached a whole different audience. Facebook relies on AI-powered moderation to detect explicit content and block the image or the user, and until then the system had proved efficient, keeping the platform and its users largely within the guidelines. These edited posts, however, slipped through.
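To see why an overlay can defeat automated moderation, consider a deliberately crude sketch. Facebook's real system is a deep neural network, not the simple skin-tone heuristic below; the detector, the thresholds, and the synthetic images here are all illustrative assumptions. The point is only that any detector keyed to visible content can be starved of signal when a sticker or effect occludes most of the image:

```python
import numpy as np

def skin_pixel_ratio(image):
    """Fraction of pixels in a crude 'skin tone' RGB range.
    A stand-in for a real moderation model, used only to
    illustrate a threshold-based detector."""
    r, g, b = image[..., 0], image[..., 1], image[..., 2]
    skin = (r > 95) & (g > 40) & (b > 20) & (r > g) & (r > b)
    return skin.mean()

def flags_image(image, threshold=0.4):
    """Flag the image if too much of it looks like exposed skin."""
    return skin_pixel_ratio(image) > threshold

# A synthetic 100x100 image dominated by skin-toned pixels.
image = np.full((100, 100, 3), [200, 120, 90], dtype=np.uint8)
print(bool(flags_image(image)))      # True: the detector trips

# Cover most of the frame with an opaque sticker, the way evasive
# posts hid sensitive regions behind filters and effects.
overlaid = image.copy()
overlaid[10:90, 10:90] = [30, 30, 200]  # sticker covers 64% of pixels
print(bool(flags_image(overlaid)))   # False: same photo now slips past
```

The human viewer may still infer what sits behind the overlay, but the detector's measurable signal has dropped below its threshold, which is essentially the loophole the Instagram posts exploited.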

Facebook's users span a wide range of ages, the youngest around 12-13, so action had to be taken immediately. Various other tools and programs were tested, but the results were short-lived: users always found a new way to dodge the eye of the system. This prompted Facebook to deploy its 'AI Red Team' to uncover the loopholes buried within its AI and fix them. Several companies and government organizations have since created similar teams.

The team is led by Facebook's own Cristian Canton, a computer-vision expert who joined the company in 2017 and ran a group that works on image-moderation filters. He introduced a contest called a 'risk-a-thon', in which participants tried to identify what could trick the AI into letting unethical content through. One team at the contest showed that using different languages within a post could befuddle Facebook's automated hate-speech filters. A second discovered the attack used in early 2019 to spread porn on Instagram, though fixing it was not considered an immediate priority at the time.
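A toy sketch makes the mixed-language finding concrete. Facebook's real hate-speech filters are learned classifiers, not keyword lists; the blocklist, the placeholder terms, and the filter below are invented for illustration only. Still, the failure mode is the same: a system whose signal comes mostly from one language can miss the identical sentiment expressed in another language inside the same post:

```python
# Hypothetical English-only blocklist; "hateword" and "odioterm" are
# placeholder tokens, not real slurs.
BLOCKLIST_EN = {"hateword"}

def naive_filter(post: str) -> bool:
    """Flag a post if any token matches the English blocklist."""
    return any(token.lower() in BLOCKLIST_EN for token in post.split())

print(naive_filter("this post contains hateword"))  # True: English term caught
print(naive_filter("this post contains odioterm"))  # False: the same idea in
                                                    # another language slips by
```

Switching the key word into a language the filter was never trained on leaves the rest of the post readable to humans while blinding the classifier, which is what the risk-a-thon team demonstrated.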

Currently, the biggest problem the Red Team faces is identifying deepfakes: imagery generated by AI that appears to have been captured on camera. It is further proof that preventing AI trickery isn't easy.

The malpractice and immoral behavior exhibited on a platform this large, especially the deception of its AI systems to spread hate or illicit content, shows that no matter how sophisticated AI and its programs become, there is always a loophole that can trick the system.

