Update: Apple mentions a second check on the server, and a specialist computer vision company has outlined one possibility of what this might be – described below under ‘How the second check might work.’

An early version of the Apple CSAM system has effectively been tricked into flagging an innocent image, after a developer reverse-engineered part of it. Apple, however, says that it has additional protections to guard against this happening in real-life use.

The latest development occurred after the NeuralHash algorithm was posted to the open-source developer site GitHub, enabling anyone to experiment with it…

Background

All CSAM systems work by importing a database of known child sexual abuse material from organizations like the National Center for Missing and Exploited Children (NCMEC). This database is provided in the form of hashes, or digital fingerprints, derived from the images.

While most tech giants scan uploaded photos in the cloud, Apple instead runs a NeuralHash algorithm on the customer’s iPhone to generate hashes of the stored photos, then compares these against a downloaded copy of the CSAM hash database.
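To illustrate the general shape of on-device matching – and only the shape, since Apple’s real protocol uses blinded hashes and private set intersection rather than a plain lookup – here is a minimal Python sketch in which the open-source imagehash library stands in for NeuralHash and a made-up set of hex strings stands in for the NCMEC database:

```python
# Illustration only: a perceptual-hash lookup against a set of known hashes.
# Apple's real system uses NeuralHash with blinded hashes and private set
# intersection, not a plain lookup; here the open-source imagehash library
# stands in for NeuralHash, and the hash values are made up.
from PIL import Image
import imagehash

# Placeholder for the downloaded database of known-CSAM hashes (hex strings).
known_hashes = {"8f373714acfcf4d0", "b0a408c8444884c4"}

def matches_known_database(photo_path: str) -> bool:
    """Hash a local photo and check it against the downloaded hash set."""
    photo_hash = str(imagehash.phash(Image.open(photo_path)))
    return photo_hash in known_hashes

print(matches_known_database("vacation.jpg"))  # hypothetical file
```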

A developer yesterday claimed to have reverse-engineered Apple’s algorithm, posting the code to GitHub – a claim that Apple effectively confirmed.

Apple CSAM system tricked

Within hours of the GitHub posting, researchers succeeded in using the algorithm to create a deliberate false positive – two completely different images that generated the same hash value. This is known as a collision.

Collisions are always a risk with such systems, since the hash is necessarily a greatly simplified representation of the image, but many were surprised that someone was able to generate one so quickly.

The collision created here is simply a proof of concept. Developers have no access to the CSAM hash database, which would be required to create a false positive in the live system, but the demonstration does show that collision attacks are relatively easy in principle.
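In code terms, a collision simply means two different files for which the hash function returns identical values. The sketch below again uses imagehash as a stand-in for NeuralHash, with hypothetical file names; the GitHub proof of concept is effectively a pair of images for which the equivalent NeuralHash distance is zero:

```python
# Illustration only: a collision is two different images with identical hashes.
# imagehash.phash stands in for NeuralHash; the file names are hypothetical.
from PIL import Image
import imagehash

def hash_distance(path_a: str, path_b: str) -> int:
    """Hamming distance (number of differing bits) between two image hashes."""
    hash_a = imagehash.phash(Image.open(path_a))
    hash_b = imagehash.phash(Image.open(path_b))
    return hash_a - hash_b  # imagehash overloads '-' as bitwise Hamming distance

# A distance of 0 between two visually unrelated images is a collision.
print(hash_distance("dog.png", "adversarial_noise.png"))
```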

Apple says it has two protections against this

Apple effectively confirmed that the algorithm was the basis for its own system, but told Motherboard that it is not the final version. The company also said it was never intended to be secret.

The company went on to say there are two further steps: a secondary (secret) matching system run on its own servers, and a manual review.

Apple told Motherboard in an email that the version analyzed by users on GitHub is a generic one, and not the final version that will be used for iCloud Photos CSAM detection. Apple also said that it had made the algorithm public.

“The NeuralHash algorithm [… is] included as part of the code of the signed operating system [and] security researchers can verify that it behaves as described,” one of Apple’s pieces of documentation reads. 

How the second check might work

Roboflow’s Brad Dwyer has found a way to easily differentiate the two images posted as a proof of concept for a collision attack.

Apple also said that after a user passes the 30-match threshold, a second non-public algorithm that runs on Apple’s servers will check the results.

“This independent hash is chosen to reject the unlikely possibility that the match threshold was exceeded due to non-CSAM images that were adversarially perturbed to cause false NeuralHash matches against the on-device encrypted CSAM database.”
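Apple has not published that server-side algorithm, but the pattern it describes – only counting an image as a match when a second, independent hash also agrees – can be sketched roughly as follows. Everything in this snippet is an assumption for illustration: imagehash’s phash and whash stand in for NeuralHash and the undisclosed server hash, and the databases are placeholder sets.

```python
# Rough sketch of the "second independent hash" idea described above.
# Nothing here is Apple's code: phash and whash merely stand in for
# NeuralHash and the undisclosed server-side hash, and the databases
# are placeholder sets of hex strings.
from PIL import Image
import imagehash

MATCH_THRESHOLD = 30  # review is only triggered past roughly 30 matches

def passes_threshold(flagged_count: int) -> bool:
    """The server-side check only runs once enough on-device matches accumulate."""
    return flagged_count >= MATCH_THRESHOLD

def confirmed_match(photo_path: str, device_db: set, server_db: set) -> bool:
    """Require two independent hash functions to agree before flagging an image."""
    img = Image.open(photo_path)
    first = str(imagehash.phash(img))   # stand-in for on-device NeuralHash
    second = str(imagehash.whash(img))  # stand-in for the secret server hash
    # An image adversarially perturbed to collide under the first hash is
    # very unlikely to also collide under an unrelated second hash.
    return first in device_db and second in server_db
```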

Dwyer explains:

I was curious about how these images look to a similar, but different neural feature extractor, OpenAI’s CLIP. CLIP works in a similar way to NeuralHash; it takes an image and uses a neural network to produce a set of feature vectors that map to the image’s contents.

But OpenAI’s network is different in that it is a general purpose model that can map between images and text. This means we can use it to extract human-understandable information about images.

I ran the two colliding images above through CLIP to see if it was also fooled. The short answer is: it was not. This means that Apple should be able to apply a second feature-extractor network like CLIP to detected CSAM images to determine whether they are real or fake. It would be much harder to generate an image that simultaneously fools both networks.
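The experiment Dwyer describes can be sketched with the openly released CLIP weights: embed both images and compare the embeddings. The model name, file names, and what counts as a “low” score below are illustrative choices, not details from his write-up.

```python
# Sketch of the CLIP comparison: embed two images and measure cosine similarity.
# Genuinely similar images score close to 1; a NeuralHash-colliding pair made
# from unrelated images should score much lower. File names are hypothetical.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_similarity(path_a: str, path_b: str) -> float:
    """Cosine similarity between CLIP image embeddings, in [-1, 1]."""
    images = [Image.open(path_a), Image.open(path_b)]
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        features = model.get_image_features(**inputs)
    features = features / features.norm(dim=-1, keepdim=True)  # unit-normalise
    return float(features[0] @ features[1])

# A low score for the colliding pair shows the second network is not fooled
# by the same adversarial perturbation that fooled NeuralHash.
print(clip_similarity("dog.png", "adversarial_noise.png"))
```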

Human review

Finally, as previously discussed, there is a human review of the images to confirm that they are CSAM.

The only real risk, says one security researcher, is that anyone who wanted to mess with Apple could flood the human reviewers with false positives.

“Apple actually designed this system so the hash function doesn’t need to remain secret, as the only thing you can do with ‘non-CSAM that hashes as CSAM’ is annoy Apple’s response team with some garbage images until they implement a filter to eliminate those garbage false positives in their analysis pipeline,” Nicholas Weaver, senior researcher at the International Computer Science Institute at UC Berkeley, told Motherboard in an online chat.

You can read more about the Apple CSAM system, and the concerns being raised, in our guide.

Photo: Alex Chumak/Unsplash