“Re-LAION-5B fixes the issues as reported by Stanford Internet Observatory in December 2023 for the original LAION-5B and is available for download in two versions, Re-LAION-5B research and Re-LAION-5B research-safe. The work was completed in partnership with the Internet Watch Foundation (IWF), the Canadian Center for Child Protection (C3P), and Stanford Internet Observatory. For the work, we utilized lists of link and image hashes provided by our partners, as of July 2024. In all, 2236 links were removed after matching with the lists of link and image hashes provided by our partners. These links also subsume 1008 links found by the Stanford Internet Observatory report in Dec 2023. Note: A substantial fraction of these links known to IWF and C3P are most likely dead (as organizations make continual efforts to take the known material down from public web), therefore this number is an upper bound for links leading to potential CSAM. Total number of text-link to images pairs in Re-LAION-5B: 5.5 B (5,526,641,167)”
Source : Releasing Re-LAION 5B: transparent iteration on LAION-5B with additional safety fixes | LAION