Evaluating the Use of a Language Model to Crowdsource Gun Violence Reports
Gun violence is a severe human rights issue that affects nearly every dimension of the social fabric, such as healthcare, education, psychology, and the economy.
The United States and Brazil account for a large share of global firearm-related violence, which has reached epidemic levels and constitutes a public health crisis.
Reliable data is crucial for developing effective policies and emergency responses. In Brazil, however, there are no official records of gun violence events.
Zejin Ou et al. Global burden and trends of firearm violence in 204 countries/territories from 1990 to 2019. Frontiers in Public Health, 2022. https://www.frontiersin.org/articles/10.3389/fpubh.2022.966507
Phil B. Fontanarosa and Kirsten Bibbins-Domingo. The Unrelenting Epidemic of Firearm Violence. JAMA, 2022. https://doi.org/10.1001/jama.2022.17293
Fogo Cruzado (“Crossfire”) monitors events of gun violence in four Brazilian cities.
Analysts track social media posts and on-the-ground sources 24/7.
They have been interacting with users who report gun violence on Twitter/X since 2018.
They run keyword-based searches with geographical filters on TweetDeck.
A mobile app shows real-time alerts of gun violence events.
Social media is a valuable source for crowdsourcing evidence in human rights monitoring and investigations, but…
Keyword-based search leads to a high proportion of unrelated text
Small teams can’t process high volumes of data
Previous works show that machine learning models can help human rights organizations to filter large volumes of data.
However, we found several gaps in previous research:
No systematic evaluations of adopting these models in real-world settings
No previous work with Portuguese texts
Ayman Alhelbawy et al. An NLP-Powered Human Rights Monitoring Platform. Expert Systems with Applications, 2020. https://doi.org/10.1016/j.eswa.2020.113365
We built an open-source language model to help crowdsource gun violence reports from social media.
With Fogo Cruzado, we tested its real-world use in Brazil (2023).
We asked whether Transformer-based models can detect gun violence reports in Portuguese and how they support analysts in daily monitoring.
To answer these questions, we fine-tuned a BERT model on past interactions and built a web prototype to visualize results. We evaluated its impact through surveys, interviews, and a difference-in-differences (diff-in-diff) analysis of interaction metrics.
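As a rough illustration of the diff-in-diff idea, the sketch below computes the standard two-group, two-period estimate. The function name, the treated/control split, and every number are hypothetical placeholders; they are not the study's data, and the study's exact specification is not given here.

```python
def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Two-group, two-period difference-in-differences estimate."""
    treated_change = treated_post - treated_pre
    control_change = control_post - control_pre
    # The control group's change stands in for what would have
    # happened to the treated group without the intervention.
    return treated_change - control_change


# Hypothetical mean daily interactions (illustrative only, not study data).
effect = diff_in_diff(
    treated_pre=20.0, treated_post=31.0,
    control_pre=18.0, control_post=21.0,
)
print(effect)  # 8.0
```

The estimate attributes to the tool only the change that exceeds the change seen in the comparison group.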
| Positive Examples | Negative Examples |
|---|---|
| Gunshots started going off right when I ordered a milkshake. I hope this man is a brave warrior. | Sometimes, certain words are like a shot, especially when you’re feeling a little insecure. |
| People here randomly fire off shots out of nowhere. | Oh so many distracted friends’ photos that I took Jesus I deserved to get shot lol. |
| I already wake up startled, hearing gunshots. | I’m trying to let my nails grow, but when anxiety attacks, I tear them all off. |
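The examples in the table also show why keyword search alone is noisy. The minimal sketch below applies a naive substring filter to the English translations from the table (the first one shortened). The keyword list and the helper name are assumptions for illustration, and the real system works on Portuguese text.

```python
KEYWORDS = ("shot",)  # hypothetical keyword; also matches "shots", "gunshots"

# (text, is_gun_violence_report) pairs taken from the table above.
EXAMPLES = [
    ("Gunshots started going off right when I ordered a milkshake.", True),
    ("People here randomly fire off shots out of nowhere.", True),
    ("I already wake up startled, hearing gunshots.", True),
    ("Sometimes, certain words are like a shot, especially when "
     "you're feeling a little insecure.", False),
    ("Oh so many distracted friends' photos that I took Jesus "
     "I deserved to get shot lol.", False),
    ("I'm trying to let my nails grow, but when anxiety attacks, "
     "I tear them all off.", False),
]


def keyword_match(text, keywords=KEYWORDS):
    """Return True if any keyword occurs as a substring of the text."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in keywords)


flagged = [(text, label) for text, label in EXAMPLES if keyword_match(text)]
true_reports = sum(label for _, label in flagged)
precision = true_reports / len(flagged)
print(len(flagged), true_reports, precision)  # 5 3 0.6
```

On this toy sample the filter flags five of the six texts, but only three of those are actual reports. Separating the figurative uses of "shot" from real reports is the job left to a classifier.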
A Portuguese BERT-based model achieved good performance (87% recall for positive cases).
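For reference, recall on the positive class is the share of real gun violence reports that the model catches. The counts in this short sketch are hypothetical and were chosen only to reproduce the 87% figure; they are not the study's confusion matrix.

```python
def recall(true_positives, false_negatives):
    """Fraction of actual positive cases that the model detected."""
    return true_positives / (true_positives + false_negatives)


# Hypothetical counts: 87 reports detected, 13 missed.
score = recall(true_positives=87, false_negatives=13)
print(score)  # 0.87
```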
The tweet feed was refreshed every fifteen minutes (later every five).
The prototype effectively filtered out less relevant social media content.
Interview with an analyst
“[Now] I do not have to go hunting for tweets.
Sometimes, I missed them [gun violence reports] because there were too many [unrelated] messages. During the BBB [Big Brother Brasil, an annual TV show extremely popular on Twitter], it was chaotic [...]. It was literally a treasure hunt.”
Our prototype removed the need for restrictive geolocation filters, allowing analysts to expand their search scope.
We estimate that analysts using the model engaged in about nine additional daily interactions (a 40% increase) with users reporting gun violence events.
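The two figures in this estimate imply a rough baseline. The back-of-the-envelope sketch below assumes the 40% is measured against the expected number of daily interactions without the model. Since the source says "about nine", the results are approximate, and the variable names are illustrative.

```python
additional_interactions = 9   # "about nine" extra daily interactions
relative_increase = 0.40      # reported as a 40% increase

# Implied daily interactions without the model, rounded to one decimal.
baseline = round(additional_interactions / relative_increase, 1)
with_model = baseline + additional_interactions

print(baseline, with_model)  # 22.5 31.5
```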
Interviews and surveys revealed valuable lessons for building more effective human-AI monitoring tools:
Timeliness matters: frequent, near-real-time updates are essential to support analysts responding to fast-evolving events.
Flexibility in keyword management: allowing dynamic search terms can better capture emerging language during live conflicts.
Beyond text: integrating visual and contextual cues, such as profile images, can enhance analysts’ decision-making.
🤝 AI can amplify crowdsourcing, not replace human judgment.
🧭 Participatory design: working closely with human rights organizations helps ensure tools meet real, on-the-ground needs.
📊 Real-world evaluation matters: lab performance ≠ practical impact.
⚠️ Platform dependencies are fragile: closure of API access is a critical risk.
📧 Email: adriano@belisario.website
🌐 Website: belisario.website
👨🏻‍🏫 Presentation: belisario.website/crossfire_paper/