Into the Crossfire

Evaluating the Use of a Language Model to Crowdsource Gun Violence Reports

Adriano Belisario, Scott A. Hale, Luc Rocher

The data gap on gun violence

Gun violence is a severe human rights issue that affects nearly every dimension of the social fabric, including healthcare, education, mental health, and the economy.

The United States and Brazil account for a large share of global firearm-related violence, reaching epidemic and public health crisis levels.

Reliable data is crucial to develop effective policies and emergency responses. In Brazil, however, there are no official records of gun violence events.

Fogo Cruzado (“Crossfire”) monitors events of gun violence in four Brazilian cities.

Analysts track social media posts and on-the-ground sources 24/7.

They have been interacting with users who report gun violence on Twitter/X since 2018.

Keyword-based search with geographical filters on TweetDeck.

Mobile app shows real-time alerts of gun violence events.

The needle in the haystack

Social media is a valuable source for crowdsourcing evidence in human rights monitoring and investigations, but…

  • Keyword-based search leads to a high proportion of unrelated text

  • Small teams can’t process high volumes of data
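The noise problem can be illustrated with a minimal keyword filter; the search terms and example tweets below are hypothetical stand-ins (translated from Portuguese), not the organization's actual query:

```python
# Minimal sketch of a keyword-based filter, with hypothetical
# monitored terms and example tweets (assumptions, not the real query).
KEYWORDS = {"gunshots", "shot", "shooting"}

def keyword_match(text: str) -> bool:
    """Return True if any monitored keyword appears as a substring."""
    lowered = text.lower()
    return any(k in lowered for k in KEYWORDS)

tweets = [
    "Gunshots started going off right when I ordered a milkshake.",      # real report
    "Sometimes, certain words are like a shot when you're insecure.",    # metaphor
]

matches = [t for t in tweets if keyword_match(t)]
# Both tweets match: the keyword filter cannot tell literal reports
# from figurative uses of "shot", so analysts must sift the noise.
```

This is exactly the failure mode a trained classifier is meant to address: both messages pass the lexical filter, but only one describes a gun violence event.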

What’s been tried

Previous work shows that machine learning models can help human rights organizations filter large volumes of data.

However, we found several gaps in previous research:

  • No systematic evaluations of adopting these models in real-world settings

  • No previous work with Portuguese texts

Our Work

We built an open-source language model to help crowdsource gun violence reports from social media.

With Fogo Cruzado, we tested its real-world use in Brazil (2023).

We asked whether Transformer-based models can detect gun violence reports in Portuguese and how they support analysts in daily monitoring.

To answer these questions, we fine-tuned a BERT model on past interactions, built a web prototype to visualize its output, and evaluated its impact through surveys, interviews, and interaction metrics (difference-in-differences).

Text classification

Positive examples:

  • "Gunshots started going off right when I ordered a milkshake."
  • "People here randomly fire off shots out of nowhere."
  • "I already wake up startled, hearing gunshots."

Negative examples:

  • "I hope this man is a brave warrior. Sometimes, certain words are like a shot, especially when you're feeling a little insecure."
  • "Oh, so many distracted friends' photos that I took. Jesus, I deserved to get shot lol."
  • "I'm trying to let my nails grow, but when anxiety attacks, I tear them all off."


A Portuguese BERT-based model achieved strong performance (87% recall on positive cases).
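Recall on the positive class is the share of true gun violence reports the model retrieves. A minimal sketch with made-up labels (the toy data below yields 0.8; the 87% figure comes from the paper's evaluation, not this example):

```python
def positive_recall(y_true, y_pred):
    """Recall for the positive class: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0

# Toy example: 8 of 10 actual reports are retrieved by the model.
y_true = [1] * 10 + [0] * 5
y_pred = [1] * 8 + [0] * 2 + [0] * 5
recall = positive_recall(y_true, y_pred)  # 0.8
```

High recall is the right target here: a missed report means an analyst never sees a real event, while a false positive only costs a quick manual check.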

The prototype

Tweets were updated every fifteen minutes (later reduced to five).
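A near-real-time feed like this can be sketched as a polling loop with a configurable interval; the fetch function here is a hypothetical stand-in for the actual Twitter/X search call:

```python
import time

def poll(fetch, interval_s, max_rounds):
    """Call fetch() max_rounds times, sleeping interval_s between rounds."""
    results = []
    for i in range(max_rounds):
        results.extend(fetch())
        if i < max_rounds - 1:
            time.sleep(interval_s)  # e.g. 900s for 15 min, 300s for 5 min
    return results

# Hypothetical stand-in for the real search-API call (two batches).
batches = iter([["tweet A"], ["tweet B", "tweet C"]])
collected = poll(lambda: next(batches), interval_s=0, max_rounds=2)
```

Shortening the interval from fifteen to five minutes is then a one-parameter change, which matters because analysts respond to events as they unfold.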

Signal-to-noise ratio

The prototype effectively filtered out less relevant social media content.

Interview with an analyst

[Now] I do not have to go hunting for tweets.

Sometimes, I missed them [gun violence reports] because there were too many [unrelated] messages. During the BBB [Big Brother Brasil, an annual TV show extremely popular on Twitter], it was chaotic [...]. It was literally a treasure hunt.

Fewer filters, greater scope

Our prototype removed the need for restrictive geolocation filters, allowing analysts to expand their search scope.

We estimate that analysts using the model engaged in about nine additional daily interactions (a 40% increase) with users reporting gun violence events.
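The difference-in-differences estimate compares the change in daily interactions for analysts using the prototype against the change for those monitoring without it. A minimal sketch with hypothetical counts, chosen to illustrate an effect of about nine interactions (these are not the study's actual data):

```python
def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """DiD estimate: (treated post-pre change) minus (control post-pre change)."""
    mean = lambda xs: sum(xs) / len(xs)
    return (mean(treated_post) - mean(treated_pre)) - \
           (mean(control_post) - mean(control_pre))

# Hypothetical daily interaction counts before/after deployment.
treated_pre  = [20, 22, 21]   # analysts who later used the prototype
treated_post = [31, 30, 32]
control_pre  = [19, 21, 20]   # analysts monitoring without it
control_post = [20, 22, 21]

effect = diff_in_diff(treated_pre, treated_post, control_pre, control_post)
# Treated change (+10) minus control change (+1) gives a DiD estimate of 9.
```

Subtracting the control group's change nets out background trends (e.g., seasonal shifts in gun violence) that would bias a simple before/after comparison.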

Insights for future works

Interviews and surveys revealed valuable lessons for building more effective human-AI monitoring tools:

  • Timeliness matters: frequent, near-real-time updates are essential to support analysts responding to fast-evolving events.

  • Flexibility in keyword management: allowing dynamic search terms can better capture emerging language during live conflicts.

  • Beyond text: integrating visual and contextual cues, such as profile images, can enhance analysts’ decision-making.

Takeaways for CSCW

🤝 AI can amplify crowdsourcing, not replace human judgment.

🧭 Participatory design: working closely with human rights organizations helps ensure tools meet real, on-the-ground needs.

📊 Real-world evaluation matters: lab performance ≠ practical impact

⚠️ Platform dependencies are fragile: the closure of API access can break deployed tools.

Thank you!

📧 Email: adriano@belisario.website

🌐 Website: belisario.website

👨🏻‍🏫 Presentation: belisario.website/crossfire_paper/