How I Came to Care About AI, Bots, and the Ethics of Silicon Valley
My presentations to the RLUK/DLF Event and the CARL Repository Community of Practice.
Over the last month, I’ve been on a bit of a roadshow discussing the intersection of AI, bots, and the ethics of our tech sector. It started with a panel jointly hosted by the Digital Library Federation (DLF) and Research Libraries UK (RLUK) in late October, and continued recently with a session for the Canadian Association of Research Libraries (CARL), where I presented alongside my colleague Arran Griffith, the Community Manager for Fedora and the lead of an AI Discussion Group I’ve participated in.
The post below is the script from the DLF/RLUK session. It follows the ‘spark’ of curiosity that took me from investigating harvested content with my team to asking hard questions about ethics and ‘openness’ in the age of AI.
The Spark: When Outage Met Curiosity
When people ask how I got interested in AI, I usually say it wasn’t because I was looking for it; it was because AI came looking for me. That moment became the hook for everything that followed.
At Emory, my team had been building open digital collections and repositories for years, proud of our work as a public good. But in the spring of 2024, strange outages and unusual web traffic revealed that our infrastructure had caught the attention of something new: AI bots hungry for data.
We were noticing an uptick in web traffic, and it wasn’t from students or researchers; there just aren’t enough people in the Emory community to use our infrastructure at the rate the data showed. The traffic was coming from obscure devices, and when we dug into it, the volume seemed unrealistic. Over and over: thousands of hits, all targeting our digital collections, our open-access repository, and our catalog.
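To make the pattern concrete, here is a minimal sketch of the kind of log analysis that surfaces this sort of anomaly: tallying requests per client IP and user agent from a standard combined-format access log. This is illustrative, not our actual tooling; the log path and threshold are placeholders.

```python
# Sketch: tally requests per client IP and user agent from a
# combined-format (Apache/nginx) access log.
import re
from collections import Counter

LOG_PATH = "/var/log/nginx/access.log"  # placeholder path
THRESHOLD = 1000                        # placeholder: flag clients with implausibly many hits

# Combined log format: IP - - [date] "METHOD /path HTTP/x.x" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[.*?\] "(?P<request>[^"]*)" \d+ \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

hits_by_ip = Counter()
hits_by_ua = Counter()

with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        m = LINE_RE.match(line)
        if m:
            hits_by_ip[m.group("ip")] += 1
            hits_by_ua[m.group("ua")] += 1

print("Busiest clients:")
for ip, count in hits_by_ip.most_common(20):
    flag = "  <-- unrealistic for human use" if count > THRESHOLD else ""
    print(f"{count:>8}  {ip}{flag}")

print("\nBusiest user agents:")
for ua, count in hits_by_ua.most_common(10):
    print(f"{count:>8}  {ua[:80]}")
```

A quick tally like this is essentially what our monitoring dashboards showed us in aggregate: a handful of clients generating volumes no human community could produce.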
At first, I thought it was me. Around that time, after a manager left, I had taken over the team responsible for on-call and infrastructure support (servers, databases, and so on). I thought maybe this was normal and I had simply been oblivious to what was happening.
But then, in May, at a conference, a colleague from another institution had to step out of a session because their catalog was down. Other AULs (associate university librarians) for technology reported struggling with similar outages. A couple of months later, a vendor mentioned they had to increase their security spending because they were constantly under “attack” as well.
When we realized what was happening, we started investigating the IP addresses and user agents behind this behavior and discovered that the IPs were tied to AI companies. My team eventually set up CAPTCHAs, firewall rules, and monitoring alarms, which warned us early and helped us block these bad actors. We weren’t dealing with an attack, and I hadn’t been oblivious: it was AI bots scraping our content for their companies’ training models.
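For a sense of how that attribution works: the major AI crawlers identify themselves with published user-agent tokens, so a simple lookup can tag them in your logs. The token list below is illustrative and non-exhaustive, not a complete inventory; vendors document their current crawler strings (and IP ranges), and stealthier scrapers spoof ordinary browser agents, which is why the CAPTCHAs and firewall rules were still necessary.

```python
# Sketch: attribute logged user agents to known AI crawlers.
# Token list is illustrative only; this catches the crawlers that
# identify themselves honestly, not the ones spoofing browsers.
AI_CRAWLER_TOKENS = {
    "GPTBot": "OpenAI",
    "CCBot": "Common Crawl",
    "ClaudeBot": "Anthropic",
    "PerplexityBot": "Perplexity",
    "Bytespider": "ByteDance",
    "Amazonbot": "Amazon",
}

def classify_user_agent(ua: str) -> str | None:
    """Return the operator name if the user agent matches a known AI crawler."""
    for token, operator in AI_CRAWLER_TOKENS.items():
        if token.lower() in ua.lower():
            return operator
    return None

# Example usage, reusing hits_by_ua from the previous sketch:
# for ua, count in hits_by_ua.most_common():
#     operator = classify_user_agent(ua)
#     if operator:
#         print(f"{count:>8}  {operator:<12}  {ua[:80]}")
```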
That was the moment my curiosity led me down a path of exploration. I realized that libraries had inadvertently crossed into new territory, where our openness was enabling the exploitation of our content.
The Wake-Up Call
Around the same time, I attended a Fedora Governance Committee meeting where others were noticing similar issues. The Fedora Community Manager, Arran Griffith, created a space for us to discuss how the software might help. It quickly became clear that the problem extended far beyond Fedora.
Arran began inviting people from other communities to collaborate, and the group has since grown. We now host regular Solutions Showcases for technology professionals focused on bot blocking and related topics.
A few of us also presented a panel at the Coalition for Networked Information (CNI) Fall Membership Meeting to share what we’d learned. Many attendees approached us afterward, relieved to find they weren’t alone in dealing with this bot traffic.
Learning that bots had surpassed human activity online was the turning point. I began to wonder: if openness now fuels corporate harvesting and AI training, what does ethical access look like? Are we building systems for humans, or merely for the bots? And, most importantly, who gets to decide — the institutions that steward knowledge, or the companies that can afford to harvest it?
If you’re like me and prefer watching content, you can view the recording of the session I gave with my colleague Arran Griffith for the Canadian Association of Research Libraries (CARL).
In this presentation, we dove deeper into the technical specifics and the community response to bot attacks.
The Shift: When AI Changed The Focus
As we started monitoring this traffic more closely, I noticed an interesting pattern. The AI systems were moving away from our special collections repository (the one full of images) and focusing almost entirely on our catalog and our open-access repository. And then I had my next aha moment.
Image generation models had matured. They didn’t need our pictures anymore; now they wanted our metadata and the university’s research. AI companies were focusing on the intellectual frameworks and knowledge ecosystems that libraries and universities had built. The descriptions, relationships, and categorizations that libraries have carefully created over thousands of years were being harvested, tested, and monetized by giant technology companies worth billions.
Next, I noticed that conversations about AI accuracy were dominating the hype cycle: vendors were talking about linking chatbot outputs back to source content to prove that what the models produced made sense. By the time I made the connection, the Wiley/Perplexity deal had been announced. If you aren’t familiar, the deal lets institutions that subscribe to both Perplexity and Wiley access Wiley journals via Perplexity, meaning the chatbot can generate text based on the content of Wiley’s journals. For authors and libraries, this matters because it blurs the boundary between fair use and commercial reuse, raising questions about consent, citation, and ownership of scholarly work.
But it doesn’t stop there; AI companies already have access to our content and can display it to their users. In the video below, I’m showing you OpenAI’s chat interface so that you can easily see how it pulls up information and images from my institution’s digital and digitized collections.
How did it get this access? Well, a lot of our content is available in the Common Crawl dataset, which many AI companies use for training their models (you can check whether your website is in the dataset via the Common Crawl Index Server; see the sketch below). If you ask ChatGPT, it will sidestep the question and say it doesn’t know whether it used Common Crawl, but if you read any of the legal filings between OpenAI and The New York Times, you’ll see that OpenAI did use the dataset. So of course it has our content, and of course it can output this information about Emory’s collections.
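If you want to check for yourself, the Common Crawl Index Server exposes a public CDX API at index.commoncrawl.org. Here is a minimal sketch, assuming the requests library is installed; example.edu is a placeholder for your own domain.

```python
# Sketch: check whether pages from your domain appear in a recent
# Common Crawl snapshot via the public CDX index API.
import json
import requests

# collinfo.json lists the available crawls; the first entry is
# typically the most recent one.
crawls = requests.get("https://index.commoncrawl.org/collinfo.json", timeout=30).json()
cdx_api = crawls[0]["cdx-api"]

resp = requests.get(
    cdx_api,
    params={"url": "example.edu/*", "output": "json", "limit": "25"},
    timeout=30,
)

if resp.status_code == 404:
    # The index typically returns 404 when nothing matches.
    print("No captures found in this crawl.")
else:
    resp.raise_for_status()
    # The API returns one JSON record per line.
    for line in resp.text.splitlines():
        record = json.loads(line)
        print(record["timestamp"], record["status"], record["url"])
```

Any record that comes back represents a capture of your pages sitting in a publicly downloadable crawl archive that anyone, AI companies included, can train on.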
The Mirror: When Technology Reflects Power
My background in political philosophy has taught me to see how power hides inside systems. When I look at AI today, I recognize familiar patterns: centralized control, opaque decisions, and ideological visions dressed up as ‘progress.’
One afternoon, while sitting in my car scrolling Instagram (as one does), I stumbled across a Reel about something called the TESCREAL Bundle. I assumed it was satire, until curiosity won out. I soon found Dr. Timnit Gebru and Émile P. Torres’s paper, “The TESCREAL bundle: Eugenics and the promise of utopia through artificial general intelligence”. Reading it, I realized these weren’t just eccentric fringe ideas; they revealed a worldview shaping many of AI’s most powerful players.
Tracing the networks among Silicon Valley figures — Peter Thiel, Elon Musk, and others — I saw how shared ideology underpins the same companies shaping our digital landscape. Over the past year, those connections have become more visible, as influential tech leaders speak more openly about longtermism, transhumanism, and market absolutism. What once felt like quirky personality politics now reads as a coherent ideology—one that libraries, as stewards of human knowledge, must learn to recognize and challenge.
More and more individuals in Silicon Valley are openly displaying oligarchic tendencies. Figures like Elon Musk, Jeff Bezos, Mark Zuckerberg, and others are doing less to hide their beliefs behind their corporations and are more willing to speak out. Today, as I observe the power structures of Silicon Valley, the companies look increasingly problematic; it feels like we can’t escape these harmful belief systems.
The Invitation: What We Need to Talk About Today
I’m not here to deliver conclusions but to open a conversation: to encourage us to think about the responsibilities and choices we face as technology reshapes our field. Around the world, people are debating ideas like digital sovereignty, regenerative economic theory, and ethical divestment from exploitative systems and companies. My goal here is to explore these issues together rather than prescribe specific actions.
So here are a few questions I’d like to put on the table:
If AI depends on our openness, how do we ensure that openness doesn’t lead to exploitation and that our users stay prioritized?
How can we build trust in our own systems and with our users when AI serves as both our partner and our adversary?
Should we all, including those of us in the US, pay more attention to digital sovereignty (the idea that we should invest in infrastructure not controlled by the big US technology conglomerates) than we have so far?
I’d love to hear your thoughts on these topics. Please feel free to reach out via direct message or put your thoughts in the comments below.