Thinking Sociologically About Machine Learning

July 25, 2023

I sometimes mention in my written work and speeches that I have a sociology background, and used to be an adjunct professor of sociology at DePaul University before embarking on my data science career. I loved sociology, and still do — it shaped so much about how I understand the world and my own place in it.

However, when I made a career change and turned to data science, I spent a lot of time explaining how that background, training, and experience were assets to my practice of data science, because it wasn’t obvious to people. I was mingling with a lot of engineers, physicists, and other folks from STEM, and I definitely stood out as having a different approach at times.

I used to explain it in this way: Data science is only relevant if the results of the model, analysis, or tooling are useful to people, used by people, and don’t cause harm. Because of this, data science needs to be attentive to the perceptions, realities, and needs of people — that’s the only way we can build data science products that will be useful and beneficial. Sociology is the study of how people interact with systems, structures, institutions, and culture. So, almost by definition, if we want to study how people interact with data science in any broad sense, sociology is a great place to start.

These days, however, people don’t ask me what sociology has to do with data science anymore. Instead, when I talk about being a sociologist-turned-data scientist, people want to talk about the incredible effect that AI (particularly of the generative type) is having on our culture and institutions. Suddenly, the relevance of insight into how society works is inextricably intertwined with the practice of data science. Machine learning has in fact been affecting all our lives in lots of ways for years, but it’s become much more obvious in the last several months, and the pace of change has accelerated.

Many people are a bit alarmed, and at the same time not sure if alarm is warranted. A lot of that comes from the inscrutability of machine learning — most of the very talented folks working in this field and making that pace accelerate are not teachers, not social scientists, and not avid communicators, so the perpetual challenge of science communication for public consumption has become critical for data science in a way it wasn’t before. It seems like a time when people who have that different skill set inside the field ought to be getting involved to help.

Evolving

I gave a talk some years ago about What Data Science Can Learn From Sociology, which you can still watch if you’re so inclined. My themes were largely around the culture of data science and the lives of practitioners: the privilege we enjoy by being “Data Scientists”, inherent bias in our field, and so on. It feels a tad dated, because at that time I wasn’t thinking as much about sociological applications directly to our work, but about the sociological implications of how we work and who is empowered to become data scientists. (I still stand by all the points I made, but it was a relatively narrow lens.)

There is an extension of those ideas, which I did discuss some in my talk, but which feels more and more relevant in our field’s new era. That is that a field lacking in heterogeneity of practitioners (like ours) is at huge risk of terrible blind spots in our actual work. Without calling out any specific category of this heterogeneity (we know what we’re talking about here, and we know the characteristics of the leaders of AI and machine learning in our culture), I am still continually baffled that people don’t appreciate that to get different ideas, perspectives, creativity, and innovation, you need people who are from different walks of life. You just do. There’s no good substitute for different experiences of living in the world, when it comes to developing different thought patterns and perspectives about that world.

That’s a topic that could fill a whole post on its own, however. The point I want to make here is that communicating about these themes, and forcing our field to reckon with them, is important and worth doing. I love data science and machine learning, and I’ve been incredibly lucky to be able to do it as a career for almost a decade. I think we can do amazing things, and I’m excited to see that progress. But I’m also intent on making sure we don’t cut corners on measures of quality that aren’t as easy to quantify, including those around our impact on the world around us.

I had the privilege of enjoying Dr. Rumman Chowdhury’s keynote at SciPy a few weeks ago, on issues around responsible AI, and a lot of that content intersects with how I think about these matters. (She’s amazing, if you aren’t familiar.) She gives a clear, convincing argument that responsible AI needs to be a priority for those of us in the field as well as government, regulators, and so on. We’ve got boatloads of intelligent people racing forward with the technological advancements, but not as many people on the inside of the field, with the requisite technological understanding, taking a closer look at the implications of those advancements.

In this light, it seems important to use my channels not just to teach data science practice (which I still intend to do) but also to insert a little more of the sociological perspective into conversations about machine learning, AI, data science, and whatever else we decide to call it next. If there’s an issue in this sphere that you’d like to have analyzed sociologically, please comment and I’ll see what I can do.

You can find more of my work at www.stephaniekirmer.com.

I use ‘machine learning’ and ‘data science’ somewhat interchangeably here. To clarify, I consider machine learning a subset of data science.