Episode 075 with 2023 OE Award Infrastructure Winner, Openverse === Intro Music and Opening Quotes --- [00:00:00] Zack Krida: And I'd love to figure out ways to just foster a sense of play and discovery with these things. Like, some of the sources that you might not even think of have some pretty extraordinary results. People come to Openverse with different needs. We have people looking for sort of traditional stock photography for a blog post header. We might have someone looking for a very specific historical image for a homework assignment. We can't make these assumptions by default of what relevancy is to all users. [00:00:38] Madison Swain-Bowden: Given the size of our team, given what we're hoping to accomplish, we really came out of those discussions with the perspective that Openverse is going to be something that's for humans. We want humans to be the ones to use this project and to be the ones who are searching for the content that is made and created by humans as well, and is licensed by those creators. That ended up being our focus. OE Global Voices Podcast Introduction --- [00:01:02] Alan Levine: It's time to record a new episode of OE Global Voices. This is the podcast produced by Open Education Global. And in this show, we share with you, conversation style, people, practices, and ideas from open educators and open source developers from around the world, told to you in their own voices. I'm your host, Alan Levine, the lucky one because I just get to hear all these conversations. Open Education Awards and Open Infrastructure Award --- [00:01:24] Alan Levine: It was a while ago -- well, actually almost a year, I'm almost embarrassed to say -- that we announced the winners of our 2023 Open Education Awards for Excellence, a recognition program where we highlight achievements that the community nominates in 16 different categories. And that's a lot of recognition. I'm still continuing to catch up with our awardees.
And if I can try to make the case, this length of time makes it interesting because we can learn about what's happened to some of these projects and people since then. Well, that's what I tell myself. And so it's timely because we're in the process now; we're getting ready to announce a shortlist of finalists for the 2024 awards. We're going to focus today on one that's been relatively new. It's called the Open Infrastructure Award, aimed to highlight systems and tools that enable the work of educators. I'm really delighted to have representatives from a project I was very excited to see be nominated and then win, the open license media search tool, Openverse, which won the Open Infrastructure Award last year. Meet the Openverse Team: Madison and Zack --- [00:02:24] Alan Levine: Welcoming into the studio, two people from the Openverse team, Madison Swain-Bowden and Zack Krida. They work for this "little" company called Automattic, which has this minor piece of software known as WordPress. I'm saying that sarcastically because WordPress is like the dominating force, and we'll maybe learn about that relationship between WordPress and the Openverse project. Really looking forward to talking to Madison and Zack about Openverse: the mission, we might drift into some of the technical parts, but we just want to talk about what it does, how it does it, and really where it might be going. And again, I've been talking way too much, so let's welcome our guests. I'll ask in turn: just let us know, like, geographically where you are, but also describe your physical settings. [00:03:09] Alan Levine: I'll ask Madison first. Welcome. Hi, Madison. [00:03:11] Madison Swain-Bowden: Hi everyone. Thanks for having us on, Alan. It's great to be here and be talking about Openverse. I am presently in Seattle, Washington in the United States. If I were to describe my physical surroundings, it probably wouldn't surprise people. I'm looking out my window at a bunch of pine trees.
It's a beautiful summer day, which means that it's 70, or right now actually 65, degrees Fahrenheit. And also my cat has joined us for the call, so I'm hoping that she doesn't cause too much of a disturbance here. [00:03:37] Alan Levine: And before we started, we had a cat-to-cat conversation between my cat Mable. Zack, you may have a cat. Hi Zack. [00:03:43] Zack Krida: Hey, yeah. My cat's actually summering at my in-laws due to some rabbit killing that was happening earlier in the summer, which is thankfully now over. But yeah, I'm on the opposite coast of the United States in Rhode Island. I'm looking at my basement office after my son was born last year and took my prior office. Decent setup though. Let's see. Oh, you know what? I have some stickers I just got from the EFF, the Electronic Frontier Foundation. They did some creature stickers; it says "online tracking stinks." It's a little alien, so I'll probably be looking at that for a little bit. [00:04:15] Madison Swain-Bowden: That's cute. What is Openverse? --- [00:04:16] Alan Levine: There may be a few people who don't know what Openverse is. What's the easy explanation for what Openverse does? [00:04:22] Madison Swain-Bowden: The tagline that we always use collectively is: Openverse is a search engine for openly licensed media. When I'm trying to explain to people at parties what it is that I work on, the question that I usually preface that with is, "Have you heard of Creative Commons? Do you know any of the Creative Commons licenses?" The answer tends to determine how I describe Openverse. For folks who know Creative Commons or are aware of open licensing, I launch into: we're a search engine that indexes a bunch of different sources that have openly licensed images and audio. Then we aggregate that and allow you to search across all of those sources, filtering by different license types, different parameters on the images and audio, that sort of thing.
And for folks who maybe aren't necessarily aware of open licensing and what that is: most folks understand the concept of something that's public domain. And so I'll say it's kind of like we have some public domain images, but also other kinds of images that are openly licensed. It's like Google Images, but with attribution. [00:05:19] Alan Levine: How did this project come to be? The Origin Story of Openverse --- [00:05:21] Alan Levine: What's the origin story for Openverse being this project that you're both working on? [00:05:26] Zack Krida: The project began at Creative Commons, obviously the creators, maintainers, enforcers -- well, to some extent -- of the CC licenses. I believe it was sometime in the mid-2010s, Creative Commons received some grant funding for an initiative they wanted to work on, which was to find, catalog, and make available all of the Creative Commons licensed works on the web, which is, as it sounds, a pretty massive undertaking. The latest estimates are somewhere around three billion works. They put out a prototype and then they ultimately launched CC Search, an early version of this search engine, for images specifically, with Creative Commons licenses. And in the spring of 2021, the project was taken over by the WordPress Foundation. We can talk about the WordPress / Automattic relationship; Madison and I, and the rest of our core maintainer group, work for Automattic. The WordPress Foundation took over the Openverse project and contributes developer hours to maintaining and growing the search engine. Implicit in what I just said is that it was renamed to Openverse from CC Search. We've now been part of the broader WordPress ecosystem for three years. [00:06:42] Alan Levine: And so brilliant that the WordPress Foundation took it on. I remember the very first one. It was just a thing of little boxes that would search different sites. Yeah, you could find stuff, but it was a lot of work to do it.
Now you go to Openverse and you see the search box, which we're pretty used to typing into. We enter search terms in a box and we get lots of results, in this case right now, images or sound files. Can you sort of give the overall picture? Technical Insights: How Openverse Works --- [00:07:07] Alan Levine: What happens under the hood after I press the button, Madison? [00:07:10] Madison Swain-Bowden: This will be a little bit technical, right? I'll try not to get too deep into the weeds. So when you execute a search, right, when you're searching for something, we have these big data sets: one for images, which has, I think, a little over 800 million images, and then another for audio, which is about three million at this point. And when you search for a term, we'll say "cat" since we were talking about cats earlier, we have an application programming interface, or API, that the request gets sent off to. And that will go and look through all of the records that we have that match the term "cat", and then bring them back to you. The matching, the relevancy there, is a really complex piece. Certainly it can be very naive or very complex, and we're working to feed results that are more relevant for the stuff that people are searching for. Behind all that, too, is the actual data, and so we have to have something to search against, right? And so there's this whole invisible piece that happens with Openverse that's happening all the time. We are going to these various sources that have Creative Commons works, and adding references to those in our index, in our data sets. And so some of our biggest ones are sites like Flickr or iNaturalist or Wikimedia Commons. We regularly or semi-regularly request data from them: what has been updated in the last month, what new records exist, what records have been updated that are openly licensed.
For some of the sources that we have, we just fetch the entire data set and update everything that we can. That data collection process is happening all the time, but more passively and in the background. We have things coming in from two sides: the data that is used to build out our indices that are searched against, and then the search terms that the users make for trying to find those images and audio. [00:09:01] Alan Levine: When we search with Google, we sort of have the illusion that it's searching everything on the web, and we know that's done in a different way. But you're being provided information about this from different organizations, museums, who are doing the collection. So you're kind of searching their index of their own stuff. [00:09:18] Madison Swain-Bowden: There's kind of two ways to think about this in the data space. There's a push and a pull, right? So Openverse operates on a pull mechanism for its data, meaning that we are going out and we are requesting data from these places to add, update, or index. The other way that could be done is if, like, the museums and whatnot were sending data to us and said, "oh, we have this new CC license and whatnot", but it's a little bit easier for us to set it up by pulling that data, because then we can set the regularity of that. We can set how frequently we want to do that. We can monitor throughput to make sure that we're not overloading ourselves with too much data at any given point. Depending on the source that we have, that sort of process of fetching new data can be daily. It might be weekly or monthly, and we have one, just because it's a huge data set, that we only run once a quarter, I think. The Role of WordPress Foundation and Automattic --- [00:10:10] Alan Levine: And so we've been talking a couple of times about the WordPress Foundation, and people may not even be aware that there's such a thing.
What's the relationship to the Foundation, and then the way the Foundation works with the Openverse team? [00:10:23] Zack Krida: Yeah, I can speak to that a little bit. So the WordPress Foundation is the nonprofit that supports the WordPress open source content management system, which many folks who work on websites use; maybe you work at a nonprofit or NGO and you have to update their pages or posts from time to time. And so Openverse now falls under that organization as well. It's basically a small supporting organization. For example, the Foundation holds the Openverse logo trademark, and some small things like that. Like I mentioned, myself, Madison, and six other folks now are sponsored by Automattic, a tech company founded by Matt Mullenweg some time ago now. And Automattic is sponsoring us under a program called Five for the Future, something proposed by the WordPress Foundation. The idea there is that tech companies and organizations that use WordPress, that see a lot of value from the free software that is WordPress, contribute 5 percent of their workforce (development hours, capacity, however you want to quantify that) to WordPress and the WordPress ecosystem. So yeah, it's really incredible to have a whole team built around sustaining this project and making sure that folks have access to all of these works for the foreseeable future. [00:11:44] Alan Levine: And so fantastic that WordPress sees the value in this, and being the inspiration for the 5% model. And it does help WordPress, 'cause you go into your media library and you can actually use Openverse to find the images, and it brings them directly in. In the old days you'd have to search for it, you have to download it and upload it. And now it's making it easier, as you're writing or creating in WordPress, to get the media you want and then to bring along the full attribution. It's a logical thing, but there's much more that the world's benefiting from having Openverse.
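To make Madison's earlier description of the pull-based ingestion model concrete, here is a toy sketch of the idea: Openverse decides the cadence of fetches from each source (daily, weekly, quarterly), rather than sources pushing data in. The cadences, function names, and structure here are invented for illustration and are not Openverse's actual catalog code.

```python
from dataclasses import dataclass
from datetime import timedelta

@dataclass
class Source:
    name: str
    cadence: timedelta  # how often we re-pull from this source

# Hypothetical cadences for real providers mentioned in the conversation.
SOURCES = [
    Source("flickr", timedelta(days=1)),              # daily
    Source("wikimedia_commons", timedelta(weeks=1)),  # weekly
    Source("inaturalist", timedelta(days=90)),        # huge data set: quarterly
]

def due_for_refresh(source: Source, days_since_last_pull: int) -> bool:
    """A pull-based scheduler just compares elapsed time to the cadence."""
    return days_since_last_pull >= source.cadence.days

def plan_pulls(days_since: dict[str, int]) -> list[str]:
    """Return the sources we should fetch from on this run.

    Sources we have never pulled from are treated as overdue.
    """
    return [
        s.name
        for s in SOURCES
        if due_for_refresh(s, days_since.get(s.name, 10**6))
    ]
```

The key design point Madison raises survives even in this toy form: because the consumer sets the schedule, it can throttle itself and avoid being overloaded, which a push model would leave up to the sources.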
Team Roles and Backgrounds --- [00:12:18] Alan Levine: Can you each talk a little bit about what your roles are in Openverse and a little bit of your background? How did you come into this kind of work? I'll start with Zack. We'll flip it back and forth. [00:12:27] Zack Krida: In Openverse, the title that I use is Team Lead. Essentially I'm managing the folks working on the project day to day. Frankly, my job is unblocking people to do their best work. I meet with the folks on the team weekly. We hold team meetings together to discuss the project, our goals for the project, and just keep the development commitments that we've made intact throughout the year. And also just kind of connect the developers with other voices in the community, any kind of feedback requests we're getting, suggestions for the project. So yeah, really just gluing everyone together and trying to help make Openverse the best it can be. As far as my journey to that point, funnily enough, I began my career in WordPress development, making plugins and sites for various clients directly. I worked at some agency jobs prior to that; that was quite early in my career. A bit before that, I was already very familiar with the Creative Commons licenses. As a teenager, I was really into the band Nine Inch Nails. If you're familiar with them, industrial rock, I think, is how they're usually classified. But in the early 2000s, they had this web community for remixing their songs. They would provide the individual isolated audio tracks, or stems as they're called, so fans and community members could just make remixes, and those were given CC licenses. I'd actually be really curious to go check where that archive is today, and if any of those still exist. [00:13:57] Madison Swain-Bowden: Or if any of them are in Openverse, too. That would be so fascinating. [00:14:01] Zack Krida: I'd have to figure out what my username was. Gosh.
But yeah, fast forward to 2018, I was actually working at Creative Commons on CC Search as the front end developer. Good time to shout out my former manager, Chrissy, over there. I was only working on the front end of the project, just the actual user interface of the website. And then as we were looking for a new home for the project, I helped get in touch with WordPress, and here we are. [00:14:28] Alan Levine: And what's a typical day in your work life? [00:14:32] Zack Krida: Oh yeah, that's a good question. So that's usually a couple of one-on-one meetings with folks on the team, checking in with community communication. We have a Slack channel where most of our interfacing with the community happens. Just checking progress on our code: open issues, change sets to the code that are being merged in, and checking in with the monthly, weekly, yearly goals of the project to make sure everything's aligned. I usually take a few breaks during the day to either go rock climbing or just take a walk around the block with my kids. [00:15:07] Alan Levine: I was gonna commend the open Slack channel. I started just dipping in there. It's a lot of technical stuff and I'm like, I don't know what this is, and I think that's where I first started communicating with Madison too -- very receptive to some basic questions. What's your role there, Madison? [00:15:22] Madison Swain-Bowden: I'm a data engineer by trade, and data engineering has been my history in the tech space kind of since I started. But speaking to Zack's role a little bit, I stepped into those shoes when Openverse won the Open Infrastructure Award; I was in the role of team lead since Zack was on parental leave during that time.
It's a lot of work doing project management and team management, and I just want to acknowledge that. But as I said, I'm a data engineer. There's seven of us who are sponsored to help work on Openverse, and for a search engine, seven is a very small number of folks, right? My background's in data, but each of us does a little bit of everything on the team. I've done a lot of infrastructure work, and work on the API and on searching. What a normal day looks like for myself: it's checking in on PRs, the community contributions that we're receiving. I do a lot of community engagement, just the various groups that I'm in for open source contributions. That's part of how I got into Openverse, too: I have been doing open source for as long as I have been writing software. I say that I eat, sleep, and breathe open source. I would be doing open source in my free time, and so it's a joy to be able to do that professionally as well, to be sponsored, to be an open source maintainer. And so seeing that opportunity with Openverse, sponsored by Automattic, was a really incredible opportunity for me to have that be my full time work. I don't have a whole lot of experience with WordPress or a CMS or even web development in general, but Openverse was in need of a data engineer and I wanted to do open source work professionally. [00:16:55] Alan Levine: Commendable. The whole ethos is definitely a reason why this award was given to this project. Openverse and Artificial Intelligence --- [00:17:01] Alan Levine: And we'll toss in the topic of the current day, Artificial Intelligence; it's everywhere. I know Madison and I had a little bit of pre-show conversation. We were talking about some efforts to, like, work with the relevancy of results.
But where do Openverse and AI meet? What's going on in terms of the thinking about that, or how it might benefit, or where is it in the mix? [00:17:24] Madison Swain-Bowden: Yeah, it's a good question. It is very topical for our current moment. Last year we had some pretty significant discussions, because Openverse has a huge data set, right, of Creative Commons licensed images and audio. That can be a desirable thing for model training. We had some discussions on the team about where it is that we wanted to put our efforts and the vision that we saw for Openverse. Given the size of our team, given what we're hoping to accomplish, we really came out of those discussions with the perspective that Openverse is going to be something that's for humans. We want humans to be the ones to use this project and to be the ones who are searching for the content that is made and created by humans as well, and is licensed by those creators. That ended up being our focus. We took a step away from building the dataset, to trying to build a better project that would serve the needs of the folks who are, you know, typing openverse.org in their web browser, trying to find images that other creators made. [00:18:24] Zack Krida: I was just going to add that a fundamental position we take is that a lot of the ethical concerns around AI are still, frankly, open questions in many respects. I personally don't find the answers to some of those questions promising quite yet. You know, with the CC licenses in particular, the obvious thing is attribution. For example, say an AI image is based off of 2,000 Creative Commons licensed images that require attribution. As it stands today, the AI tools don't give you that list of 2,000 works to cite and apply attribution to in your own project. Frankly, it would probably be ridiculous to do so.
Yeah, there's just a whole host of questions about how to uphold the values and promises of these licenses to the creators of these works, and reconcile that with this technology that, frankly, didn't exist and wasn't even in the kind of collective imagination when these images were licensed. Some of these audio, song-generation tools I've seen, for example: very clear videos where a particular singer's voice is being replicated one to one by an AI tool that claims to not have been trained on copyrighted music. We're taking a bit of a cautious and skeptical approach. We definitely seek out and are open to guidance here from the open license, open education community. We're looking forward to hearing more and seeing more standards set in place for how to deal with this. [00:19:54] Alan Levine: And certainly, in our community, it's challenging. It changes a lot of our perceptions and understandings, what a copy is. But I can see at least, if you're talking maybe about using some kind of large language model to handle the query so you might get better results, at least the things you're training it on are not as vague as the big providers', because these are all the source collections that you document. The licenses are very clear that they're in there. But yeah, interesting to think about what it means to repurpose all that information through this technology. [00:20:30] Madison Swain-Bowden: One way that we have worked and will continue to work is improving search relevancy using certain tools like machine-generated labels. So you might imagine someone has a beautiful picture of, like, two kids playing on a beach, and they've uploaded that to Flickr with the CC0 license. It's a great kind of photo to want to use on a blog. But maybe the title they uploaded it with is, like, "Sam and Elise in Florida". That's the title, right? And so if we're just going by text, all right, we have this beautiful picture of two kids on a beach.
But the text that we have to search against is "Sam and Elise in Florida". And so one way that machine learning AI can be applied to try and improve results there is to pull out the content, the semantic content, of the images, and be able to search the text of that. So the machine labeling might pull out: oh, there's children in this photo, there's a beach in this photo, there's water, there's sunshine. And so if you type "two kids playing on the beach" as your search, then that would be the image that comes up. Whereas if we were just doing text-based analysis, it wouldn't show up in the results there. [00:21:37] Alan Levine: Flickr. I've been wondering for a long time why there's no ability to have alternative text. Like you say, you just get the title that someone posted, and often it's like, you know, PJ96 dot JPG. [00:21:50] Zack Krida: Yeah. Especially for some of these older archival works that we're really interested in preserving, despite obvious changes in, like, camera quality over the last decade or two, or three at this point. But yeah, if something is just the automatic camera name or, like Madison said, something a bit more personalized, it doesn't convey all of the information. It becomes a lot harder. And just quickly, in the case of audio, I may have actually mentioned this a year ago with you, Alan, but most audio communities actually do a great job of tagging and categorizing their own works. I don't know if that's just innate to some of the complexity of working with audio, where you have, you know, bit rates, different formats, and might be working with microphones. They do a really good job of tagging genre and other descriptive fields. So audio material can be a little harder to classify with machine learning. Maybe at the end of the day, it's kind of a net balance between the two. But it is interesting to think about those two different media types there.
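Madison's "Sam and Elise in Florida" example can be sketched as a toy: machine-generated labels are merged into a work's searchable tags, marked by provenance so they stay distinguishable from the creator's own tags, and a literal title then no longer blocks a "beach" search from finding the photo. The field names and the naive matching logic are invented for illustration; this is not Openverse's real schema or relevancy code.

```python
def merge_tags(creator_tags: list[str], machine_labels: list[str]) -> list[dict]:
    """Combine creator tags and machine labels, keeping provenance visible."""
    tags = [{"name": t, "provenance": "creator"} for t in creator_tags]
    tags += [{"name": t, "provenance": "machine"} for t in machine_labels]
    return tags

def matches(query_terms: list[str], title: str, tags: list[dict]) -> bool:
    """Naive text match over the title's words plus all tag names."""
    haystack = {w.lower() for w in title.split()}
    haystack |= {t["name"].lower() for t in tags}
    return any(term.lower() in haystack for term in query_terms)

# The photo titled "Sam and Elise in Florida" has no creator tags, but
# hypothetical machine labels make it findable for a "beach" search.
photo_tags = merge_tags([], ["children", "beach", "water", "sunshine"])
```

With `photo_tags` attached, `matches(["beach"], "Sam and Elise in Florida", photo_tags)` succeeds, whereas with an empty tag list the title-only match fails, which is exactly the gap Madison describes.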
[00:22:53] Alan Levine: I'm just glad to hear that this is factoring into your thinking, on top of all the other things that you get to do. Recent Features and Improvements --- [00:22:59] Alan Levine: So what are some key things that have been added to Openverse, feature-wise, in the last year? [00:23:08] Zack Krida: What we were just talking about actually segues nicely, because one of the biggest things was making Openverse safer in an educational context. So, work largely around sensitive content, which frankly is anything that could be considered sexually explicit, culturally insensitive, or violent. I guess the simplest model might be anything you wouldn't want a child to see by default. And obviously there's many ways to categorize what "sensitive" is. To that point, we like the term sensitive because it's largely up for interpretation. To that aim, we've taken strides to identify and classify more of those works. We do that right now with some simple text matching against a list of sensitive terms we maintain. It's a pretty exhaustive list; there's lots of prior art there to reference. We've basically combined a few popular lists, and doing that alone made a pretty significant impact. Then on the website itself, on openverse.org, if you search for something, the sensitive results will be hidden by default, and then you can opt into seeing them. Initially the results will be blurred, and then you can click on a result, look at some of the text metadata that exists, like the description of the work and some of the tags, and decide if you want to opt into looking at it or not. So it's a nice way to not censor these works but still protect people from seeing them by default. Funnily enough, Google Images adopted a very similar model shortly after we did. I'm sure they were being developed at the same time and it's not a direct inspiration.
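The text-matching approach Zack describes can be pictured as a small filter: compare a work's textual metadata against a maintained term list, and mark matches so the frontend can blur them by default rather than remove them. The two-word term list and the field names here are placeholders, not Openverse's actual list or schema.

```python
# Illustrative stand-in; the real maintained list is far longer.
SENSITIVE_TERMS = {"gore", "explicit"}

def is_sensitive(title: str, description: str, tags: list[str]) -> bool:
    """Flag a work if any word of its textual metadata is on the term list."""
    words = set(" ".join([title, description, *tags]).lower().split())
    return not SENSITIVE_TERMS.isdisjoint(words)

def present(result: dict) -> dict:
    """Sensitive results stay in the data set but are blurred by default;
    the user can opt in to view them."""
    flagged = is_sensitive(result["title"], result["description"], result["tags"])
    return {**result, "sensitive": flagged, "blurred_by_default": flagged}
```

Note that, as Zack points out, this only sees the text metadata, not the image or audio itself, which is why machine labeling is the natural next step for more complete coverage.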
But those initial efforts were pretty exhaustive and have improved the problem a lot. We had educators reach out to us in the past saying that their school had to block Openverse with their internal filtering firewall system, and some of those institutions were able to restore access, which is great. We get a lot of users who are students and educators; it's a huge, important audience for us. We really wanna make it as safe as possible for them. [00:25:20] Alan Levine: Is there an equivalent thing you can do for blurring for audio? Do you prepend it with, like, a warning? [00:25:27] Zack Krida: That's a good question. What we do is we blur the text description and title of the audio. That is a fundamentally interesting thing, because you can start an audio track, and if you don't like what you're hearing, you can pause it and, like, immediately revoke your access in that way. But yeah, the other challenge there is that, again, we're matching against text, not the actual audio itself. Again, to what Madison was talking about earlier with machine labeling, a key next step here will be to actually use these machine learning models to analyze the images, and they can flag things like nudity or other kinds of vectors. And then we can ideally have a much more complete, thorough, comprehensive marking of these sensitive works in the future. I'm really happy with where we're at, but we're constantly taking in millions of new images and audio files; like Madison said, on a weekly basis, a monthly basis, a quarterly basis. The health of this dataset really depends on constantly keeping it up to date. [00:26:31] Madison Swain-Bowden: We've talked about our own hesitancy with building AI data sets out of Openverse, and yet we're also, you know, talking about leveraging AI, sort of, on the other hand.
I think the one important factor in the mindset that the maintainers have for the project is that in the places where we are leveraging machine-generated labels, we're making that very clear. We're conveying, like, "these are the labels that were added by the user originally when this work was created," and "these are the labels that were added by a machine labeler," and ensuring that, again, the humans that are using Openverse know what's going into it, know what's happening in the background. I'm sure as we start to apply that to the question of sensitivity, we're going to take similar care around those aspects as well, to make sure that users are well informed about what Openverse is doing on that front and how Openverse is leveraging this data. [00:27:21] Alan Levine: All right. So when you get to use Openverse from the front end, what are some of your favorite features or capabilities? [00:27:26] Zack Krida: Oh, that's a great question. [00:27:28] Madison Swain-Bowden: This works well, too, for the last question that you asked about new features and improvements, because one of the big things that we launched was our collection pages. Now, when you visit an individual search result, we have links for tags, creators, and sources that are on that page. And so if you go to a page and you think, oh, this person who took this photo is fantastic, I want to see the other photos that they have made that are in Openverse, you can click on their username and go to a page that has all of the results from that source and that creator in one page. And if you're looking for content that's under a certain tag, we have pages based on those tags. And so if you find an image that you like, find a tag that you like, you can click on that tag and browse all the images that are under that tag across all of our sources. I love this as a way to just browse the content that Openverse has, in a different form.
If you find a source that you really like, then we link to that upstream source and you can go there directly yourself and see everything else. [00:28:30] Alan Levine: I can imagine that the managing of various tagging schemes is pretty complicated, right? [00:28:36] Madison Swain-Bowden: We're working through some very interesting issues there. And then there's, of course, even something as minor as capitalization, right, that you might think of with the tagging. To bring it back to the cat example: my wife just bought a shirt that says CAT on it, but it's the construction company, CAT. That's in all caps, CAT, right? And that's a different kind of thing that you're searching for versus pictures of the feline. And so it, yeah, brings about all sorts of different questions and things that you have to think about. [00:29:04] Alan Levine: I may be weird, but sometimes I don't mind if I'm, like, searching for pictures of cats and all of a sudden a tractor appears. It makes me laugh, but it's not relevant. And I'm an edge case, so don't design for me. [00:29:14] Zack Krida: I also like that serendipity that sometimes happens, and, you know, frankly, if you're writing a blog about cats or something, maybe you want an ironic image; you don't always want to be completely literal. That could be a good way to justify occasional less-than-ideal results. Did you have any other features you wanted to throw out, Madison? [00:29:32] Madison Swain-Bowden: Yeah. This goes to the core of Openverse. One of the things that I really love about it is the one-click attribution that we have. I think that was mentioned in the awards text that we got from OEG last year as a big thing to point out. And the uniformity across all of the results that we have, in that regard, is really our strength. If you go and find an image or an audio track that you like, there's a box that we provide that has attribution that you can just click to copy.
Another thing that we added this year is HTML attribution, alongside the plain text and rich text options. We also now have Dublin Core structured XML, so you can copy an XML-based attribution from there as well. It's really slick. [00:30:13] Alan Levine: That is huge, because it was always a big burden to get educators to do a manual attribution statement. It was like, you know, four trips to the source to copy and paste. That's a huge feature. [00:30:25] Madison Swain-Bowden: Exactly. Yeah. And then you have to look up, how do I structure this attribution, like what comes first and what am I hyperlinking, where? It obviates the need to do any of that and just gives you the text itself. [00:30:37] Zack Krida: Before I even started working at Creative Commons, the first feature I actually added to, then, ccSearch, now Openverse, is that when you click the copy text button for the attribution, the text changes to say "copied" with an exclamation point. And yeah, that was me, however many years ago now. [00:30:46] Alan Levine: That's good. Every time I see that, I'll say, thank you, Zack. It is important. [00:30:59] Zack Krida: That's actually a good opportunity to shout out our designer, Francisco, who is just constantly thinking of very thoughtful touches to the user interface, which, you know, we try to make as accessible as possible, but also have our own visual language. Unfortunately, our search is so fast that you don't really get to see it, but when you make a search in Openverse, the logo actually plays a little animation. The circles in our logo kind of flow and contract and expand. But yeah, you have to search for really obscure things to slow down the search and actually see that. [00:31:33] Alan Levine: I'm going to have to try that. [00:31:38] Zack Krida: One thing I wanted to flag that I think is somewhat underappreciated with Openverse is that we are fully translated into about 19 different languages.
So that's the entire user interface, all the text on the site itself, and then we're in various stages of partial translation in 40 additional languages. Those are all human-generated translations by community contributors in the broader WordPress community, which is really remarkable. I look at other products like this, you go to any stock photography site, and they at most have one or two languages. So I think that's huge. We have users in 120 countries, I think, is the last number I saw. So being able to offer them searching in their preferred language is huge. We would love to take that further someday. Translation of search terms and the result text itself is a huge challenge that would be interesting to try and tackle. That's a whole can of worms where you get into translation on a very complex level. The one other thing I just wanted to double down on is our filtering system. Really, a huge thing with Openverse is the multimedia nature of it. By default, you see image and audio search results mixed together. Seeing them mixed in the way we do is something you really don't see on other platforms. But for us, it's a really nice way to highlight the richness of the open license ecosystem, that there's images and audio, this whole tapestry. And we want to expand to different media types in the future. Once you dial down to images or audio, we have a really rich filtering system where you can just, you know, choose which of our numerous sources to search from. I'm looking at the list right now and it's constantly growing. But you can dial in which specific sources you would like to search from and, obviously, which specific license suits your use case. We try to do some education about the specific licenses. You know, as soon as you ask folks to understand legality and licensing, it's a pretty complicated subject, so we try to make that as straightforward as possible. You can also just look at images of different aspect ratios, sizes, and file types.
You can differentiate between photographs, illustrations, and digital artworks. I think this filtering functionality is really what lets people find the right image for their use case, their project, and really curate the results they're seeing, beyond us just showing the perfect results as soon as you type in your search term. So there's a lot of power there. [00:34:16] Alan Levine: That's a hard balance, to add all those features and not have the interface be too complex. I know a lot of times I only want landscape images because I'm looking for a header image, and at least you don't have to do that filtering after you see all the results. That's really powerful. And also to zero in on specific collections, because there are some unique ones. I know for images, things get lost in the wash about how many images are on Flickr, but it's really interesting to search within the very specific collections. [00:34:45] Zack Krida: And I'd love to figure out ways to just foster a sense of play and discovery with these things. Like, some of the sources that you might not even think of have some pretty extraordinary results. People come to Openverse with different needs. We have people looking for sort of traditional stock photography for a blog post header. We might have someone looking for a very specific historical image for a homework assignment. We can't make these assumptions by default of what relevancy is to all users. And of course we can always do a better job of this, but that's where the filtering is really important. One idea I've had for that aim is to have these filter presets that users could choose from, which would essentially just be a dropdown of different use cases for Openverse that automatically picks the right filters for what you're describing, or what your use case is, rather. For example, a stock photography preset, or a research preset, these kinds of things.
[00:35:41] Alan Levine: Wow. I could come up with hours of questions to ask you both, but you've been very generous here. User Stories and Feedback --- [00:35:45] Alan Levine: But just to finish, you talked a little bit about the users. Do you ever hear from people, and they give you their little story about what they found or what they discovered, and it says to you, this is why we do this? [00:35:57] Madison Swain-Bowden: I have a story like that. I attend conferences semi-frequently, and part of those are talks. As an Openverse maintainer, I'm on the lookout for, okay, who, in their PowerPoints, in their presentations, is attributing the images that they're using? I went to a speaker's talk, and they were presenting, and I could see that they had the little attribution right below the image. I went up after the talk to chat with them and said, "Hey, have you heard of Openverse? Do you know about this?" They said, "No, I haven't heard of this thing before." And so I described it and I said, "Look, you can copy the attribution here. This makes it very easy." They were using Wikimedia Commons exclusively for their photos. But I said, look, you can filter by licenses and do all these things. And their response was like, "This is exactly what I need. This is what I've been looking for. This is going to make my life so much easier." There's someone who gives talks a lot and wants to be proactive about attributing the images that they find. And it was like, oh yay, this is the perfect use case and, you know, I made a convert. [00:36:51] Alan Levine: Anything, Zack, from your side? [00:36:53] Zack Krida: It's funny that you mention that. We have a feedback section on the Openverse website. It's a little rudimentary right now; it's a Google form embedded in the site.
But I was looking through some of our recent submissions today, and something that struck me as funny was that a bunch of them were written by school children, just based on the use cases they were describing. From the writing style, it felt clear that they were children, just saying random nice things: that they like Openverse, that they use it for their projects, and it worked well. Not the most actionable feedback, but still great to receive, and it just validates the work we put in. Conclusion and Acknowledgements --- [00:37:28] Alan Levine: A big thank you to Madison and Zack for taking this time. Like I said, I get lucky. I get to have all these conversations, and we also like to make sure that we let people know that there are real people behind the scenes. With software, you tend to not see that or know it. The humanity is coming through. So acknowledgement to you and the whole team. And I'm going to thank people who are listening to this episode. This is, again, OE Global Voices, the podcast from Open Education Global. Usually for an episode, I try to find a relevant track from the Free Music Archive because I just love their ethos, but it seemed appropriate to search Openverse for some music. So I typed in "infrastructure" and used the filter for music. I got it on the first page of results. It took me like 30 seconds. [00:38:10] Madison Swain-Bowden: Incredible. [00:38:11] Alan Levine: It's a track by an artist named Anitek, licensed Creative Commons Attribution-NonCommercial-ShareAlike, and I'll be able to copy and paste that attribution when I write up the blog post. You'll find this, when I get around to editing it, on our site, voices dot oeglobal dot org. We sometimes get follow-up conversation in our OEG Connect community. And if you're just listening, let us know something interesting that you found, or your own Openverse story. We'd be keen to share that with both Madison and Zack.
So again, I humbly appreciate the work that you're doing, and the WordPress Foundation for taking this on and adding such a beautiful resource for open educators. And, with Madison's friend there, we salute all the cats out there and the people searching for cat images. [00:38:55] Zack Krida: Yes. Awesome. [00:38:57] Madison Swain-Bowden: It's been a pleasure, Alan. Thank you for having us on.