
MusicBrainz founder Robert Kaye took some time out to chat with me about the project, and I hope to post some bits here with his permission (and as I slog through the miserable task of transcription). I plan to use this interview to revise the documents on the project’s history.

[begin transcript]

J: The course overall is trying to understand what these open source projects are, what kind of people contribute to them, WHY people contribute to them, and so with MB in particular I think it’s really interesting because on some level you’ve harnessed music fandom. People are really passionate about music, they love to give their opinion, but not just in an editorial way, but in an “I know this fact” kind of way. And so I think that’s really interesting and also I’m just a huge music fan myself so that’s why it’s an interesting project for me to contribute to.

R: Should I start talking about that sort of cycle and what sorts of people are attracted to the project?

J: Yeah, that’d be awesome.

R: Okay. So one of the ways that MB works and why it works is because we end up closing this, what I call a “metadata loop” and it’s a virtuous cycle. So what we do is... do you know what acoustic fingerprints are, that we use from a company called Music IP, well it used to be Music IP. Well, they’re very critical towards closing this metadata loop. So the idea is that bootstrapping in the beginning was particularly special, but once we got some sort of critical mass we could define our value proposition for people that have music collections that are crappy, that are badly organized. The value proposition is that they come to MB and they can take a thousand hours of work out of the cloud, out of the system, out of the commons, whatever you want to call it. And maybe contribute an hour back. The way we do that, and the way we actually encourage people to contribute back, is that in our tagging applications we don’t make it easy to go and make changes to your local system. You have to pretty much go make changes to MB before those changes will go back to your local system. So you could use another tool for cleaning up your music collection bits and so forth, but really the path and the workflow is more essentially that people will go and contribute their information back to MB. And you know people are generally willing to do that because if you take a thousand hours out of the cloud and you’re giving one hour back from these little bits and pieces like “I know this, I know that, I have this, I can contribute this, I can type in the barcode for that,” that value proposition really works for people and people get pulled into it.

R: If I could characterize our community a little bit, it’s sort of like an onion. One of my advisors who knows lots of things about social interactions and how people build communities and so forth—he likened it pretty much like an onion, where on the outside layer you’ve got a very transient community that— these are the people that are coming in and realizing that they have an iPod and they can’t find anything on the iPod because they downloaded all this crap off the Internet and they can’t find anything. Now the iPod is supposed to be the greatest thing since sliced bread, and they’re having trouble really making it work for them, so they realize they have a metadata problem, and they come to MB. The outer layer of this, people who may not really contribute much back and they clean up their collection and spend a few hours doing something, hit some sort of critical threshold like “Hey the iPod is not working for me” and may never come back. The next layer in are people that look at this and are like “Hey, this is pretty cool, I’m gonna go spend a load of time doing this.” And they may even put a few hours back, rather than just one or nothing. And the next layer in are people that just get kind of hung up on this, or really fascinated by the concept, like “Oh wow this is really neat, and I have these five artists I’m really passionate about, and I’m gonna dive in, I’m gonna adopt these artists and make sure that all of their music is properly covered.” And they’ll go in and put a lot of energy into it, even without necessarily knowing anything technical. And the next level in from the onion is a group of people that, having done that, they get pulled into the community one way or another. 
It could be social reasons, it could be like a sense of belonging like where a dating site might not be for them, but they find a sense of community inside of MB and they may fix documentation, they may organize our bug collection, they may help us with site design/graphics, generally non-technical type of stuff. And then other people get pulled in that are hardcore geeks that write code and do all sorts of contributions, and that’s getting pretty close to the inner core of people that are spending a lot of their time on MB, a lot of their free time and so forth, and in the end they’re also the ones that are likely to get hired into MB. And then the innermost core has got to be myself and the people that are hired for MB and our board of directors. So the amount of time one spends working with the project goes up drastically the closer you get to the center of the onion. Does that analogy make sense?

J: Yeah, that totally makes sense. I’m actually particularly interested in — you said “adopt” artists, which I think is a really interesting way to think about it and put it, and that’s actually sort of the way I thought about it when I looked at the subscriptions. In terms of the way that those layers of the onion are dispersed, would you say that people who adopt a couple artists are—maybe there are more of them than other layers? Or do you find that the transients are—like where is most of the population sitting?

R: I would say that we have more of the transients. Those are ranging in — it’s hard to judge exactly how many people we’ve got — but I’d say there’s a range in hundreds of dozens of people — hundreds of people or dozens of people that make changes in any given month, and the people that adopt artists, they’re —those numbers are fewer because they’re one level into the onion and they show a little more dedication. Those people tend to be — you know I also have a friend who also was a contributor for us on the code side who is a big Tori Amos fan, so every piece of information that we could conceivably gather about Tori Amos we have. We’ve covered. And people get completely nuts about it. I should be doing that more for my artists but I’m sort of a bad user of my own software. I’m the leader of it and so forth but magically it doesn’t really apply to me because it’s not really my cup of tea. My cup of tea is doing crazy things like this, not really using them. Interesting anecdote: There are some people who come in and will look at the privilege I have in the system and the amount of editing I’ve done and they get up on my nuts, like hey how come you have all these privileges la di da da da—well, I start to say, I need these privileges, I run this thing and they say “oh, sorry.”

R: But there’s one story I want to share with you. I don’t know if this is terribly interesting or terribly relevant but I’m going to start out really short and we can delve in more if you want to. But this is the one thing as far as community that just blew me away. I just did not see this coming. A few years ago, probably about 5 years ago, this person named Moe joined our channel, and you know, this person was sort of very impulsive and they were doing a lot of edits and database and so forth and back and forth and it turns out that this person has Asperger’s syndrome and this person is in Oslo. And this person doesn’t really have a normal life. They’re most likely not ever going to be holding down a job the way that you or I might hold down a job, or have friends in real life the way you and I have friends in real life. So this person is really isolated, and their disability makes it difficult to fit into any type of social circumstance or social group. And this person has in no time become a very active and useful member of the MB community. So really what happened is that MB actually now gives a greater sense of purpose to people and a greater sense of belonging that they may not have ever had in their lives before. Which is really kind of a strange concept, and I’ve watched this person go from “Hey wow I’ve never been into the city center of Oslo by myself” to having actually joined us in international travel and gone from Oslo to London to meet with us at our summit, and actually participate in our summits. So this person has gone and made these drastic changes and drastic improvements in their life because of MB. And I just didn’t see that coming at all. But there’s lots of little things like this that I didn’t see coming that end up working out. 
So obviously this person is very much part of our core and I kind of — it’s just amazing to me that my project, this thing that I got pissed off because Gracenote did something, in the end ended up changing people’s lives. So as far as community is concerned that’s on top of the list of “Wow, I didn’t see that coming.”

J: So would you say that the community is tight-knit?

R: Yes. I think we’re a little more tight-knit than others because, if I really rewind this all the way back to 1998, I originally started this project because of Gracenote. At the time they were called [unintelligible – look up!], the CDDB project. I typed in probably about 200 of my own CDs and magically, when they took it private, I didn’t get a check for my efforts. And I was really pissed about that and just grousing to a friend and he said “Well shit, why don’t you just start your own open version?” And I created a project called the CD Index. And it was basically a non-compatible version (so I couldn’t get sued by Gracenote) that used better technology and so forth, and I built that. And right around that time on Slashdot, the geek news site, someone asked the question “So what is the open source follow-up for CDDB now that it’s closed?” and I had been tinkering on the CD Index, I was nearly ready, so overnight I finished it up and the next morning I got posted on Slashdot and a whole bunch of people came, literally from one moment to the next. My server crashed, overnight I had 3000 users and 10,000 entries into the database in the time it was picked up. So, I was new to open source, so I didn’t have mailing lists, I didn’t have this, I didn’t have that, I didn’t have things that I didn’t know I should have. I mean 1998 was also still pretty early on in the whole open source game, so what happened is that this community of people that came from Slashdot, I would characterize them as a bunch of screaming bitches. It was just a lot of screaming, a lot of self-righteousness, a lot of demanding. Some kernel hacker showed up and came up with some really whacked out ideas that were not really appropriate. And it was a community of just shouting, disrespect, and not really getting anywhere. And as the project matured it was this community that I just didn’t really like and was really kind of pissed off by. 
And I looked at my life and looked at the thing I wanted to work on, and said you know, this one has the most potential to be something interesting. I knew that CDs were obviously not the future, and MP3 or digital audio was going to be representing the future, so I am going to rebrand this as MB and turn it into a music encyclopedia over the next few years. So that was my goal, I started doing that, I started changing the site and so forth. When I actually launched MB I was still working with part of the Freeamp team which [unintelligible]. And what I did was just put a little button on the Freeamp site and said “MB, we’re collecting music metadata.” And slowly, people came. And they started trickling in. Now, I never told the old CD Index community that MB was happening. I just let the wave follow and just walked away from it because that community was just a bunch of screaming bitches and I want to have a new one. The new community, the people who came were honestly curious. They wanted to look at it, they wanted to play with it. I was completely on it and I had more of my open source chops, I had things laid out properly, and most importantly, I set the tone of respect, and a comfortable tone. So when people started coming onto the mailing list and they started mouthing off and talking and being *** and are just disrespectful to others, my community now knows that we are a respectful community. I don’t have to say anything because half of the community will say look, we are respectful people here. Compare that to say, kernel developers. The Linux kernel developers are kind of brash. They can be really, really — use strong language and be very off-putting to women. So that’s generally something I wanted to fight against. 
We just had a summit in Germany and we had a third women there, which makes me really happy even though it’s still a very sad number, but if you look at computer science courses and how many women are there, you should consider yourself lucky if you have one woman in a class of people. It’s really sad. So I was actually particularly pleased, because that was representative that our community was doing something right, and going in the right direction. I think the key there is that setting up a community and setting the right tone for the community is really important from the get-go. If you don’t do that then the community you get may not be one that you actually really like.

J: On an anecdotal level, I’ve participated in online message boards and forums for a long time, and there was one, Yayhooray, started by the guys who started Threadless, and this was in like the early... 2000? 2001? But it was a great, close community, people respected each other, people knew each other (whenever you were in someone’s town you could get a drink with them and it was no problem, it wasn’t weird or creepy or anything), and suddenly the dynamic of the community changed drastically. People started filtering in, sort of like what happened with Slashdot. It’s the same kind of people who just have no respect for the community, who have no idea of how to act. So we migrated. We moved to other private message boards split off from there, and a couple more from there, and so the community has become really segmented into these different boards. Meanwhile the old community is still there, and it’s actually really interesting because one of the original founders popped up on one of the private boards recently and asked what we thought about him trying to fix Yayhooray and put it back together and it was one of those just leave it, you can’t fix it.

R: Yep, I agree.

J: And so I think that kind of dynamic in online communities is really interesting. But that’s just an aside so… You said a bunch of really interesting things in there and one of them is… as a woman in a technology program, it’s a really interesting point that there are very few women in tech programs. We have 40 people in my class and there are 11 women.

R: Wow!

J: It’s a good ratio actually I think but it’s still pretty…

R: Well it’s obviously off still but more than 25% is nearly unheard of. That’s impressive.

J: And I mean I think it’s obviously something Berkeley does on purpose. But so — I’ve never actually THOUGHT about women in open source communities being a minority but I guess it makes sense. The women that are working on MusicBrainz, are they doing programming? Are they contributing metadata? What roles do you find them playing, are they kind of a mix of everything?

R: Unfortunately, we only have one woman that occasionally writes some code. For us the women are more in sort of supporting roles, helping us keep track of our bugs and helping us write some documentation and so forth. They’re very valuable roles, but sadly they’re not hardcore geeky kind of programming roles. So we’re still kinda falling short of that. But I think in general that we actually present a community that is not and… if you remind me, we’ve got a couple of references from the latest OSCON that happened in July of last year, where there was actually a brouhaha about women in open source that just devolved into this nasty discussion. It might be really interesting for you to read, and there’s a person by the name of Keirin Robert (?) who works for MetaWeb in San Francisco or I think she works for MetaWeb. But if you would like to chase down this particular topic, she would be a good person to talk to.

J: It’s not a topic that I’d even thought about but it’s kind of an interesting one now. I mean the cool thing about this class too is that I basically decide what I want my project to be so that’s nice. So when you chose the licenses for MB, what was your — I know that there was a little bit on the website about that, but can you tell me a little bit more about how you went through that selection process?

R: So the selection process for the licenses on the source code was kind of a no-brainer. GPL for the source code and then LGPL for library stuff. That’s really a no brainer, and I don’t think that’s really what you’re interested in. I think you are talking more about the data licenses right?

J: Yes.

R: So I started out with the Open Content License, which was mmm about 1999 or so, and it’d never really gotten that much adoption. And right around that time I was really thinking well, what can I do here, what do I need to do because there aren’t really any good data licenses out there. I literally started the search of looking at data licenses. And I had a very fateful discussion with Richard Stallman, and a very final closing jab was like “Just use the GNU free documentation license.” And the free documentation license, I looked at it and I read it a couple of times, and compared to the GPL it is just, it’s an awful license. And I’m looking at my database and trying to find where the front matter is, for dependencies I still haven’t found them. So it’s a really bad match. So I said hey, can I take the GPL and come up with a data license. And he made it very clear that the text of the GPL was not available under the GPL.

J: That’s awesome.

R: Yeeeah. I was very frustrated by that. But the motivations for changing the license were that in the United States there is a Supreme Court case called Feist vs. Rural Telephone Company.

J: Yes.

R: And that case basically established that facts are not copyrightable. So my competition was doing this, and I was sort of de facto doing it, but I didn’t really like that idea. So I started looking at it and it became clear that having core data in the public domain was going to be critical because I really couldn’t defend it in a court of law, I couldn’t slap a copyright on these facts. So I wrote a white paper (because when you change licenses on an existing project that’s a really big deal), so I took it as a big deal and wrote this white paper and laid out all the arguments and basically put it to my community and it was either a non-issue or they liked the work I do because nobody really had any objections to me saying “Hey, let’s go to public domain”. The whole public domain bit was also... there were two motivations. Number one, I really wanted to do an open data kind of approach to this, and I really wanted everyone to be able to use the data. So the number one mantra I had was “There shall be no barriers to access to the data.” So it should be really easy to access the data, and we did a lot of work to make sure that people could access the data, download it, and that that process always worked. So with that, one of the basic tenets of the project was very much free-floating, trying to find a solution for this. Because public domain only covers facts. The other portions of data that are copyrightable, that are created by our community, I was looking at what license to cover that. And magically, out of the blue, Creative Commons, before they launched, they contacted me and said “Hey you are doing some interesting stuff, you might benefit from our licenses.” And I had literally one of these angels that are coming down from heaven and singing to you kind of moments. They answered all of my questions. And lo and behold, they were just received with wild enthusiasm, so all of a sudden I was adopting licenses that were very popular. 
So for me to make this change and say hey we’re going to go to public domain and Creative Commons licenses was easy, which was rather surprising.

J: Wow.

R: Now, this process of license selection is drawn out, it took me probably 2 years of thinking and background processing and so forth to even come up with a proper solution for it. And part of this proper solution was, I want to create a non-profit. I couldn’t create a for-profit because of Gracenote. The environment just wouldn’t be good, they essentially pissed in the gene pool and here I am floating in their pee and I can’t do very much. So I decided to go the non-profit route. Now, I’m kind of a businessman at heart when I’m not a geek, and I said that we need to find a way to earn money instead of beg for money and still do it in non-profit fashion. So I did the legwork for that but the key was how am I now actually going to make money with this. So I started looking at business models and started going to a lot of open goal session and caucuses and so forth. And one of the things I finally realized after a hard year and a half of hardcore looking at business models and so forth is that wow, I’m sitting on this data that changes very rapidly, and I can be the gatekeeper and I can actually charge for timely and convenient access to the data. I’m not charging money for access to the data, just timely and convenient access. So with that I said we’re already dumping the data twice a week and you can use the entire data snapshot, but it takes probably about an hour or so to load the data into your system. So if you’re running a website that has to be up 24 hours a day, you can’t take your system down for an hour and load the data. You can use another database server and load the data while the other one handles the traffic and magically switch them out, but it’s very inconvenient. So what I offered was what we call the Live Data Feed, and the Live Data Feed allows you to set up your own MB server and then you can spoon feed the server changes to it on an hourly basis. 
So you download our software, our source code, and then say “ok run this once an hour” and it fetches all the changes from MB. So any replicated mirror will never be more than 70 minutes out of date with the main server. And that is a value. So if you’re somebody like the BBC or Last.FM that really needs to have access to this data continuously and seamlessly over time, they see value in this. And that’s where they’re willing to plunk down $1500 a month to have access to the service. And all we’re doing is delivering the data in a timely and convenient fashion, we’re not really charging for the data. And that was really a breaking point, that was it, once I actually realized that I said well I can build a business off of this. And so far we’ve been in the black ever since we created the non-profit and have made more money than we spent. So that was really absolutely critical for making this work because, you know, you gotta pay the bills. And even with a fairly small project like this there still are bills. We’re probably just spending $2000-$3000 in just hosting costs and other lower hanging-type things in a month, not even counting salary. So that was really critical to find that and really bring that home and put it in such a way that the community didn’t get pissed off by it. Because really what we’re doing is we’re now making money off of the backs of all the people that are contributing in this peer economy. And you have to make sure that you aren’t pissing off the community in the process of doing that, and that’s why the non-profit and doing it in such a way that anybody could still download the data, tinker with it, but if you want these updates you need to pay us. And we chose a CC non-commercial license which means that anybody can use these live data feeds as long as they’re not doing anything commercial with it. You can have your own MB server be up to date no more than 70 minutes out of sync with the main server, have at it. 
So with that level of freedom the community was happy, they’re totally cool with it. And since we’re hiring engineers out of the community it just works really well and we don’t really have any misgivings about what we’re doing, or the community doesn’t actually have any misgivings. I’m particularly proud of having navigated that, what I consider a very challenging sort of proposition.
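[Editor's sketch, not from the interview: the hourly replication Robert describes can be pictured as a mirror that applies numbered change packets in order and remembers the last one it applied. Every name here (`Mirror`, `apply_pending`, the packet layout) is a hypothetical illustration and is not the actual MusicBrainz replication code or data format.]

```python
from dataclasses import dataclass, field


@dataclass
class Mirror:
    """A hypothetical local replica that tracks the last change packet it applied."""
    last_applied: int = 0
    rows: dict = field(default_factory=dict)


def apply_pending(mirror, packets):
    """Apply, in order, every packet newer than the mirror's current position.

    `packets` maps a sequence number to a list of (key, value) changes;
    a value of None means the row was deleted upstream.
    Returns the number of packets applied.
    """
    applied = 0
    next_seq = mirror.last_applied + 1
    while next_seq in packets:
        for key, value in packets[next_seq]:
            if value is None:
                mirror.rows.pop(key, None)  # upstream deletion
            else:
                mirror.rows[key] = value    # upstream insert or update
        mirror.last_applied = next_seq
        applied += 1
        next_seq += 1
    return applied


# Example: two packets are waiting when the scheduled hourly job runs.
feed = {
    1: [("artist:1", "Tori Amos")],
    2: [("artist:2", "placeholder artist"), ("artist:1", "Tori Amos (updated)")],
}
mirror = Mirror()
apply_pending(mirror, feed)
```

Because the job only applies packets it has not yet seen, running it again when nothing new has been published changes nothing, so scheduling it once an hour keeps a mirror roughly one polling interval behind the main server.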

J: Yeah I mean it sounds — it’s kind of straight out of the GNU manifesto, you’re not selling what people have actually contributed, you’re selling a service on top of it.

R: Absolutely.

J: The CC licenses are very interesting because, I mean a lot of the common critique of them has been like, well, they’re not perfect, but they’re kind of the best thing that we have in terms of stuff like a giant collection of user-contributed data. Is there anything that you can see in a license that you would love, maybe something that allows you to protect not only the actual code but also the database as sort of one big entity.

R: You know, actually I’m really happy with how the licenses are right now. I personally don’t have any beef with them. I particularly like the definition of non-commercial that they’ve put forth because it’s so vague, that hey, if money changes hands, you’re commercial. It basically puts all of the power decisions into my court, which I really like. And I tend to be very liberal with it and I tend to tell people “here, go do it, go play with it.” One of the things I say to my potential customers is that I’m more interested in creating mutually beneficial relationships than trying to pull as much money out of your wallet as possible. When somebody says how much can we expect to pay for the service, I turn around and say you guys tell me what you think is fair to pay for the service, so tell me what you guys should be paying while you’re on seed capital, tell me how much you think you should be paying when you get money from a VC or an angel, and then tell me how much money you should be paying on your road to profitability, and when you reach profitability, I’d like to see you pay full list price. And people tend to be very receptive to that line of thinking, so… sorry what was your question?

J: I actually don’t remember now. It doesn’t matter. Oh the original question was about the licenses, if there is anything more you desire from a CC license, but… no. So.

R: No, I really couldn’t. CC has really pulled me in — they recently released a study on the whole non-commercial bit and they went and talked to a lot of people about the non-commercial one and I actually was in a focus group where they studied this and they have a lot of people producing kind of — producing CC art with it, but I was very clearly the odd man out in this entire scenario because they said so, the people who are producing content for you. And I said well the people that are producing content for me do not consider themselves to be producing content for me.

J: Right.

R: And they said What??? So I’m very much the odd man out in that particular scenario but in the various CC functions where the non-commercial issues have been addressed, there have been a lot of objections raised to it and wishes for improvements to the licenses because there was something (I forget the nuances of this) but there was somebody, maybe a broadcaster, who was trying to use CC licenses and they ended up being in conflict with the commercial terms which were in conflict with their state’s charter regarding state-supported companies. So they ended up using the CC NC in a chilling effects kind of manner and actually ended up killing off a project that was actually really promising. So there are some sharp edges to the non-commercial licenses, which you know, I’m kind of enjoying, but I can actually give people the benefits of that rather than give some kind of chilling effects endeavor.

J: Uhhh….

R: Other than that I’m really very happy with the CC licenses and I wear CC t-shirts a lot and I go to CC conferences — I’m wearing a CC t-shirt as you start asking me questions about CC and I can cogently answer them because I use their stuff so much. And then people are like “Do you have a card?” and I hand them a MB card and they’re like “I thought you were with CC” so it freaks people out. But CC will give me an endless supply of t-shirts as long as I want to wear those shirts. So.

J: And Lessig is on your board, right?

R: He used to be. It was more a common friend, Joi Ito, who was the CEO of CC. He was really rallying support to get my board off to a strong start and I think there might have been some strong-arming behind the scenes in getting Larry to be on my board of directors. Getting all of these high-powered people to actually show up once a year in the same room or coordinated to be in a teleconference is just hell. So I had to dial it back from that and say alright, let’s go pick some people that are a little bit more grounded. But even then, with the supposedly grounded people, herding cats into one room is just a pain in the ass.

J: Yeah. So how did the deal with BBC happen?

R: So it originally started that somebody from the BBC invited me to come speak to them about online communities. They had this series called “Thinking Lunches” where BBC would provide lunch and they’d invite a speaker and the speaker would speak about the stuff that they’re asked to talk about. And actually people from the Audio/Music Group showed up and listened to my presentation and then introduced themselves and said “We’re really interested in working with MusicBrainz but it’s gonna take us a while to really get into this” and so this was back in like 2004 or so, 2005. And they approached me and said “We’re going to get back to you before too long because we’d really like to explore a relationship here.” Which of course made me really happy because the BBC is an awesome organization. I wish we had something like that here in the states, but alas, we don’t really give our organizations enough funding to do cool things like this.

J: A friend of mine said the exact same thing the other day, about a completely different subject, but it’s true. Sad.

R: Once they actually had time to think about this they said alright, we’re a big organization, we’re going to need a little bit more handholding. So they hired me personally for a week and they flew me out to London and said alright, we’re gonna pay you to basically consult with us and tell us everything about MusicBrainz and so forth. And that started, I would say a 3 year phase of me going to see the BBC for at least a couple of weeks out of the year and showing up and talking to them and this that. And the BBC is really interesting because they’re not solving one problem, they’re solving a whole host of problems by using MusicBrainz. I think the most important thing about it that the world doesn’t see is that they’re organizing all of their play-out systems—their systems that actually play the music—they’re organizing all of these systems around MB identifiers. Meaning, even if it were something being played on the BBC Radio Network, they first have to be inside of MB or be on its way into MB, and once it actually gets played out, on the other side, they can spit out ok, this MB ID was played at this time, this was played at that time, this was played at that time, and then you can turn a crank and you can create all of the census reports for all of the agencies that need to be paid. And all of this is being organized based on MB IDs, which is just really huge because I saw MB IDs as one of the most critical things that made it really useful. People at the time just kinda looked at that and said “We don’t get it, yeah, keep talking, whatever.” And then ten years later, very literally last year in March, I get an invitation from the W3C, and they said “We’re having a 20th Anniversary of the beginning of the web celebration at CERN in Switzerland, and you’re invited to go.” And I thought that it was just going to be this massive thing and all my friends were going to be invited, so I started poking around and very few of my friends got invited to this. 
I thought wait a minute, I need to look at this some more. And my ex-girlfriend, my sugar momma who actually helped me a lot with MB, she said “Wow, this is a really big deal, isn’t it?” I said yeah, it’s looking that way, but I just don’t have the cash to go to Switzerland in a week and a half. She said “Look at the tickets, if it’s reasonable I’ll buy you a ticket.” I looked at tickets and they were like $500, so she bought me a ticket to go to Switzerland and I went to the celebration. I got this really nice tour of CERN, and the Large Hadron Collider and the whole nine yards.

J: Amazing!

R: It was really quite amazing. And I met Tim Berners-Lee and I asked him, “Well, why did you invite me? I mean, thank you for the invitation, but...” His response was basically that there’s lots of people from the history of the web here, and I also wanted to have a few people from the future of the web here, and people that are thinking in the right direction are from projects like MB and OpenStreetMap. So the OpenStreetMap guy was there as well because [something about connecting data to thoughts?], and these MB IDs that I thought were really important ten years ago all of a sudden really ARE absolutely critically important and other people are seeing that value. People have connected the MB data set via these IDs to work with other data sets, so just to be given the honor that the father of the web thinks that I’m part of the future of the web… that was a pretty big deal. It was a nice day.

J: Did he comment any more on that? Or was it one of those “You’re part of the future of the web, you belong here.”

R: Ummm he didn’t really elaborate too much but it was mostly me doing the talking and also saying that, look, these open data sets are really important, and even though we’re not making a WHOLE lot of money, our organization is in the black and you should have seen his face. He was like, “What?! You’re in the black? With open data? How the hell did that work?” Right? So then he took my card and he actually started scrawling down notes. There were a lot of people around us and so forth that he seemed like he was just humoring a lot of these people who just wanted to meet him but they weren’t saying anything interesting, and then once he listened to me saying — once I say that “yeah we’re profitable on this,” that’s when people’s demeanor changes drastically and their eyes and ears open up a lot more which is really interesting. So that’s a really interesting way to start a conversation, to say “I’m operating an open source project, I’m getting paid, and we’re in the black.”

J: That is awesome.

R: Yeah, I’m very happy with that, I’m proud that I really thought about these types of things in this type of environment and it’s actually worked out. I thought it was a long shot, a lot of it.

J: So you said that you had contributed to CDDB before Gracenote snagged it. Did you take any inspiration from your experience contributing to that project and how that ran?

R: Oh, absolutely. Absolutely because I mean that was the very first proof in the pudding that something like this could work and that random people scattered across the planet would work in unison. And that was a poor example in a lot of respects, now looking back. But it was proof of concept that really showed that this was possible. So once I looked at them, like, yeah this is possible, there’s my inspiration, and then with time to really think about this and how to do it better. Now I actually look at FreeDB which is actually the continuation of the open version and was compatible with Gracenote and it promptly got sued, because it used to be called FreeCDDB, not FreeDB. If you look at it, it is a gigantic pile of mess. It is just ugly. Because there’s no community — there’s actually no community around the project that’s fixing the data. My community is hellbent on removing duplicates, adding things that are missing, and fixing every little thing and trying to define things in different languages and so forth. They’re really hellbent on all these crazy things as a community, where FreeDB just doesn’t exist, just doesn’t do this. So FreeDB is just loaded with duplicates and loaded with crap. So MB tries to really be a lot different from that, and that actually sometimes works to our disadvantage because Gracenote will gladly tell you that they have 6 million CDs in their system. We don’t have nearly 6 million CDs because we’ve curated all of the duplicates out of the system and we cleaned up our data. So by cleaning up the data and having only really what’s permanent, we fail in apples and oranges kind of comparisons. [45:16] Because we don’t really have that much, we just have all the concise stuff, but people’s thinking these days is “more is better” and that’s really not the case with us. So sometimes it can be a little dicey in selling MB.

J: So I want to talk a little about the future of the project. So are you… I noticed as I’m editing stuff that there’s space for more editorial annotation, little bits of — what is that intended for? As a music — I ask this as somebody who has written about music a lot — what are you trying to do with that, where is that going?

R: The annotations are very loosely structured. They serve a lot of purposes. Here is a catch-all field. If you have some information about an artist and we don’t actually have the structure in the database to handle it yet, you can put that information here. Sometimes you can even add an annotation to something: “Oh this is the Metallica Black album, it’s not called the Black album, it’s called Metallica.” So some of those things are noted there. It’s not currently being used for full reviews. There’s lots of meaningful point-of-view kind of write ups on Wikipedia and on other sites. I always dabble with the idea of coming up with a good review site that actually reviewed all the CDs in a peer-production kind of fashion, but the biggest problem I’m facing is that there are a thousand ideas to chase on how to make MB better and how to build a sister project, but these thousand ideas all want a thousand engineers behind them.

J: Right.

R: And we’re very resource-limited, so we can’t chase all of these things down, we have to choose very carefully which ones we’re pursuing. So that’s just one that I haven’t really pursued on any kind of level yet but the write ups that Wikipedia has on both releases and on artists end up being very very good, so rather than having those types of write ups in MB, we just link to Wikipedia because Wikipedia is playing to its strength and we’re playing to our strength.

J: And then the other benefit of that is something that I was talking about with a professor: if you want to add some more editorial information about the artist that you happen to have, you can just add it to the Wikipedia entry because that’s open to be edited too.

R: Absolutely, absolutely.

J: I think that’s one of the most interesting and smartest things that I’ve seen in the design of the database itself. I think playing to the strengths of other open source communities is such a great idea.

R: The thing that we do with the relationships and being able to capture links between pieces of data and links to external pieces of data, that is really what sets MB apart in a lot of senses because you really have the power to define new relationship types because there’s a community process for “well if you think you want to do this, you should propose it” and then other people will peer review your proposal and argue it to death, which is sometimes counter productive, but if somebody can pull the consensus together and then a new type of link can be established — it doesn’t really have to involve me as the benevolent dictator at all. One of the things I’ve done a lot is try and empower my community to do what it needs to do for a number of reasons: for self-policing, for just we don’t have much manpower so we have to draw on the community, and also for empowerment — if I can use that buzzword bingo word here for a minute. If you can give power to your community and you don’t have somebody sitting at the top who has to review every little decision, empowering your community is really absolutely essential if you want to grow your community because when you’ve given somebody the power to go make a change, to go make a decision, to go do something, that’s where you grow your community and that’s when they can see their efforts in bright lights and be acknowledged moving forward. That’s a big change for people. So for instance when the BBC rolled out their /music pages, and my community saw their efforts, their words, their thoughts showing up on a BBC website, that made it so much more real. That made it so real for people to go see their efforts and that was a big changing point for making sure that people stuck around because they could see the utility in what it is they’re doing.

J: That’s really great. I was looking at the list of music players that MB works automatically with and they’re all open source. Do you have good relationships with other people in the open source music software development space? I know that you know Rob obviously but is that…

R: Yeah. With the open source community it’s kind of a no-brainer because if you’re an application that uses this type of stuff, has metadata needs, there’s two players out there that will let you use it for free — well, two projects, there’s FreeDB and MB. And really if you do a little analysis between the two it’s very clear which one offers the comprehensive API to better data. Right? So we don’t really have to do much in the way of actually talking to people and so forth to get the open source community to work with us. That’s sort of a default type of situation. The closed-source world is obviously a different kind of ballgame because the closed-source world tends to be people who have not been exposed to open source and probably have listened to a couple horror stories about life within open source and this and that so they just don’t feel comfortable with that and they just feel it’s a trick. You know, why would I get something for free when I know there’s no such thing as a free lunch? So those are the types of hurdles that you need to get over and that tends to be trickier in that sense.

J: Is there anything else you feel is really important for me to know as I get into this? Actually I will tell you a quick story. So I have been contributing a little bit and I started playing with the orphaned discs and stuff like that and I — this goes back to one of my first questions about the passion of music fans and why that might be part of the reason that this works particularly well as a project. I feel like if you ask people to input stock quotes and data about businesses — I just wonder if there’s as much passion from as many people as there is with something like this. But I’m a really big fan of a local band here and I made an edit to one of her releases and it’s her and then there’s a band that she plays with and they tour as her with the band in the full name but the release was released as her but people consider it to be her with the band as the release name and so it’s like the crazy minutiae like that, and I kind of got into a little well not a spat but one of the editors was like “No this was incorrect.”

R: Hahaha

J: It was really funny. And then I was like well this this and this and he was like “well I have the unwrapped CD right here and the sticker’s on it, I can send you a picture” and I’m like “No, that’s cool because I have the unwrapped CD because I actually like listening to it.” You know, I’ve interviewed the artist, is this a pissing contest where I have to prove which one of us is more qualified to comment on this? Let me see who this was…. And then I tried to make a really good case but he just voted it down and I gave up.

R: The problem is when you have fanatic people like that you’re going to have some run-ins and that’s somewhat a downside to it. It’s the same reason a lot of women are turned off to open source, it’s an old boy network where there’s shouting and people that can beat their chest louder can drum out the people that are just trying to get a little bit of work done. So it’s an unfortunate circumstance. A lot of these tricky issues, there’s two reasons for it. One, the music industry has done some really crazy things in how they name things, and how they act and play music and so forth is almost as bad as keeping track of my friends and who has been sleeping with who. So keeping track of that is really challenging. And the second thing is that we started out as a CD index, so we were indexing CDs and our focus was to go get the track listing right for a CD so you can listen to it properly. But the change is that we’re really focusing on being a music encyclopedia, it’s really a better way of putting it. The elevator pitch I give is that it’s Wikipedia for music. So when you have this switch, you need the database to support what you want to do. So if you want to accurately capture that this release really belongs to the band as well as the soloist, that they worked on it together, then MusicBrainz really can’t capture that very well. For the last four years we’ve been designing a next-generation schema and for the last 18 months we’ve been coding on it, and if you go to you can see it in action. We’re not finished with it yet, I’m guessing right now release date is looking like April/May. And in it we have a concept called an artist credit, where you can give any number of people credit for any track, any release, and you can chain them together and represent them the way artists would want them to be represented on their own. 
So to go back to this particular problem that you’re facing, your interpretation may be a little bit different from another person’s interpretation and because the database can’t actually represent this, there’s a lot of judgment calls involved in making sure that the data is somewhat useful when it goes into the database and even more useful when we have the next generation schema in place.

J: Right.

R: So, moving on, we anticipate fewer of these types of problems because the database can handle more of these crazy nuances.

J: Right, and his argument, which made sense, is that they performed on the album, so that relationship is sufficient, we don’t actually need to list it as a release under them. And I was like okay but the second release is actually listed as blah blah…. So it’s really interesting how these debates happen over what seem to be just tiny things, you know?

R: I had a raging debate over whether it was graphic artist’s intent or artist’s intent and whether that colon should be there or not. I fought this battle for weeks, I mean we’re talking about a preposterous type of situation.

J: And what you said too, I think it’s with any online community but particularly when you’re actually contributing to something like this where there are debates over style and capitalization and who released what… you definitely have to have a thick skin going in. Not everybody’s going to be nice to you, nobody’s gonna hold your hand, and I think that’s just interesting. If you go in expecting hand-holding, then open source is not going to be something you’ll ever want to contribute to again.

R: This is the crux of why women have a hard time in open source because most women don’t like to stick their face head first into some brash conversation and try to prove they’re right. That’s just generally not their style, but that’s very much the character of the Internet so that’s just one of those things about the Internet that I can’t change. I can make my little corner of the world and hopefully make it a little better but that’s just a dichotomy that unfortunately creates a gender imbalance that I’m personally very unhappy about.

