{"version":"https://jsonfeed.org/version/1","title":"Contributor","home_page_url":"https://www.contributor.fyi","feed_url":"https://www.contributor.fyi/json","description":"The origin story behind the best open source projects and communities.","_fireside":{"subtitle":"The origin story behind the best open source projects and communities.","pubdate":"2024-04-24T02:00:00.000-07:00","explicit":false,"copyright":"2024 by Eric Anderson","owner":"Eric Anderson","image":"https://assets.fireside.fm/file/fireside-images/podcasts/images/6/657ccb75-c55f-4363-8892-f45dd46caf80/cover.jpg?v=1"},"items":[{"id":"00165b1a-eae9-47fa-b226-84d0813e585a","title":"Metadata Management: DataHub with Shirshanka Das","url":"https://www.contributor.fyi/datahub","content_text":"\n\n\nShirshanka Das (@shirshanka) is the CTO of Acryl Data and founder of DataHub, which bills itself as the #1 open-source metadata platform. It enables data discovery, data observability and federated governance to help tame complex data ecosystems. Shirshanka first developed DataHub while at LinkedIn, but has grown it into an independent project with a thriving community.\n\nContributor is looking for a community manager! If you want to know more, shoot us an email at eric@scalevp.com.\n\nSubscribe to Contributor on Substack for email notifications!\n\nIn this episode we discuss:\n\n\n How DataHub differs from traditional data catalogs\n\n Themes around why community members get involved and stick with the project\n\n Partnering with Netflix to develop runtime metadata model extensibility\n\n The influence of the pandemic on DataHub’s open-sourcing\n\n Dealing with the future of a project with big community and unlimited scope\n\n\n\nLinks:\n\n\n DataHub\n\n The History of DataHub\n\n","content_html":"



Shirshanka Das (@shirshanka) is the CTO of Acryl Data and founder of DataHub, which bills itself as the #1 open-source metadata platform. It enables data discovery, data observability and federated governance to help tame complex data ecosystems. Shirshanka first developed DataHub while at LinkedIn, but has grown it into an independent project with a thriving community.


Contributor is looking for a community manager! If you want to know more, shoot us an email at eric@scalevp.com.


Subscribe to Contributor on Substack for email notifications!


In this episode we discuss:

How DataHub differs from traditional data catalogs

Themes around why community members get involved and stick with the project

Partnering with Netflix to develop runtime metadata model extensibility

The influence of the pandemic on DataHub's open-sourcing

Dealing with the future of a project with a big community and unlimited scope

Links:

\n\n","summary":"","date_published":"2024-04-24T02:00:00.000-07:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/00165b1a-eae9-47fa-b226-84d0813e585a.mp3","mime_type":"audio/mpeg","size_in_bytes":35537024,"duration_in_seconds":2216}]},{"id":"a24d5e8a-4381-444e-b703-3be8d0edb0cf","title":"Take Your Own Advice: vlcn with Matt Wonlaw","url":"https://www.contributor.fyi/vlcn","content_text":"After his first child was born, Matt Wonlaw (@tantaman) imagined giving his son life advice. What kind of life did he want his kid to lead? At the time, he was working for Facebook, and he decided that his own life needed a change in direction. So Matt started vlcn, aka Vulcan Labs, a research company that develops open-source projects like CR-SQLite and Materialite. vlcn has an unusual business model – Matt receives donations and sponsorships from users and clients. It’s all part of his mission to rethink the modern data stack for writing rich and complex applications.\n\nContributor is looking for a community manager! If you want to know more, shoot us an email at eric@scalevp.com.\n\nSubscribe to Contributor on Substack for email notifications!\n\nIn this episode we discuss:\n\n\n One reason that software is still too hard to write: Object orientations\n\n How CR-SQLite allows databases to be merged together and Materialite provides Incremental View Maintenance for JavaScript\n\n Why coding directly to relations can provide a more flexible and efficient approach to building applications\n\n Matt’s decision to build vlcn as a research lab rather than as a startup\n\n Thoughts for the future on PGLite\n\n\n\nLinks:\n\n\n vlcn (Vulcan Labs)\n\n CR-SQLite\n\n Materialite\n\n fly.io\n\n PGLite\n\n\n\nPeople mentioned:\n\n\n Johannes Schickling (@schickling)\n\n\n\nMatt Wonlaw:\nI've always been looking for simpler ways to do things, I think like a lot of engineers are trying to find simpler ways to do things, and yeah, trying out different programming models and whatnot. I guess there's a quote, \"I'd rather help people write programs than write programs.\"\n\nEric Anderson:\nThis is Contributor, a podcast telling the stories behind the best open source projects and the communities that make them. I'm Eric Anderson.\n\nToday, we get to talk with Matt Wonlaw, who's the creator and principal at Vulcan Labs. Welcome, Matt.\n\nMatt Wonlaw:\nThanks. Nice to be here.\n\nEric Anderson:\nI don't know if you've ever called yourself the principal of Vulcan Labs. I just made that up.\n\nMatt Wonlaw:\nYeah, no. I guess, founder, but it's a one-man show, so titles don't mean much.\n\nEric Anderson:\nWe're going to talk about all the things Vulcan Labs does. One of those things is an open source project, CR-SQLite. We've talked a lot about SQLite on the show lately, so excited to have you. But I want to start, actually, maybe with Vulcan Labs 'cause I think that's the right umbrella for this conversation. You're living the dream in many respects. Tell us about what Vulcan Labs is and how you got into it.\n\nMatt Wonlaw:\nYeah, I think software is too hard to write. The first 80% starts pretty easy, as the saying goes, it's that last 20% and for some reason, it takes forever and everything turns into that spaghetti code. And I was trying to figure out what is it that makes software so hard to write? Is it just like engineers are bad and we need to study harder or is it the tools we're using are the wrong abstractions for the job? 
They're just not up to today's applications. And I came to the conclusion that the tooling is wrong, and I guess Vulcan Labs is my experiments and explorations in what should the modern dev stack for rich applications like a Figma or a Notion or Spotify, if you were to build those, what primitives do you need to build those? I guess, as a preview, the solutions I've landed on are CR-SQLite. It's basically an extension for SQLite that lets you merge databases. So think of Git for databases in a way, so that's to facilitate collaboration and multiplayer.\n\nThe other is Materialite. So this is bringing incremental view maintenance to embedded databases 'cause a lot of queries, you want to be able to subscribe to a query rather than always being request response 'cause then like a UI, you want to know instantaneously what has changed rather than every time someone makes a change, recoding the database. And then the last two things, one is tree SQL. So I think the relational model is the right model, but if you're using a relational model, how do you pull a tree of data out 'cause a lot of times your application, it needs a hierarchy, not some flat set of relations. So if you think like a issue tracker, you might have an issue and that issue has labels and maybe has comments, you might want to pull those as one document. And yeah, the last thing is better integration between the host language and the database. So you have full static typing end-to-end.\n\nEric Anderson:\nThe high level idea of, \"Software is still too hard to write,\" is I think a good first principle. I think we can all agree. Is there some fundamental reason for that or is this just a blocking and tackling exercise? We just need to make it easier in a thousand different ways.\n\nMatt Wonlaw:\nObject orientation is a big culprit I think. So I think when we model our data with OO, we're bringing some implicit assumptions to the data that we don't realize. So one example is a chair, right? You're going to model a chair in your program. You're going to say your chair is a piece of furniture. Well, that classification of chair as furniture is very dependent on the perspective of the viewer or the person using the chair. Yeah, in my normal life, a chair is a piece of furniture, but if it's really cold outside and I don't have heat, maybe my chair is firewood or if I need to break out of the office, maybe my chair's a battering ram. If I need to block a door, maybe the chair's a barricade. It has all these different roles depending on the context of the viewer. And I think software, as it gets older, gets new requirements that change those fundamental perspectives and your data model, when it's this OO way, it is not flexible enough to adapt to that.\n\nSo I think OO is one culprit. Can we code directly to relations? And that got me started down the relational DB path. And then I realized there's other problems here like relational databases are built for this era of request-response. Like you think about the early LAMP stack kind of world where every time a user visited a site, you would request from the database, get all data and show stuff. Whereas if you're building some rich UI like a Spotify, you don't want to have to request to the database every time you make a state change. Every time a state change happens, it should just update, kind of like call you back reactively. So that's kind of where the Materialite project was born of SQLite. And these databases are still in this request-response paradigm. 
They really need a subscription paradigm where I can say, \"Set up all my queries that my app needs and subscribe to them.\" 'Cause yeah, applications have very static set of queries.\n\nIf you think about WhatsApp, it has a query for your threads, your messages, your contact list and those queries don't change every time you open the app. It's really the data that's changing the queries that are static. So for these sort of apps, you need a database that really handles the static query case and dynamically changing data case.\n\nEric Anderson:\nSo you have this broader realization that software could be easier in a lot of ways, and if you start identifying these culprits, you're mapping them into projects, was this all happening before you started Vulcan or did you lead to start Vulcan and you're like, \"Now what do I do,\" and happen into this? How long has this been a motivator for you?\n\nMatt Wonlaw:\nEver since I started coding, I've always been looking for simpler ways to do things. I think like a lot of engineers who are trying to find simpler ways to do things and yeah, trying out different programming models and whatnot. I guess there's a quote, \"I'd rather help people write programs than write programs,\" if that makes sense. I'd rather write infrastructure for people. I was doing a lot of that at Facebook and whatnot, but I just wanted to do something that had broader applicability to the whole world and could be open sourced. So after I had my first child, I was thinking in my head, \"What am I going to tell him how to live his life?\" I was like, \"Oh, I'm not living this life,\" I would tell him. Yeah, I took the plunge and decided to leave my job. But I've always been sort of like keep the fire movement of aggressively saving and stuff being a thing for eight years. I had enough to just take that plunge. I was fortunate enough to be able to do that and work on this stuff full-time.\n\nEric Anderson:\nPeople tell themselves, \"Oh, if I had more financial security, I would do X or Y. That's my true passion.\" And I challenge them a bit like, \"well, Is this true passion of yours really so commercially in-viable?\" 'Cause presumably, that passion probably produces some value for society. And I think most people when they realize that, they're like, \"Yeah, actually, if I did do this thing I love, it wouldn't be so terrible.\" And it sounds like you came to that conclusion eventually.\n\nMatt Wonlaw:\nYeah, yeah. That was part of the motivator is like, \"If I quit, I'm doing something. It's going to produce money at some point. It's not like I'm going to be living on my savings forever.\" And I think, yeah, after a year, a year and a half in, some people found CR-SQlite Fly.io specifically and they did a pretty significant sponsorship. And then some people found Materialite and were like, \"Oh, we need this in our database and yes, contract to hire.\" So yeah, money is coming in and it's more than I expected. So yeah, it's been pretty good.\n\nEric Anderson:\nI'm curious now that you've been through this, Matt, what do you think about the future? I'm remembering one of your tweets where you kind of thought, \"Oh, a lot of people are chasing this problem area now. Maybe it's time for me to move on to something else.\" I get the impression that you see yourself as kind of operating at the frontier.\n\nMatt Wonlaw:\nYeah. So what I spent most of my time on was CR-SQlite, which is this concept of can you model SQLite tables with CRDTs so that you can merge databases? 
The idea would be like this would make collaborative software or software can go offline and come back online. Really easy for developers. They don't have to understand CRDTs themselves, they don't have to understand syncing. They just set up their table say, this column's last-write-wins, this column is some tech CRDT and they just write to their database and the database handles all the merging for them.\n\nAnd when I started that project, I think there was just Automerge and Yjs, those were two other projects in the space of CRDTs and collaboration and data syncing. But then I guess a year later, we saw ElectricSQL, PowerSync, SQLSync, Replicache. So there was endless entrance into this data syncing space. And yeah, I guess I've never really questioned myself like, \"Why do I feel like once it's saturated, I need to move on?\" I guess I feel like other people are solving it. There's more interesting stuff to solve next.\n\nSo yeah, after I saw so many people working on that problem, I knew in my own apps and apps people have been writing with CR-SQLite, there was this question of I'm using SQLite as my data store, but I have to build this object model on top because I want to be able to react in realtime to mutations. So I was like, \"Okay, this seems like a problem where we could have incremental views and subscriptions, where you could subscribe to your queries rather than having to create this separate layer to handle notifications of writes.\" So that's kind of where I got into incremental view maintenance and Materialite.\n\nEric Anderson:\nYou and I have talked a little bit about SQLite and then on the show, we've covered quite a bit. I want to make sure we cover as much as you're interested in covering SQLite 'cause I think it's been an interesting ecosystem for the last little bit. What have you found interesting or what are some topics we should cover today?\n\nMatt Wonlaw:\nWell, I guess my latest conclusion is SQLite might not be the right fit for today's like Figmas or Notions like the next set of rich UIs. And why I think that is 'cause, as I said earlier, if you think of an application, the queries are pretty static and the data is what's dynamic. So we really want a facility for doing subscriptions against the database, like for the WhatsApp example. I have a query for my messages, a query for my threads, a query for user status. I just want to subscribe to all those and anytime that data changes, notify me so I can update the UI. With SQLite right now, you can't subscribe to anything, right? You've got to either pull on some interval or you've got to add some layer on top of the database to handle this for you. And once you add a layer on top of the database, you end up with all the same problems that DB was solving for you, right?\n\nA database gives you nice transactions and concurrency control, but if you're going to add a new layer on top, well, you're going to have to think about transactions, concurrency control, and implement all that there. So it would be better if it is in the DB itself. And then, I guess, I briefly touched on the object model is the wrong one. It'd be great if you could just code directly to relations. I think that might be a little wild idea to some people, but if you're coding directly to relations, you need your database to be as fast as memory. So when you write, it finishes as fast as setting a variable. 
When you read, it's as fast as reading a variable off the JavaScript heap if you're using JS.\n\nAnd SQLite where every transaction when you commit it has to be durably committed before you can do another operation, I think that's a fundamental problem for if you want to just code directly to relations, where I don't think all commits need to be durably committed before you move on to the next commit 'cause there's plenty of cases in an application where it's fine if you lose the last few commits, if you think of Spotify or something, right? Not every single interaction needs to be durable immediately, and if you can get rid of that durability for every single commit, you can go a lot faster.\n\nEric Anderson:\nMaybe I shouldn't ask this question. What does this coding directly to relations mean? Is this like a graph-like idea or to map it with terms that I've explored before?\n\nMatt Wonlaw:\nYeah, so I guess if you think of Spotify, you have your tracks, you have your Play button, you have your timeline of what's currently where you are in the track. Somehow you have to get the data to those components, essentially the Play state. So when you press Play, all the things can update and show the correct state. And if you're using an object model, maybe you're passing that down to all the components. But if you're coding directly against some relational DB where everything's flat and laid out, like this global data store where every component can just issue a query for the data it needs, so literally just issuing some query for the data it needs and then subscribing to that query. And I guess coding to relations has less of this perspective problem of that chair example of if I need to find something that's firewood, well, I can just query for anything that's made of wood and chairs will come back and that's fine.\n\nEric Anderson:\nMaybe it's worth pointing out. One of the things that's interesting about what you're doing, I think it's kind of orthogonal to the way a lot of the software is developed today, which is individuals have a specific problem in a company, they build some software to solve that specific problem and then they explore whether it can be generalized to a lot of people. And I think you've separated yourself from a specific application and said, \"Here's a general problem. Can I come up with a general solution and then explore whether it can apply to certain people's specific problems?\" Is that fair to say you're taking the reverse approach to say Uber open source or Google open source?\n\nMatt Wonlaw:\nI started... So I guess I'm pulling on my, been doing software development for 15 years and I've noticed these recurring patterns and all sorts of systems we've had to build. So I guess it's pulling on this history of experience in these problems I know I keep dealing with and thinking through why does state management keep coming up as a problem? Why is model view controller always turning into a mess eventually? And I do have this old project I used to use, it's like to validate and test my ideas, Strut.io, it's like this old presentation editor. Yeah, there's some grounding in reality and use case.\n\nEric Anderson:\nOkay, so you went down the SQLite journey, built CR-SQLite, Fly got excited, but over time you came to the conclusion that maybe SQLite isn't the right abstraction. Have you come to similar conclusions for your other projects?\n\nMatt Wonlaw:\nSo I guess the short answer is not really. 
I think the other projects still seem like they're bearing a lot of fruit. SQLite, I had reservations at the beginning, but the old adage, \"Don't write your own database,\" and all these things, coming from Facebook where they're very pragmatic about how they do things. I was like, \"Yeah, that culture was instilled in me, so let me make the pragmatic choice here. Let me pull SQLite off the shelf and see how far I can get with it.\" And then just after a while it was like, \"Yeah, I don't think it's going to get all the way to where I want to get it to.\" And I think part of that is I've been targeting the web platform specifically and I just didn't realize how much of a tax there is crossing from JavaScript to WASM and back and there's a huge performance hit just in that. And maybe if that performance hit wasn't there, I could still continue with SQLite, I'm not sure.\n\nEric Anderson:\nDo you come out the other door, if SQLite isn't the answer, is there something else you lean towards?\n\nMatt Wonlaw:\nWhat I'm leaning towards is just fully investing my time in Materialite and building that up to just not just be incremental view maintenance, but also a regular database under the hood. Yeah, I am leaning into, yeah, I guess, I will build my own database. The upside of that is Materialite actually has the most interest from people financially, so it'd make sense for me to do that too.\n\nEric Anderson:\nI've noticed a little database trend of late where folks seem to not build databases from scratch in the form factor we're used to as much anymore as they borrow this from Postgres or that from ClickHouse. And that database ends up being a federation of database components then like a single monolith. And I'm curious as you approach Materialite, is that going to bear out or...\n\nMatt Wonlaw:\nYeah, I think so. So that's also one of the things that finally pushed me over the edge to maybe I can build my own. I don't know. I feel so arrogant saying this, maybe I can build my own database, but I guess that's why we do hard things, right? We do 'cause we don't know they're hard. But there's a guy at Triplet, I think their founder, he's like, \"Yeah, building database probably isn't that hard these days 'cause we have all these key-value stores in B-tree implementations, you can just pull stuff,\" as you said, \"off the shelf.\" So yeah, that's what I've been doing and the browser has IndexedDB, which is pretty decent abstraction for a key-value store to base the database off of, so I don't have to solve the writing to disk and the durability and all these things. I just really am dealing with that key-value interface. Everything below that is already solved for me.\n\nEric Anderson:\nIf Materialite goes the way you want, is this a database in the sense of a local database like a SQLite or is this a distributed database, I don't need Postgres anymore?\n\nMatt Wonlaw:\nIt'll be a locally embedded database like SQLite, but ideally with collaborative features, right? I think SQLite was built at a time where everybody just had one computer, but now you have multiple devices and you might start work on your phone and continue it on your laptop and then finish it later back on your phone and you want that data obviously to sync between all of them. So I think, yeah, it'd be really cool if an embedded database of the future knew how to sync or merge itself with other copies of itself, so you do-\n\nEric Anderson:\nGot it.\n\nMatt Wonlaw:\n... 
'Cause I guess going back to Strut.io, when I first quit, I was like, \"Oh, I think I can resurrect this old project and make a bunch of money real quick with it.\" But that project didn't have multiplayer or syncing, so I was like, \"Okay, let me just go add that real quick.\" And I was like, \"Oh, this is a mess and that's why I think there's a market here with CR-SQLite.\" And that's how I started building CR-SQLite, \"I think there's a market, let me just go build that general solution.\" To all that say, syncing is hard, it would be great if something like at the database layer could handle that for developers, so they don't have to be experts in distributed systems.\n\nEric Anderson:\nSo I guess this transition, if I'm imagining that is what's happening between your focus on CR-SQLite to Materialite reinforces the fact that Vulcan Labs is the thing that's persistent and continuous. That's where you're being consistent and then you move the locus of attention from one project to the other as needs arise, as conclusions form. Is that fair?\n\nMatt Wonlaw:\nYeah, I guess I started, it was really just Vulcan and it was all going to be a product around CR-SQLite and then I realized I really enjoy solving the difficult problems and not so much the finding collaborators, maybe finding funding, maybe hiring people, the whole business side building. So I was like, \"Okay, pivoting it as a research lab and doing contract work for people who also have these problems.\" It fits my interest and two, it provides some financial stability.\n\nEric Anderson:\nMaybe one question going back to this object-oriented as the big culprit and you mentioned relations to the model, and I think I got that with respect to data modeling and your database. Does that also mean you would avoid objects in more just general programming, like classes are problematic? What's the alternative? Is it functional or is there some other approach?\n\nMatt Wonlaw:\nYeah, I don't think classes are altogether bad, right? I think you still need classes and objects and all these things for...\n\nEric Anderson:\nAbstraction and encapsulation maybe.\n\nMatt Wonlaw:\nYeah. But I'm trying to think of the best way to, there is a separation somewhere, there's like the data for the problem domain you're modeling, and I feel like that should be relational. And then there's the data for your algorithms, maybe your B-tree or your breadth first traversal or whatever these things are. These can be classes and hide their data and have interfaces and whatnot. But the stuff that's part of the problem domain feels like it should be stored in a relational DB and accessed directly through queries.\n\nI guess one other thing that the object model or program memory is missing is good primitives for doing mutations. So if you think about a database, you can start a transaction, you can mutate five things. If you fail in the middle, well, they all get rolled back. They think about your program memory. If you start a bunch of mutations and then you have an exception in the middle, you don't really have a facility to roll back. A few languages have software transactional memory, but not many. And I think that's another big problem as to like... Like people always say, \"Oh, global state is a horrible thing,\" but our databases are shared global mutable state, and we generally don't have that many problems with them. 
And I think the primitives in our languages are just, we don't have the right mutation primitives, we can't do a transaction, we can't get isolation of transactions, we can't roll things back.\n\nYeah, I should be able to say, \"I want to mutate these five variables, but nobody should be able to see the effects of the mutation until I'm done. And if I fail in the middle, I should be able to revert to the prior state.\" But yeah, we don't have that in JavaScript or C or whatever.\n\nEric Anderson:\nInteresting, and maybe we don't have those things because the database has done that for us in the past, like we've just said the database is the record of truth for this data and we're going to outsource mutation management to the database. And now as we want to do more of that state management within our programming more directly, do you envision that emerging? Is this the fifth project for Vulcan lab?\n\nMatt Wonlaw:\nSo I think in the early 2000s, you're right, when pages were, like the LAMP stack and everything was a full page refresh, all your state was in the database and that was the system of record and you had no in-memory state. You had to manage and roll back and do all these things with. I think things were pretty easy back then. And I guess to answer your other question, would I ever do software transactional memory? So I know Clojure has it, they've had it for a really long time. JavaScript, the environment seems so impure, it seems like super hard to pull that off. I don't know if you could pull that off in user space.\n\nEric Anderson:\nSo Matt, maybe we can just touch a bit more on the Vulcan Labs model 'cause I think it's an interesting model. So you went into this thinking, \"I'm just going to create value, solve big problems, and if people want to fund me, great and if not, we'll figure it out.\" And people did want to fund you. How does that happen? What are the mechanics 'cause I think that's interesting to some people? They just send you dollar bills in the mail?\n\nMatt Wonlaw:\nYeah, I wish it was that easy. Like okay, some people send you dollar bills in the mail, but they're literally like $5 a month.\n\nEric Anderson:\nRight.\n\nMatt Wonlaw:\nTo get worthwhile sponsorships, yeah, I was just actually talking with the cut, so finding out which companies were using this stuff. Usually they were talking to me in Discord 'cause they needed support and then finding out who are they, okay, they're part of a company and then just literally having that conversation like, \"It looks like this powers one of your products. I'm doing a lot of bug fixes and maintaining things for you. I think it would be fair if you guys contribute a significant amount to this.\" Yeah, one, so Johannes, he's a guy I talked to. He encouraged me to say, \"Oh, are you getting one engineer's salary worth of value out of this? That's how much you should contribute.\" I never went quite that aggressive. So I went somewhere between $0 to that aggressive, yeah just having those conversations, being upfront that they're getting value and what's the worst that can happen? They can say no, but they're probably still going to use your project, they're not just going to drop it.\n\nI did wait, right? I didn't ask for these donations immediately. 
I waited till the project was kind of in use in some capacity at their company before asking, which it feels a bit sleazy like, \"Oh, I waited until you deployed to production, now I'm asking you for money.\" But I think everybody understands that you got to make a living somehow.\n\nEric Anderson:\nYeah, I don't think it's terrible to deliver value and then ask for money. It'd be almost, the reverse seems also equally strange to be like, \"I think you should pay me and just hope that I give you something interesting in the end.\"\n\nMatt Wonlaw:\nThat's true.\n\nEric Anderson:\nAnd these people are accommodating. And are there strings attached to that money? Do you implicitly need to serve their needs or do they ask for deliverables?\n\nMatt Wonlaw:\nYeah, it's different for different companies. Some are no-strings attached. It's like, \"We're using it, we know how to use it. Whatever support you can provide is fine, but we're not holding you to anything.\" There have been others that like, \"Oh, we want to give you equity and make sure you're aligned with our company's success and sign this contract.\" Those, I just have to turn down because-\n\nEric Anderson:\nOkay, yeah.\n\nMatt Wonlaw:\n... I feel like if I'm going to start tying myself to one specific sponsor, then that might preclude others in the future. And now I'm suddenly worried about what I say publicly or work on in the future, right? For those, I just say, \"Yeah, I can't do that. You can just write a sponsorship check or nothing at all.\" I don't know if you've talked to others in the past who get open source sponsorships and if you have any insights into what other people have done?\n\nEric Anderson:\nGitHub now facilitates some donations and I think Patreon is another kind of channel that I've seen people use. In the case of maybe both of those, there's an option to do a reacquiring, like, \"I'm going to subscribe to a donation. I am going to give you 10 bucks a month,\" which is an interesting model. We've all seen the phrasing around Buy Me A Coffee, which is kind of a fun, it seems like if Matt was in person, I definitely would buy him a coffee. I imagine in most cases your sponsors are buying you much more than that, but do you incorporate as a nonprofit, are you like Mozilla Foundation in a small way?\n\nMatt Wonlaw:\nYeah, these are very good questions. I should talk to somebody who deals with this to give you some advice, but right now, I'm just set up as an LLC. But yeah, I don't know, could I set up as a nonprofit, get better tax benefits or... Yeah, I know there's research tax credits and stuff.\n\nEric Anderson:\nGood. So Matt, I want to pause here 'cause we're approaching end of time. Anything you wanted to cover or that would be good to cover?\n\nMatt Wonlaw:\nI think the last thing that's interesting to me is you have an embedded database, which has some set of data on it, but realistically, you probably have... I guess, people have all sorts of different devices with different storage. You have your phone and then your laptop and desktop. So the amount of data in your app on your phone is probably going to be a sliver of what might be on your desktop. 
So it is really interesting if like can you create an embedded replicated database where on the phone it just has the hot set that the phone is using, but on the desktop it has the full set and can you do queries that span both databases where like, \"Oh, it'll query my data locally and then if there's more data, it'll go back to the server or whatever and pull in the rest.\"\n\nEric Anderson:\nMaybe jamming on that a little bit more broadly. I don't think this is anything new, but I novel all the time when I look through my apps on my phone that I have gigabytes of app on my phone, but very little of my user data. People build apps, they ship them to my phone and then on my phone, I generate data and I ship it to their server and they store my data and I store their app, and I find that a little odd. I feel like I should keep my data, you keep your app. Somehow that's not how mobile is built today. You're describing something that's more nuanced, which is how has this user data persisted and some of it's stored on my client, on desktop we're okay with lots of it being stored and on mobile, you want a little more flexibility to only store what's needed in a critical sense.\n\nMatt Wonlaw:\nYeah, so on mobile, only store what's needed, but how can we hide that complexity from the developer where they're not having to figure out what to store or not. Maybe as they query the database, it sees what you're querying and it pulls in what you need locally and anything you didn't query, stays on the server or desktop, wherever your user data is. 'Cause I guess the Local-First Movement, there's this whole movement called Local-First where it's like users should have all their data on device and all their user data there, but all the solutions in the Local-First space require you syncing all of your data to the app before you can start using it.\n\nAnd some people in that space say, \"Oh, you're never going to generate enough user data to fill your phone. Phones are huge now.\" Maybe that's true, maybe I'm barking up the wrong tree, but I feel like you do need the ability to start your app with just the slice of user data you need at that moment and as you need it, start syncing it in from, I don't know, your desktop or maybe you have some service provider that you do let you store your aggregate of user data.\n\nEric Anderson:\nI feel like Local-First is often useful for single player applications and I just wonder are they just as useful for these multiplayer applications? In this sense, you bring the whole enterprise's data onto your device?\n\nMatt Wonlaw:\nYeah, so that's where these Local-First solutions have sort of been breaking it down, right? They're good for single player, they're good for multiplayer up to an extent. But then once you get, I don't know, all of Boeing or Lockheed Martin or something, that's a whole enterprise, you're not going to sink down the whole enterprise Wiki and enterprise task tracker and whatever else there is. So yeah, I think in that case where you do have enterprise software where you want to be snappy and fast, where all the data that your employees are using are local on your device, you do need to somehow figure out how to just sync a subset of the enterprise, just what that user commonly uses day-to-day is synced to their device and local for them.\n\nEric Anderson:\nMaybe I am asking you for a hot-take here. If you're spending less time in SQLite at the moment, have you come across PGlite, this kind of like running Postgres on WASM on the client? 
Is that a thing that we should be keeping an eye on?\n\nMatt Wonlaw:\nI've seen it. It looks really cool. I was super blown away. They got the WASM size down like 2.6 megs after compression. I'm like, \"Postgres, is that small?\" I had no idea. I don't know if they've gripped stuff out. Yeah, I'm super curious. I want to see some benchmarks. Yeah, it's the ElectricSQL, folks. I think Neon working on it, so I know them pretty well. But also I have enough going on, I don't want to distract myself with a new rabbit hole, so I think I'll probably just get a TLDR and try to just file that away for some future problem if it ever comes up.\n\nEric Anderson:\nWell, Matt, I think we're also maybe in kind of conclusion here. Glad not only is your son the beneficiary of you going rogue here with Vulcan Labs, but I think all the world, rather than there being more code in Facebook, we have more code in public and you're solving generalized problems that all programmers face. So thanks for your contributions. I'm excited to see where it takes you.\n\nMatt Wonlaw:\nYeah, thanks for having me.\n\nEric Anderson:\nYou can subscribe to the podcast and check out our community Slack and newsletter at contributor.fyi. If you like the show, please leave a rating and review on Apple Podcasts, Spotify or wherever you get your podcasts. Until next time, I'm Eric Anderson and this has been Contributor.","content_html":"

After his first child was born, Matt Wonlaw (@tantaman) imagined giving his son life advice. What kind of life did he want his kid to lead? At the time, he was working for Facebook, and he decided that his own life needed a change in direction. So Matt started vlcn, aka Vulcan Labs, a research company that develops open-source projects like CR-SQLite and Materialite. vlcn has an unusual business model – Matt receives donations and sponsorships from users and clients. It’s all part of his mission to rethink the modern data stack for writing rich and complex applications.


Contributor is looking for a community manager! If you want to know more, shoot us an email at eric@scalevp.com.


Subscribe to Contributor on Substack for email notifications!


In this episode we discuss:

One reason that software is still too hard to write: object orientation

How CR-SQLite allows databases to be merged together and how Materialite provides incremental view maintenance for JavaScript

Why coding directly to relations can provide a more flexible and efficient approach to building applications

Matt's decision to build vlcn as a research lab rather than as a startup

Thoughts for the future on PGlite

Links:

vlcn (Vulcan Labs)

CR-SQLite

Materialite

fly.io

PGlite

People mentioned:

Johannes Schickling (@schickling)

Matt Wonlaw:
I've always been looking for simpler ways to do things. I think, like a lot of engineers, I'm trying to find simpler ways to do things, trying out different programming models and whatnot. I guess there's a quote: "I'd rather help people write programs than write programs."

Eric Anderson:
This is Contributor, a podcast telling the stories behind the best open source projects and the communities that make them. I'm Eric Anderson.

Today, we get to talk with Matt Wonlaw, who's the creator and principal at Vulcan Labs. Welcome, Matt.

\n\n

Matt Wonlaw:
Thanks. Nice to be here.

Eric Anderson:
I don't know if you've ever called yourself the principal of Vulcan Labs. I just made that up.

Matt Wonlaw:
Yeah, no. I guess founder, but it's a one-man show, so titles don't mean much.

Eric Anderson:
We're going to talk about all the things Vulcan Labs does. One of those things is an open source project, CR-SQLite. We've talked a lot about SQLite on the show lately, so excited to have you. But I want to start, actually, maybe with Vulcan Labs 'cause I think that's the right umbrella for this conversation. You're living the dream in many respects. Tell us about what Vulcan Labs is and how you got into it.

Matt Wonlaw:
Yeah, I think software is too hard to write. The first 80% is pretty easy; as the saying goes, it's that last 20% that for some reason takes forever, and everything turns into spaghetti code. And I was trying to figure out: what is it that makes software so hard to write? Is it just that engineers are bad and we need to study harder, or are the tools we're using the wrong abstractions for the job? They're just not up to today's applications. And I came to the conclusion that the tooling is wrong, and I guess Vulcan Labs is my experiments and explorations in what the modern dev stack for rich applications should be. If you were to build a Figma or a Notion or a Spotify, what primitives would you need? I guess, as a preview, the solutions I've landed on are these. The first is CR-SQLite. It's basically an extension for SQLite that lets you merge databases, so think of Git for databases, in a way. That's to facilitate collaboration and multiplayer.
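For readers following along, here is a minimal sketch of the workflow being described. crsql_as_crr() and the crsql_changes virtual table come from CR-SQLite's documented SQL surface; the package and method names (@vlcn.io/crsqlite-wasm, execA) are assumptions about vlcn's browser build and may differ between versions.

```typescript
// Sketch: upgrading ordinary SQLite tables to CRRs (conflict-free
// replicated relations) so two databases can be merged.
import initWasm from "@vlcn.io/crsqlite-wasm"; // assumed package name

const sqlite = await initWasm();
const db = await sqlite.open("app.db");

// Any table with a primary key can be upgraded to a CRR.
await db.exec(`CREATE TABLE IF NOT EXISTS todo (id PRIMARY KEY, content, done)`);
await db.exec(`SELECT crsql_as_crr('todo')`);

// Writes are plain SQL; merge metadata is tracked behind the scenes.
await db.exec(`INSERT INTO todo VALUES ('t1', 'buy milk', 0)`);

// "Git for databases": read this peer's changesets...
const changes = await db.execA(`SELECT * FROM crsql_changes`);
// ...ship them to another device, and apply them there by inserting
// into that database's own crsql_changes table.
```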

The other is Materialite. This is bringing incremental view maintenance to embedded databases, 'cause for a lot of queries, you want to be able to subscribe to the query rather than always being request-response. In a UI, you want to know instantaneously what has changed rather than re-querying the database every time someone makes a change. And then the last two things: one is tree SQL. I think the relational model is the right model, but if you're using a relational model, how do you pull a tree of data out? A lot of times your application needs a hierarchy, not some flat set of relations. If you think of an issue tracker, you might have an issue, and that issue has labels and maybe comments; you might want to pull those as one document. And the last thing is better integration between the host language and the database, so you have full static typing end-to-end.
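To make the tree-of-data idea concrete, here is one way to pull an issue with its labels and comments as a single document using stock SQLite's JSON1 functions. This is an editor's sketch of the concept, not vlcn's tree SQL syntax (better-sqlite3 assumed as the driver):

```typescript
import Database from "better-sqlite3";

const db = new Database(":memory:");
db.exec(`
  CREATE TABLE issue   (id INTEGER PRIMARY KEY, title TEXT);
  CREATE TABLE label   (issue_id INTEGER, name TEXT);
  CREATE TABLE comment (issue_id INTEGER, body TEXT);
  INSERT INTO issue   VALUES (1, 'Crash on startup');
  INSERT INTO label   VALUES (1, 'bug'), (1, 'p0');
  INSERT INTO comment VALUES (1, 'Repro attached');
`);

// One query returns the issue plus its labels and comments as a document.
const row = db.prepare(`
  SELECT json_object(
    'id', i.id,
    'title', i.title,
    'labels',   (SELECT json_group_array(name) FROM label   WHERE issue_id = i.id),
    'comments', (SELECT json_group_array(body) FROM comment WHERE issue_id = i.id)
  ) AS doc
  FROM issue i WHERE i.id = 1
`).get() as { doc: string };

console.log(JSON.parse(row.doc));
// { id: 1, title: 'Crash on startup', labels: ['bug','p0'], comments: ['Repro attached'] }
```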

Eric Anderson:
The high-level idea that software is still too hard to write is, I think, a good first principle we can all agree on. Is there some fundamental reason for that, or is this just a blocking-and-tackling exercise? We just need to make it easier in a thousand different ways.

Matt Wonlaw:
Object orientation is a big culprit, I think. When we model our data with OO, we're bringing implicit assumptions to the data that we don't realize. One example is a chair, right? You're going to model a chair in your program. You're going to say your chair is a piece of furniture. Well, that classification of chair as furniture is very dependent on the perspective of the viewer, the person using the chair. In my normal life, a chair is a piece of furniture, but if it's really cold outside and I don't have heat, maybe my chair is firewood, or if I need to break out of the office, maybe my chair's a battering ram. If I need to block a door, maybe the chair's a barricade. It has all these different roles depending on the context of the viewer. And I think software, as it gets older, gets new requirements that change those fundamental perspectives, and your data model, when it's built this OO way, is not flexible enough to adapt to that.

So I think OO is one culprit. Can we code directly to relations? That got me started down the relational DB path. And then I realized there are other problems here, like relational databases being built for the era of request-response. Think about the early LAMP-stack world, where every time a user visited a site, you would request from the database, get all the data and show it. Whereas if you're building some rich UI like a Spotify, you don't want to have to go to the database every time you make a state change. Every time a state change happens, it should just update, kind of like calling you back reactively. So that's kind of where the Materialite project was born: SQLite and these databases are still in this request-response paradigm. They really need a subscription paradigm, where I can say, "Set up all the queries that my app needs and subscribe to them." 'Cause applications have a very static set of queries.

If you think about WhatsApp, it has a query for your threads, your messages, your contact list, and those queries don't change every time you open the app. It's really the data that's changing; the queries are static. So for these sorts of apps, you need a database that really handles the static-query, dynamically-changing-data case.
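A sketch of the programming model being described: the query set is fixed, and the database pushes incremental updates. The subscribe API here is hypothetical, drawn only to show the shape; it is not Materialite's actual interface:

```typescript
// Hypothetical reactive-query API for an IVM-backed embedded database.
interface ReactiveDB {
  // Register a standing query; cb fires with fresh results on each
  // relevant write. Returns an unsubscribe function.
  subscribe<T>(sql: string, cb: (rows: T[]) => void): () => void;
  exec(sql: string, params?: unknown[]): void;
}

declare const db: ReactiveDB; // assume an IVM-backed implementation

// WhatsApp-style static queries, registered once at startup.
const stopThreads = db.subscribe<{ id: string; title: string }>(
  "SELECT id, title FROM threads ORDER BY last_message_at DESC",
  (rows) => console.log("render thread list", rows)
);

db.subscribe(
  "SELECT body FROM messages WHERE thread_id = 't1' ORDER BY sent_at",
  (rows) => console.log("render messages", rows)
);

// The data changes, not the queries: one write updates every affected
// subscription incrementally, with no polling loop in the UI.
db.exec("INSERT INTO messages (thread_id, body) VALUES ('t1', 'hi!')");

stopThreads(); // tear down when the view unmounts
```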

Eric Anderson:
So you have this broader realization that software could be easier in a lot of ways, and as you identify these culprits, you're mapping them into projects. Was this all happening before you started Vulcan, or did you start Vulcan first and then think, "Now what do I do?" and happen into this? How long has this been a motivator for you?

Matt Wonlaw:
Ever since I started coding, I've always been looking for simpler ways to do things, like a lot of engineers are, trying out different programming models and whatnot. I guess there's a quote, "I'd rather help people write programs than write programs," if that makes sense. I'd rather write infrastructure for people. I was doing a lot of that at Facebook, but I just wanted to do something that had broader applicability to the whole world and could be open sourced. So after I had my first child, I was thinking in my head, "What am I going to tell him about how to live his life?" And I realized I wasn't living the life I would tell him to live. Yeah, I took the plunge and decided to leave my job. But I've always been sort of into the FIRE movement, aggressively saving and so on, and with that being a thing for eight years, I had enough to just take that plunge. I was fortunate enough to be able to do that and work on this stuff full-time.

Eric Anderson:
People tell themselves, "Oh, if I had more financial security, I would do X or Y. That's my true passion." And I challenge them a bit: "Well, is this true passion of yours really so commercially unviable?" 'Cause presumably, that passion probably produces some value for society. And I think most people, when they realize that, are like, "Yeah, actually, if I did do this thing I love, it wouldn't be so terrible." And it sounds like you came to that conclusion eventually.

Matt Wonlaw:
Yeah, yeah. That was part of the motivator: "If I quit, I'm doing something. It's going to produce money at some point. It's not like I'm going to be living on my savings forever." And I think, yeah, after a year, a year and a half in, some people found CR-SQLite, Fly.io specifically, and they did a pretty significant sponsorship. And then some people found Materialite and were like, "Oh, we need this in our database," and yes, contract to hire. So yeah, money is coming in and it's more than I expected. So it's been pretty good.

Eric Anderson:
I'm curious now that you've been through this, Matt, what do you think about the future? I'm remembering one of your tweets where you kind of thought, "Oh, a lot of people are chasing this problem area now. Maybe it's time for me to move on to something else." I get the impression that you see yourself as kind of operating at the frontier.

Matt Wonlaw:
Yeah. So what I spent most of my time on was CR-SQLite, which is this concept of: can you model SQLite tables with CRDTs so that you can merge databases? The idea is that this would make collaborative software, or software that can go offline and come back online, really easy for developers. They don't have to understand CRDTs themselves, they don't have to understand syncing. They just set up their table, say this column's last-write-wins, this column is some text CRDT, and then they just write to their database and the database handles all the merging for them.
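To make column-level last-write-wins concrete, here is a toy merge in TypeScript. Real CR-SQLite uses per-column causal versions rather than anything this naive, so treat it purely as an illustration of the policy:

```typescript
// Toy last-write-wins register: each column value carries a version.
// Merging two replicas keeps, per column, the value with the higher
// version (ties broken by replica id for determinism).
type Versioned<T> = { value: T; version: number; replica: string };

function mergeLWW<T>(a: Versioned<T>, b: Versioned<T>): Versioned<T> {
  if (a.version !== b.version) return a.version > b.version ? a : b;
  return a.replica > b.replica ? a : b; // deterministic tie-break
}

// Two devices edit the same row's "title" column while offline:
const phone:  Versioned<string> = { value: "Groceries", version: 4, replica: "phone" };
const laptop: Versioned<string> = { value: "Shopping",  version: 5, replica: "laptop" };

console.log(mergeLWW(phone, laptop).value); // "Shopping": both sides converge
```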

And when I started that project, I think there was just Automerge and Yjs; those were the two other projects in the space of CRDTs and collaboration and data syncing. But then, I guess a year later, we saw ElectricSQL, PowerSync, SQLSync, Replicache. There were endless entrants into this data-syncing space. And I guess I've never really questioned why I feel like once it's saturated, I need to move on. I guess I feel like other people are solving it. There's more interesting stuff to solve next.

So after I saw so many people working on that problem... I knew from my own apps, and apps people have been writing with CR-SQLite, there was this question of: I'm using SQLite as my data store, but I have to build this object model on top, because I want to be able to react in real time to mutations. So I was like, "Okay, this seems like a problem where we could have incremental views and subscriptions, where you could subscribe to your queries rather than having to create this separate layer to handle notifications of writes." That's kind of where I got into incremental view maintenance and Materialite.

Eric Anderson:
You and I have talked a little bit about SQLite, and on the show we've covered it quite a bit. I want to make sure we cover as much as you're interested in covering on SQLite, 'cause I think it's been an interesting ecosystem for the last little bit. What have you found interesting, or what are some topics we should cover today?

Matt Wonlaw:
Well, I guess my latest conclusion is that SQLite might not be the right fit for today's Figmas or Notions, the next set of rich UIs. And why I think that is 'cause, as I said earlier, if you think of an application, the queries are pretty static and the data is what's dynamic. So we really want a facility for doing subscriptions against the database, like in the WhatsApp example. I have a query for my messages, a query for my threads, a query for user status. I just want to subscribe to all those, and anytime that data changes, notify me so I can update the UI. With SQLite right now, you can't subscribe to anything, right? You've either got to poll on some interval or you've got to add some layer on top of the database to handle this for you. And once you add a layer on top of the database, you end up with all the same problems the DB was solving for you, right?

A database gives you nice transactions and concurrency control, but if you're going to add a new layer on top, well, you're going to have to think about transactions and concurrency control and implement all that there. So it would be better if it were in the DB itself. And then, as I briefly touched on, the object model is the wrong one. It'd be great if you could just code directly to relations. I think that might be a little wild an idea to some people, but if you're coding directly to relations, you need your database to be as fast as memory. When you write, it finishes as fast as setting a variable. When you read, it's as fast as reading a variable off the JavaScript heap, if you're using JS.

And SQLite, where every transaction you commit has to be durably committed before you can do another operation, I think that's a fundamental problem if you want to just code directly to relations. I don't think all commits need to be durably committed before you move on to the next commit, 'cause there are plenty of cases in an application where it's fine if you lose the last few commits. If you think of Spotify or something, not every single interaction needs to be durable immediately, and if you can get rid of that durability requirement for every single commit, you can go a lot faster.
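The durability trade-off described here maps onto knobs stock SQLite already exposes. A sketch, assuming the better-sqlite3 driver; the pragmas themselves are standard SQLite:

```typescript
import Database from "better-sqlite3";

const db = new Database("app.db");
db.exec("CREATE TABLE IF NOT EXISTS plays (track_id TEXT, at INTEGER)");

// WAL mode: commits append to a log instead of rewriting the main file.
db.pragma("journal_mode = WAL");

// NORMAL (or OFF) stops fsyncing on every commit. A power cut may lose
// the last few transactions, but with WAL the file stays consistent:
// exactly the "it's fine to lose the last few commits" trade-off above.
db.pragma("synchronous = NORMAL");

const insert = db.prepare("INSERT INTO plays (track_id, at) VALUES (?, ?)");
insert.run("track-42", Date.now()); // returns once the page cache has it
```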

Eric Anderson:
Maybe I shouldn't ask this question. What does coding directly to relations mean? Is this like a graph-like idea, to map it to terms that I've explored before?

Matt Wonlaw:
Yeah. So if you think of Spotify, you have your tracks, you have your Play button, you have your timeline showing where you are in the track. Somehow you have to get the data, essentially the play state, to those components, so that when you press Play, all the things can update and show the correct state. If you're using an object model, maybe you're passing that down to all the components. But if you're coding directly against some relational DB where everything's flat and laid out, like a global data store, every component can just issue a query for the data it needs and then subscribe to that query. And I guess coding to relations has less of the perspective problem from that chair example: if I need to find something that's firewood, well, I can just query for anything that's made of wood, and chairs will come back, and that's fine.
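The chair example, rendered as relations; a minimal sketch with an invented schema:

```typescript
import Database from "better-sqlite3";

const db = new Database(":memory:");
db.exec(`
  CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT, material TEXT);
  INSERT INTO item VALUES
    (1, 'chair', 'wood'),
    (2, 'desk',  'steel'),
    (3, 'crate', 'wood');
`);

// No Chair-is-Furniture class hierarchy baked in: each viewer's
// "perspective" is just a query over the same flat facts.
const firewood  = db.prepare(`SELECT name FROM item WHERE material = 'wood'`).all();
const barricade = db.prepare(`SELECT name FROM item WHERE material IN ('wood','steel')`).all();

console.log(firewood);  // chair and crate qualify as firewood
console.log(barricade); // anything solid enough to block a door
```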

Eric Anderson:
Maybe it's worth pointing out one of the things that's interesting about what you're doing. I think it's kind of orthogonal to the way a lot of software is developed today, which is: individuals have a specific problem at a company, they build some software to solve that specific problem, and then they explore whether it can be generalized to a lot of people. And I think you've separated yourself from a specific application and said, "Here's a general problem. Can I come up with a general solution and then explore whether it can apply to certain people's specific problems?" Is it fair to say you're taking the reverse approach to, say, Uber open source or Google open source?

Matt Wonlaw:
So I guess I'm pulling on my history. I've been doing software development for 15 years, and I've noticed these recurring patterns in all sorts of systems we've had to build. So it's pulling on this experience with problems I know I keep dealing with, and thinking through why state management keeps coming up as a problem, why model-view-controller always turns into a mess eventually. And I do have this old project I use to validate and test my ideas, Strut.io; it's this old presentation editor. So yeah, there's some grounding in reality and use case.

Eric Anderson:
Okay, so you went down the SQLite journey, built CR-SQLite, Fly got excited, but over time you came to the conclusion that maybe SQLite isn't the right abstraction. Have you come to similar conclusions for your other projects?

Matt Wonlaw:
So I guess the short answer is not really. I think the other projects still seem like they're bearing a lot of fruit. SQLite, I had reservations about at the beginning, but there's the old adage, "Don't write your own database," and all these things. Coming from Facebook, where they're very pragmatic about how they do things, that culture was instilled in me, so I thought, "Let me make the pragmatic choice here. Let me pull SQLite off the shelf and see how far I can get with it." And after a while it was like, "Yeah, I don't think it's going to get all the way to where I want it to." And I think part of that is that I've been targeting the web platform specifically, and I just didn't realize how much of a tax there is crossing from JavaScript to WASM and back; there's a huge performance hit just in that. Maybe if that performance hit wasn't there, I could still continue with SQLite, I'm not sure.

Eric Anderson:
Coming out the other door, if SQLite isn't the answer, is there something else you lean towards?

Matt Wonlaw:
What I'm leaning towards is fully investing my time in Materialite and building that up to be not just incremental view maintenance, but also a regular database under the hood. So yeah, I am leaning into, I guess, "I will build my own database." The upside of that is Materialite actually has the most interest from people financially, so it'd make sense for me to do that too.

Eric Anderson:
I've noticed a little database trend of late where folks don't seem to build databases from scratch in the form factor we're used to as much anymore; rather, they borrow this from Postgres or that from ClickHouse, and the database ends up being a federation of database components rather than a single monolith. And I'm curious, as you approach Materialite, is that going to bear out or...

Matt Wonlaw:
Yeah, I think so. That's also one of the things that finally pushed me over the edge to "maybe I can build my own." I don't know, I feel so arrogant saying this, maybe I can build my own database, but I guess that's why we do hard things, right? We do them 'cause we don't know they're hard. But there's a guy at Triplit, I think their founder, who says building a database probably isn't that hard these days, 'cause we have all these key-value stores and B-tree implementations; you can just pull stuff, as you said, off the shelf. So that's what I've been doing, and the browser has IndexedDB, which is a pretty decent abstraction for a key-value store to base the database on, so I don't have to solve writing to disk and durability and all these things. I really am just dealing with that key-value interface. Everything below that is already solved for me.

Eric Anderson:
\nIf Materialite goes the way you want, is this a database in the sense of a local database like SQLite, or is it a distributed database where I don't need Postgres anymore?

\n\n

Matt Wonlaw:
\nIt'll be a locally embedded database like SQLite, but ideally with collaborative features, right? I think SQLite was built at a time when everybody just had one computer, but now you have multiple devices: you might start work on your phone, continue it on your laptop, and then finish it later back on your phone, and you obviously want that data to sync between all of them. So yeah, I think it'd be really cool if an embedded database of the future knew how to sync or merge itself with other copies of itself, so you do-

\n\n

Eric Anderson:
\nGot it.

\n\n

Matt Wonlaw:
\n... 'cause going back to Strut.io, when I first quit, I was like, "Oh, I think I can resurrect this old project and make a bunch of money real quick with it." But that project didn't have multiplayer or syncing, so I was like, "Okay, let me just go add that real quick." And then I was like, "Oh, this is a mess, and that's why I think there's a market here." That's how I started building CR-SQLite: "I think there's a market, let me just go build the general solution." All that to say, syncing is hard, and it would be great if something at the database layer could handle it for developers, so they don't have to be experts in distributed systems.
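
One common convergence strategy for this kind of database-level merge is per-column last-write-wins, which is the same family of CRDT that CR-SQLite builds on (its real machinery is considerably more involved). A toy version, with hypothetical types:

```typescript
// Per-column last-write-wins merge between two device-local replicas.
// (version, siteId) pairs make the winner deterministic on every device,
// so replicas converge no matter the order they exchange changes in.
type Cell = { value: unknown; version: number; siteId: string };
type Row = Map<string, Cell>; // column name -> cell

function mergeRow(local: Row, remote: Row): Row {
  const out = new Map(local);
  for (const [col, theirs] of remote) {
    const ours = out.get(col);
    const theyWin =
      !ours ||
      theirs.version > ours.version ||
      (theirs.version === ours.version && theirs.siteId > ours.siteId); // tiebreak
    if (theyWin) out.set(col, theirs);
  }
  return out;
}

// Phone and laptop edit different columns of the same task; merging in
// either order yields the same row, with no central server required.
const phone: Row = new Map([["title", { value: "Ship v1", version: 2, siteId: "phone" }]]);
const laptop: Row = new Map([["done", { value: true, version: 3, siteId: "laptop" }]]);
console.log(mergeRow(phone, laptop));
```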

\n\n

Eric Anderson:
\nSo I guess this transition, if I'm imagining right what's happening, from your focus on CR-SQLite to Materialite, reinforces that Vulcan Labs is the thing that's persistent and continuous. That's where you're being consistent, and then you move the locus of attention from one project to another as needs arise and as conclusions form. Is that fair?

\n\n

Matt Wonlaw:
\nYeah, when I started, it was really just Vulcan, and it was all going to be a product around CR-SQLite. Then I realized I really enjoy solving the difficult problems, and not so much finding collaborators, finding funding, hiring people, the whole business-building side. So I was like, "Okay, I'm pivoting it to a research lab and doing contract work for people who also have these problems." One, it fits my interests, and two, it provides some financial stability.

\n\n

Eric Anderson:
\nMaybe one question, going back to object orientation as the big culprit: you mentioned relations as the model, and I think I got that with respect to data modeling and your database. Does that also mean you would avoid objects in more general programming? Are classes problematic? What's the alternative? Is it functional, or is there some other approach?

\n\n

Matt Wonlaw:
\nYeah, I don't think classes are altogether bad, right? I think you still need classes and objects and all these things for...

\n\n

Eric Anderson:
\nAbstraction and encapsulation maybe.

\n\n

Matt Wonlaw:
\nYeah. But I'm trying to think of the best way to put it. There is a separation somewhere: there's the data for the problem domain you're modeling, and I feel like that should be relational. And then there's the data for your algorithms, maybe your B-tree or your breadth-first traversal or whatever these things are. Those can be classes that hide their data and have interfaces and whatnot. But the stuff that's part of the problem domain feels like it should be stored in a relational DB and accessed directly through queries.

\n\n

I guess one other thing the object model, or program memory, is missing is good primitives for doing mutations. If you think about a database, you can start a transaction and mutate five things; if you fail in the middle, they all get rolled back. Then think about your program memory: if you start a bunch of mutations and hit an exception in the middle, you don't really have a facility to roll back. A few languages have software transactional memory, but not many. And I think that's another big problem. People always say, "Oh, global state is a horrible thing," but our databases are shared global mutable state, and we generally don't have that many problems with them. The primitives in our languages are the issue: we don't have the right mutation primitives. We can't do a transaction, we can't get isolation of transactions, we can't roll things back.

\n\n

Yeah, I should be able to say, "I want to mutate these five variables, but nobody should be able to see the effects of the mutation until I'm done. And if I fail in the middle, I should be able to revert to the prior state." But yeah, we don't have that in JavaScript or C or whatever.
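
JavaScript has no software transactional memory, but a snapshot-and-restore helper at least shows the shape of the missing primitive. A minimal sketch, assuming flat objects and single-threaded code (real STM would also isolate other readers from the intermediate states):

```typescript
// Poor man's transaction over a plain object: snapshot, mutate,
// and roll back on exception. The shallow restore is enough for
// this flat example; it is not real isolation.
function transact<T extends object>(state: T, mutate: (s: T) => void): void {
  const snapshot = structuredClone(state); // pre-image (Node 17+/modern browsers)
  try {
    mutate(state);
  } catch (err) {
    Object.assign(state, snapshot); // revert to the prior state
    throw err;
  }
}

const account = { checking: 100, savings: 50 };
try {
  transact(account, (a) => {
    a.checking -= 80;
    a.savings += 80;
    throw new Error("failure mid-mutation");
  });
} catch {
  /* handled elsewhere */
}
console.log(account); // { checking: 100, savings: 50 }: both writes rolled back
```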

\n\n

Eric Anderson:
\nInteresting. And maybe we don't have those things because the database has done that for us in the past: we've just said the database is the record of truth for this data, and we're going to outsource mutation management to the database. Now that we want to do more of that state management directly in our programs, do you envision that emerging? Is this the fifth project for Vulcan Labs?

\n\n

Matt Wonlaw:
\nSo I think in the early 2000s, you're right. With the LAMP stack, when everything was a full page refresh, all your state was in the database. That was the system of record, and you had no in-memory state to manage and roll back and do all these things with. I think things were pretty easy back then. And to answer your other question, would I ever do software transactional memory? I know Clojure has it; they've had it for a really long time. In JavaScript, the environment is so impure that it seems super hard to pull off. I don't know if you could pull it off in user space.

\n\n

Eric Anderson:
\nSo Matt, maybe we can just touch a bit more on the Vulcan Labs model, 'cause I think it's an interesting one. You went into this thinking, "I'm just going to create value, solve big problems, and if people want to fund me, great, and if not, we'll figure it out." And people did want to fund you. How does that happen? What are the mechanics? 'Cause I think that's interesting to some people. Do they just send you dollar bills in the mail?

\n\n

Matt Wonlaw:
\nYeah, I wish it was that easy. Like okay, some people send you dollar bills in the mail, but they're literally like $5 a month.

\n\n

Eric Anderson:
\nRight.

\n\n

Matt Wonlaw:
\nTo get worthwhile sponsorships, it really came down to finding out which companies were using this stuff. Usually they were talking to me in Discord 'cause they needed support, and then I'd find out who they are, okay, they're part of a company, and then just literally have that conversation: "It looks like this powers one of your products. I'm doing a lot of bug fixes and maintenance for you. I think it would be fair if you guys contributed a significant amount to this." Johannes, a guy I talked to, encouraged me to ask, "Are you getting one engineer's salary worth of value out of this? That's how much you should contribute." I never went quite that aggressive; I went somewhere between $0 and that. It's just having those conversations and being upfront that they're getting value. And what's the worst that can happen? They can say no, but they're probably still going to use your project; they're not just going to drop it.

\n\n

I did wait, right? I didn't ask for these donations immediately. I waited till the project was in use in some capacity at their company before asking, which feels a bit sleazy, like, "Oh, I waited until you deployed to production, and now I'm asking you for money." But I think everybody understands that you've got to make a living somehow.

\n\n

Eric Anderson:
\nYeah, I don't think it's terrible to deliver value and then ask for money. The reverse seems almost equally strange: "I think you should pay me, and just hope that I give you something interesting in the end."

\n\n

Matt Wonlaw:
\nThat's true.

\n\n

Eric Anderson:
\nAnd these people are accommodating. And are there strings attached to that money? Do you implicitly need to serve their needs or do they ask for deliverables?

\n\n

Matt Wonlaw:
\nYeah, it's different for different companies. Some are no strings attached: "We're using it, we know how to use it. Whatever support you can provide is fine, but we're not holding you to anything." There have been others that are like, "Oh, we want to give you equity and make sure you're aligned with our company's success, so sign this contract." Those I just have to turn down because-

\n\n

Eric Anderson:
\nOkay, yeah.

\n\n

Matt Wonlaw:
\n... I feel like if I'm going to start tying myself to one specific sponsor, then that might preclude others in the future. And now I'm suddenly worried about what I say publicly or work on in the future, right? For those, I just say, "Yeah, I can't do that. You can just write a sponsorship check or nothing at all." I don't know if you've talked to others in the past who get open source sponsorships and if you have any insights into what other people have done?

\n\n

Eric Anderson:
\nGitHub now facilitates some donations, and I think Patreon is another channel I've seen people use. In both of those, I believe there's an option to do a recurring donation, like, "I'm going to subscribe to a donation, I'm going to give you 10 bucks a month," which is an interesting model. And we've all seen the phrasing around Buy Me A Coffee, which is kind of fun; it seems like if Matt were here in person, I definitely would buy him a coffee. I imagine in most cases your sponsors are buying you much more than that. But do you incorporate as a nonprofit? Are you like the Mozilla Foundation in a small way?

\n\n

Matt Wonlaw:
\nYeah, these are very good questions. I should talk to somebody who deals with this stuff and get some advice, but right now I'm just set up as an LLC. I don't know, could I set up as a nonprofit and get better tax benefits? I know there are research tax credits and things like that.

\n\n

Eric Anderson:
\nGood. So Matt, I want to pause here 'cause we're approaching the end of our time. Anything you wanted to cover, or that would be good to cover?

\n\n

Matt Wonlaw:
\nI think the last thing that's interesting to me is you have an embedded database with some set of data on it, but realistically, people have all sorts of different devices with different storage. You have your phone and then your laptop and desktop, so the amount of data in your app on your phone is probably going to be a sliver of what might be on your desktop. So it's really interesting: can you create an embedded replicated database where the phone just has the hot set the phone is using, but the desktop has the full set? And can you do queries that span both databases, where it'll query my data locally, and then if there's more data, it'll go back to the server or wherever and pull in the rest?
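
A sketch of what such a spanning query could look like, assuming a hypothetical remote endpoint that holds the full data set while the device keeps a 30-day hot set (the URL, types, and horizon are all illustrative):

```typescript
type Message = { id: string; sentAt: number; body: string };

// The phone keeps only a recent hot set; anything older lives remotely.
const localHorizon = Date.now() - 30 * 24 * 3600 * 1000; // last 30 days on-device

async function queryMessages(
  local: Message[],
  since: number,
): Promise<Message[]> {
  const hits = local.filter((m) => m.sentAt >= since);
  if (since >= localHorizon) return hits; // fully answerable on-device

  // Hypothetical server/desktop endpoint holding the full copy.
  const res = await fetch(
    `https://example.invalid/messages?since=${since}&until=${localHorizon}`,
  );
  const cold: Message[] = await res.json();
  return [...cold, ...hits]; // the caller never sees the local/remote split
}
```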

\n\n

Eric Anderson:
\nMaybe jamming on that a little more broadly: I don't think this is anything new, but I marvel all the time, when I look through the apps on my phone, that I have gigabytes of apps but very little of my user data. People build apps and ship them to my phone, and then on my phone I generate data and ship it to their server. They store my data and I store their app, and I find that a little odd. I feel like I should keep my data and you keep your app, but somehow that's not how mobile is built today. You're describing something more nuanced, which is how this user data gets persisted: some of it's stored on my client, and on desktop we're okay with lots of it being stored, but on mobile you want a little more flexibility to store only what's critically needed.

\n\n

Matt Wonlaw:
\nYeah, so on mobile, only store what's needed, but how can we hide that complexity from the developer so they're not having to figure out what to store or not? Maybe as they query the database, it sees what you're querying and pulls in what you need locally, and anything you didn't query stays on the server or desktop, wherever your user data is. 'Cause there's this whole movement called Local-First, where the idea is users should have all their data on device, but all the solutions in the Local-First space require syncing all of your data to the app before you can start using it.

\n\n

And some people in that space say, "Oh, you're never going to generate enough user data to fill your phone. Phones are huge now." Maybe that's true, maybe I'm barking up the wrong tree, but I feel like you do need the ability to start your app with just the slice of user data you need at that moment and, as you need more, start syncing it in from, I don't know, your desktop, or maybe some service provider you let store your aggregate user data.

\n\n

Eric Anderson:
\nI feel like Local-First is often useful for single-player applications, and I just wonder, is it just as useful for these multiplayer applications, in the sense that you'd bring the whole enterprise's data onto your device?

\n\n

Matt Wonlaw:
\nYeah, so that's where these Local-First solutions have sort of been breaking down, right? They're good for single player, and they're good for multiplayer up to an extent. But once you get to, I don't know, all of Boeing or Lockheed Martin or something, that's a whole enterprise; you're not going to sync down the whole enterprise wiki and enterprise task tracker and whatever else there is. So yeah, in that case, where you have enterprise software that you want to be snappy and fast, with the data your employees are using local on their devices, you do need to figure out how to sync just a subset of the enterprise, so that what a user commonly uses day-to-day is synced to their device and local for them.

\n\n

Eric Anderson:
\nMaybe I'm asking you for a hot take here. If you're spending less time in SQLite at the moment, have you come across PGlite, this way of running Postgres on WASM on the client? Is that something we should be keeping an eye on?

\n\n

Matt Wonlaw:
\nI've seen it. It looks really cool. I was super blown away; they got the WASM size down to like 2.6 megs after compression. I'm like, "Postgres is that small?" I had no idea. I don't know if they've stripped stuff out. I'm super curious; I want to see some benchmarks. It's the ElectricSQL folks, and I think Neon is working on it, so I know them pretty well. But I have enough going on, and I don't want to distract myself with a new rabbit hole, so I'll probably just get a TL;DR and file that away for some future problem if it ever comes up.

\n\n

Eric Anderson:
\nWell, Matt, I think we're coming to a conclusion here. I'm glad that not only is your son the beneficiary of you going rogue with Vulcan Labs, but so is all the world: rather than there being more code inside Facebook, we have more code in public, and you're solving generalized problems that all programmers face. So thanks for your contributions. I'm excited to see where it takes you.

\n\n

Matt Wonlaw:
\nYeah, thanks for having me.

\n\n

Eric Anderson:
\nYou can subscribe to the podcast and check out our community Slack and newsletter at contributor.fyi. If you like the show, please leave a rating and review on Apple Podcasts, Spotify or wherever you get your podcasts. Until next time, I'm Eric Anderson and this has been Contributor.

","summary":"","date_published":"2024-04-10T02:00:00.000-07:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/a24d5e8a-4381-444e-b703-3be8d0edb0cf.mp3","mime_type":"audio/mpeg","size_in_bytes":30755153,"duration_in_seconds":1918}]},{"id":"f43fa63d-f29e-432d-99c9-147089068199","title":"Secret Sauce: Amplication with Yuval Hazaz","url":"https://www.contributor.fyi/amplication","content_text":"Amplication is an open-source development platform for scalable and secure Node.js applications. It allows engineers to skip writing boilerplate code and offers the flexibility to customize and add components. Amplification was created by Yuval Hazaz (@Yuvalhazaz1), a veteran developer who determined that low-code platforms save time but restrict freedom. Instead, Amplication uses code generation to reliably and consistently build robust production‑ready backend services.\n\nContributor is looking for a community manager! If you want to know more, shoot us an email at eric@scalevp.com.\n\nSubscribe to Contributor on Substack for email notifications!\n\nIn this episode we discuss:\n\n\n Yuval’s “secret sauce” for building an open-source community\n\n How platform engineers can use Amplication for company-wide standardization\n\n A baseline organic growth rate for open-source projects\n\n The role of generative AI in code modernization\n\n\n\nLinks:\n\n\n Amplication\n\n","content_html":"

\n\n","summary":"","date_published":"2024-03-27T02:00:00.000-07:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/f43fa63d-f29e-432d-99c9-147089068199.mp3","mime_type":"audio/mpeg","size_in_bytes":30188818,"duration_in_seconds":1882}]},{"id":"2fe87a38-ab88-481d-8314-e552297203c0","title":"To the Moon: OpenBB with Didier Lopes","url":"https://www.contributor.fyi/openbb","content_text":"OpenBB is an open-source investment research platform created by Didier Lopes (@didier_lopes). OpenBB grew out of a project called Gamestonk Terminal that Didier began working on shortly before the Gamestop short squeeze in January 2021. Today, OpenBB has evolved into an infrastructure platform that allows users to build extensions and access financial data with automation and customization.\n\nContributor is looking for a community manager! If you want to know more, shoot us an email at eric@scalevp.com.\n\nSubscribe to Contributor on Substack for email notifications!\n\nIn this episode we discuss:\n\n\n What Vice Media got wrong about OpenBB\n\n Some major contributors to the project and the features or directions that they proposed\n\n How a machine learning engineer from Bloomberg reached out about OpenBB\n\n Different types of OpenBB users – students, retail investors, and other financial professionals\n\n OpenBB’s exciting AI roadmap\n\n\n\nLinks:\n\n\n OpenBB\n\n\n\nPeople mentioned:\n\n\n James Maslek (@jmaslek11\n\n Artem Veremey (@artemvv)\n\n","content_html":"

\n\n","summary":"","date_published":"2024-03-13T02:00:00.000-07:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/2fe87a38-ab88-481d-8314-e552297203c0.mp3","mime_type":"audio/mpeg","size_in_bytes":38176018,"duration_in_seconds":2381}]},{"id":"ec9b17e2-e507-4096-94a4-5e806f6c90f2","title":"Robust Observability: OpenTelemetry with Austin Parker","url":"https://www.contributor.fyi/opentelemetry","content_text":"OpenTelemetry is an open-source observability framework for collecting and managing telemetry data. OpenTelemetry has been more successful than expected, becoming the second fastest growing project in the CNCF. It allows for flexibility and avoids vendor lock-in, making it attractive to startups and large enterprises alike. On today’s show, Eric (@ericmander) sits down with Austin Parker (@austinlparker), director of open-source at Honeycomb.\n\nContributor is looking for a community manager! If you want to know more, shoot us an email at eric@scalevp.com.\n\nSubscribe to Contributor on Substack for email notifications!\n\nIn this episode we discuss:\n\n\n How Austin’s interest in complex systems led him to the observability field and developer relations\n\n An X argument that contributed to the merger of OpenTelemetry and OpenCensus\n\n Why foundations help maintainers to strike a balance with their contributors\n\n Austin’s opinion on the secret to OpenTelemetry’s success\n\n\n\nLinks:\n\n\n OpenTelemetry\n\n Honeycomb\n\n\n\nPeople mentioned:\n\n\n Charity Majors (@mipsytipsy)\n\n Christine Yen (@cyen)\n\n\n\nAustin Parker:\nThere's only so much you can do as a maintainer. There's only so much you can do as a founding member of a project versus what all the people that are going to come in with their own ideas and their own projects and their own success criteria too. How do you protect your time as a maintainer and how do you protect the project roadmap? There's all these questions that there's no manual for.\n\nEric Anderson:\nThis is Contributor, a podcast telling the stories behind the best open source projects and the communities that make them. I'm Eric Anderson. Today, we're here with Austin Parker, who is director of open source at Honeycomb. Honeycomb's in the scale portfolio where I work, and so I get to work with Austin and his team quite a bit, but this is the first time Austin and I are talking.\n\nAustin Parker:\nYes, that's true.\n\nEric Anderson:\nThanks for coming on the show. Good to meet you.\n\nAustin Parker:\nThanks to having me. It's great to be here. I recently joined Honeycomb a couple months ago, but I've been an external admirer of the team over there for quite a while and I've known Charity and Christine and all of them, and they're all fantastic humans, which is one of the biggest motivating factors in moving to join them.\nI've been doing observability stuff for five, six years now, and when it was time for me to figure out the next step, I wanted to be around people that I felt I shared a lot of values with, but also that we're really interested in being part of this bigger open source story in the observability world. And that's something that Honeycomb was really committed to. So I was very happy to come on board and see that take into its next level and really redouble our commitment to open source and OpenTelemetry.\n\nEric Anderson:\nAnd as an investor, we are very happy to have you on board, Austin. 
OpenTelemetry, I think is underappreciated as a technology as a community, and Honeycomb I think is embracing or making OpenTelemetry a more important part of the story with time.\nAnd so you're kind of in the middle of some awesome stuff. How did you get there and how did you get into, I don't know if everyone knows the path into developer relations even? Was that where you planned on going?\n\nAustin Parker:\nIt's quite a twisted path. In a lot of ways, my journey into developer relations is my journey into software in general, because like a lot of people, back when I was 18, 19, in the early 2000s, there was definitely this like, ah, you can just go, don't need to go to college. You can just get a job in IT, and you can work with computers very... Without having to kind of have done the whole computer science thing.\nThat was true for the most part. I was able to get into it and IT was really interesting stuff to me just thinking about how these really complex systems worked and the actual computers that made them up. One of my earliest jobs when I was 15 years old was as the webmaster for my hometown newspaper, and the web server was a Mac Quadra just sitting under a desk in the second floor of the newsroom.\nWhat I would do every day after school is I would come in, I would fire up a copy of BBEdit, I would take the stories that have been put into QuarkXPress and copy the text, put it into HTML files and then just move them over to the server. Just open a shared folder and just drop them in, and there you go. The new paper was done, and it was really, really wild to think about I am doing this...\nI am giving this information to people, and how do they all get into that machine? How do they all get into that server sitting under there? And this was very early days of the web, mid-90s, and over time that really stuck with me is like, how does this stuff all work? And it drove this interest of systems thinking that inspired me to keep diving into it.\nThen later in life, I decided I wanted to go back to school, get the computer science stuff done and move into software ,because IT is great, tons of respect to people that do that as a career, but I wanted a little bit more. And so I went, did all that, got computer science, informatics, the whole nine yards.\nAnd coming into college as a 26-year-old is very different than trying to do it at 18, the way you see things and the way you deal with all those systems is very different. But because of that, I was able to get into a little software startup that at the time was, it's a company that is no longer with us called Aprenda, but they very early, very early on embraced Kubernetes, and this is back zero point whatever for Kubernetes, super early days.\nThe whole idea of cloud native was being defined around us. As I got into working with these very large complex on-demand platforms, these cloud platforms, it's like you can see those systems again and the complexity of those systems and how to actually understand them. And I got really tired, honestly, of sitting there trying to figure out what was breaking by Grepping through a bunch of logs and looking at all this disconnected telemetry data that we had.\nThat led me into a company called Lightstep. And Lightstep was a supporter of a project called OpenTracing, and one thing led to another and that's how we got to OpenTelemetry. But while I was at Lightstep, that's where I shifted from engineering into developer relations. 
And a lot of that was necessity, but a lot of it was also just interest. I've had a lot of time to work and explore options outside of tech and outside of IT.\nI've been an adjunct professor at a college before. I've done community theater, 40 billion other things. The idea of teaching people about this technology and helping them understand it and being able to be a communicator was really interesting to me. So, the idea of like, oh, there's a job where you can do that, cool, let me go do that, came very naturally.\n\nEric Anderson:\nAnd sometimes that's the role you have to adopt when you are the creator or early maintainer of an open source project. There's a certain level of evangelism that is perhaps thrust on you. The community kind of is like, how does this work? What do I do? And suddenly you're maintaining a community.\n\nAustin Parker:\nI mean, nobody thinks about the website. If you're out there building the next great open source, whatever, it's really easy to get caught up on like, oh, I just need the tech, because there's a lot of that. There's a lot you have to go through, and it's not a quick process and it's not a clean process to actually build the technology.\nBut as you get users and as this community grows around you, and if you have a good idea, you will grow a community. The difference between I think between successful projects and unsuccessful ones is how well they nurture that community. Those are the people that are going to take it and run with it.\nThere's only so much you can do as a maintainer. There's only so much you can do as a founding member of a project versus what all the people that are going to come in with their own ideas and their own projects and their own success criteria too.\nSo how do you both nurture that and how do you make sure that people that are coming in with their own ideas don't feel like they're just running into brick walls over and over trying to do something different than you want to? How do you protect your time as a maintainer and how do you protect the project roadmap?\nThere's all these questions that there's no manual for. There really isn't even a formal mentorship program for. We don't really think about, we don't teach our open source maintainers this balancing act. So either you're fortunate enough to have that sort of evangelism bone in your body or you find those people and you let them run with it or you don't.\nI do think this is one of the advantages of foundations though, like the CNCF, because they can help provide some of that community muscle and some of that organizing and they have people to do that part for you if you're not good at it.\n\nEric Anderson:\nSo in your case, the projects at hand here weren't necessarily OpenTelemetry at first, but OpenTracing, OpenCensus, although I think you were on the tracing side. The merger of these two projects is unique to me. Maybe this is common in opensource land and we don't see it, but how did that come together? And I guess we probably need to address where OpenTracing came from.\n\nAustin Parker:\nI will point out generally, actually, this doesn't happen. There's a lot. The odds were against us. So for context, OpenTracing was a CNCF project that was run by a coalition of, well, at the time was this very nascent idea of distributed tracing as a observability concept. 
People were familiar with logs, people were familiar with metrics, but distributed tracing didn't have quite the cachet that it has today.\nI think almost 10 years ago now, when this was coming up, it had been used at Google, it had been used at your Metas and your very large scale enterprises, software-based enterprises, for quite a while. It solves this really important problem of how do I understand a request path through a distributed system. And people have various forms of this using correlation, log correlation and whatnot, but distributed tracing is, \"Hey, let's apply a model to it. Let's build an ecosystem around it.\"\nSo OpenTracing was supported by Lightstep. They are now part of ServiceNow, but at the time was an independent company. Some engineers from Uber who worked on a project that still exists called Yeager, which is a trace visualizer, some Zipkin maintainers. Twitter pre acquisition was a huge user of distributed tracing. A lot of the people internally there built out a lot of the stuff that we still use today conceptually.\nSo there were all these people and they were working on defining a open standard for just a tracing API with the idea of, hey, if we have a standardized API for this, then everyone can rely on that. And then different observability vendors can implement that API for their particular tool. Concomitantly with some of this, teams at Google and Microsoft were working on something called OpenCensus, which had very similar aims but slightly different implementations details.\nSo OpenCensus included not just an API but a SDK to actually let you use all this. It had more affordances for a larger ecosystem of tools around it. Now, in the public eye, these were very similar projects. They both accomplished mostly the same thing.\nThat was kind of silly. We would have open source developers or library maintainers come up and be like, \"Well, I have users that want to add... They want tracing in my library, but I don't know what I should do. I don't know, should I use OpenTracing? Should I use OpenCensus? Because you two are not compatible with each other and there doesn't seem to be a real alignment in the community around which of these things is going to win.\"\nAnd what we saw happening was people would basically say, \"I don't want to make a decision. I'm just going to wait and one of these things is going to go away and the other is going to succeed,\" and that makes my decision for me. And there was a time, and this was around the end of 2018, October-ish, I want to say, where I got into a Twitter argument with some people about this specific topic of OpenTracing versus OpenCensus.\nSome of us saying, \"This is silly, why are we arguing about this?\" Because we all want the same thing. I took that back to the other OpenTracing maintainers and to other kind of people in the community and we reached out to the CNCF and said, \"Hey, can we get someone to... Let's figure this out. Let's get a mediator in here. Let's get a small group together to talk about this, figure out the feasibility, and if it is feasible, let's merge these things.\" Because it is wild that there's these two projects that both are basically doing the same thing and it's harming the overall observability community by there not being a single answer.\n\nEric Anderson:\nLet me get this straight, Austin, the creators probably had some vested interests in seeing their project continue and maybe be the winner. 
And so maybe both projects were inclined to not fight it out, but continue in hopes that they could end up on top.\n\nAustin Parker:\nIt's very easy to overlook structural versus individual reward semantics when we talk about open source. So at the time, the good thing is that I would say neither of these projects were actually stunningly successful. They were both standing on their own two legs.\nBut in the case of OpenTracing, there were decisions that had been made early on in the project that weren't really panning out. There were things that we knew. It's like if we could have done it all over again, what will we do differently? That's the point we were at. And I think on this OpenCensus side, they were seeing this similar, we're spending all this effort trying to go against this other thing that's mostly duplicative of what we're doing and we're wasting time on this.\nSo there were a lot of systemic barriers I would say, to the idea of like, oh, we have all this invested already, why change? But I don't want to dismiss. In this specific case, it was really gown do like, \"Hey, the right people were in the room.\"\n\nEric Anderson:\nThis idea of power brokers coming together and it's just awesome.\n\nAustin Parker:\nI wouldn't even say power brokers. I mean the way I joked about it, there's maybe 50 to a hundred people in the world that really, really care about this stuff. And it just so happens that enough of them were part of these small groups that we were able to make progress.\nBut in a lot of cases, it was like the in question, were able to see past the systemic incentives of the competition and do what was right for the community writ large, do what was right for the observability world, writ large. And I think looking back, because in 2023, we hit five years since the initial conversations, and I've spent a lot of time thinking back about it.\nIt's been more successful than I could have ever imagined initially when this originally happened. The idea that we've gone from these two fairly niche things, duking it out metaphorically with each other to becoming the second-fastest growing project in the CNCF. We're almost as big as Kubernetes when you look in terms of contributions and pull request activities and commits and all this other stuff.\nThat's massive over a very short amount of time. And I credit that to yes, those initial group of people, most of which those core contributors are still here with us and they're still involved in the project, but also to this really amazing community that we've been fortunate enough to build and how they have come along with us on this journey.\n\nEric Anderson:\nSo let's now talk about what has OpenTelemetry now become, and it plays a unique role actually in the ecosystem, an increasingly important one seems.\n\nAustin Parker:\nI would agree. The idea behind OpenTelemetry is pretty fun, is actually fairly straightforward to explain. It is the idea that if you're running any kind of software system, any kind of computer system, cloud native, whatever, you need to understand what's going on in that system in order to fix bugs, in order to understand how it's performing in production, in order to make improvements, do whatever.\nYou need data about it, you need telemetry data. And in order to really, as our systems have gotten more complex, the needs of that telemetry, that what we want that telemetry to do has also gotten more complex. 
And to really support modern and next generation cloud native workloads, we need a new way of thinking about that telemetry data. We need standards for it. We need standard ways to not only talk about it using our words, but standard ways to communicate it from cloud provider to cloud provider or from software to observability system.\nWe need common nouns and verbs. We need common metadata on our telemetry so that any given HTTP server or any given cloud platform is going to speak the same language in terms of what is a host name, what is an IP address? And that is what OpenTelemetry does. It is effectively an open standards project for creating telemetry data for cloud native systems.\nThe goal of OpenTelemetry is to make that telemetry a built-in feature of cloud native systems. So our vision of the future is that you'll install your express HTTP server or your React framework or whatever it is you're using, and you'll just get this really rich stream of telemetry data, metrics, logs, traces, whatever, that you can then transform and send to any backend, any front end that you want to visualize and understand that data to turn that telemetry into observability.\n\nEric Anderson:\nNow, I think about this as an investor in terms of business terms, for the longest time, this is a big market observability or metrics or monitoring, Splunk, Datadog, New Relic are big companies. If I understand historically, part of the technology and business strategy has been that you publish your own SDKs and other collecting code that gets embedded in people's applications.\nSo switching off of New Relic or something requires changing this code, and OpenTelemetry maybe provides a world in which you can add OpenTelemetry instrument in your application and then use any kind of backend service.\n\nAustin Parker:\nYeah, that's certainly the objective. And in a lot of cases, that's where we're at today. If you're running a Java or .Net application, there are drop-in agents and libraries that you can use to get your critical application telemetry data out of there and send it to over 40 or 50 different observability backends including Honeycomb.\nWhat I think is really cool about this and something that you're starting to see more of is this is really increasing the amount of innovation in the observability sphere because traditionally if you wanted to make an observability tool, then you had to overcome that hurdle of how do I get the data? How do I get the data?\nThat's one of the reasons that your Datadogs and Splunks and New Relics and all of these have been so effective at keeping their marketing growing is that they have all these integrations. But OpenTelemetry says, \"Well, what if that's no longer a, 'You must be this tall to ride kind of barrier anymore'?\"\nSo now we're starting to see very much shoots and leaves. We're not seeing a ton of stuff yet, but we're seeing a lot of new entrances into the market that are exploring really radically new ways of thinking about these problems. A lot of them actually end up looking like Honeycomb. The idea is that we've been building for years around how you should think about observability and what your tools should look like.\nWe're starting to see echoes of that in newer entrants, which is really interesting. 
I think it's a really great validation, honestly, about our strategy and about how we approach this problem, and I think it provides a very disruptive moment for existing players in this, because historically they've worked on this older model, where you install their agent.\nYou're locked into their ecosystem and that's how they bill you and that's how all this stuff ends up working out, what happens when everyone is on open to telemetry and you can't just have people install your agent anymore? It's an interesting question.\n\nEric Anderson:\nSo I think historically people would choose their backend service and then they would apply the necessary agents and things. I'm curious about the behavior of large enterprises, are they now just adding OpenTelemetry and then deciding later which kind of monitoring services they work with?\n\nAustin Parker:\nThat's a good question. I've had two interesting conversations about this pretty recently. One is from a very smart, very small startup, they're launching a SaaS application. And they were trying to figure out what do we do for observability?\nThe conclusion they came to was, we are not using any of these existing players because of expense and cost and we don't want to be locked into that. What they decided was if we build around OpenTelemetry, then we preserve our optionality going forward. And for now, we can use what we get for free. Not for free, but what we get with our cloud provider.\nBecause what you're seeing, our cloud providers are also standardizing on this, like Azure, Google Cloud and AM AWS are all adopting OpenTelemetry. So if you're building on those clouds with OpenTelemetry, your stuff is now compatible with their stuff. And as you grow, you have the option to be like, I need to graduate from this into something bigger, something better, and I can do that. That's what this small team, that was their conclusion right after looking at pretty exhaustively at both current large, mid and smaller entrance into the observability sphere.\nThe other conversation I had was with the head of platform engineering at a very, very large financial services company. And the way they're thinking about OpenTelemetry is really the same way that they were thinking about Kubernetes several years ago, where they know what they have today and they know this massive, massive cost of monitoring and understanding their existing systems. But they also know that they are still going through a cloud transformation. They'll be doing it for a while.\nAnd for that cloud transformation, for that new cloud platform they're building, Kubernetes is the center of it. They might not be running pure Kubernetes everywhere. They might be running some mixture of OpenShift and Elastic Kubernetes and Google Cloud and dah, dah, dah, dah. There's a lot of ways you can run Kubernetes, but the Kubernetes API, like the idea of Kubernetes, sits the center of this platform.\nIn much the same way that Kubernetes is the center of the orchestration and the way you're running applications, OpenTelemetry is the way you're understanding those applications, monitoring and monitoring those clusters, all of the different things, plug into OpenTelemetry is the center of the universe. And then you have, again, options.\nA lot of them are saying like, look, it's not, it's cost effective for us to roll our own observability stack. 
We have economies of scale that you don't, or we want to do a hybrid approach, where we have our data lake, our data warehouse over here, and then we have some stuff that goes into specific tools because it's important.\nA really interesting example of this is Slack. I think Intuit also does some stuff like this, but they build through this system where they have all of their observability data in manage data sources that they manage, but then they can tail certain parts off of it to other platforms, other observability tools based on how important it is. OpenTelemetry is the center of that too, because not just creating the data, it's collecting it, doing data pipelines, so on and so forth.\n\nEric Anderson:\nSo how does OpenTelemetry work? If we go back to the old model where my vendor provided me a bunch of agents and SDKs, is there a risk that OpenTelemetry doesn't address all the ways of collecting?\n\nAustin Parker:\nSo at a really fundamental level, OpenTelemetry provides this full tool chain for creating, exporting, collecting, and transforming telemetry data. So that includes an API that can be bundled as part of a library or a framework, an SDK that lets you create and export that telemetry, and a tool called the Collector, that's a Swiss army knife. It lets you get in telemetry from OpenTelemetry SDKs.\nIt also lets you scrape logs that are in files or listen for stats D or Prometheus metrics, and then it transforms them and sends them somewhere else. So within that, you have this huge range of options. It's designed to work with what you have today. If you're using some combination of stats C metrics and logs, cool. You can drop the Open Symmetry collector in, set up some rules and now send that out to wherever.\nIf you are using a fully proprietary stack, there are some agents in some ways to receive that data and transform it to OpenTelemetry, and it's all open source so you can go in and write your own too. We've seen a lot of that people coming in and saying like, \"Oh, well I have data in this format,\" and if there isn't already a receiver, they'll come in and contribute that back so that everyone can make use of it.\nThis is one of those things, where it's a rising tide that lifts all boats. Everyone benefits in the ecosystem from having more ways to get data in and translate it, because it makes that data more useful and it saves you as a developer, if you're a developer or you're an SRE, you don't have to think about, we can't change observability tools. Ours is too expensive. We're stuck with this forever.\nIt's like, well, OpenTelemetry makes that migration really painless because you can just use the collector, take the stuff you already have and shove it to the new place you want it to go. Also, really good for people that are, what is much more common, especially places that have been around a little longer, is you've got all these legacy services.\nSome of them are going to be emitting data in all sorts of different formats. You can go and you can write that yourself and plug it in really easily to translate that into a modern format and align it with your other new work so that you're getting the old stuff and the new stuff hand in hand and this braid of telemetry data rather than having a bunch of disconnected tools.\nIt's one of the things that you see a lot, I think, in the monitoring world now is developers that are trying to... You're on call. Something breaks. 
It's like, I got to go to this over here to see the alert, and that gives me a little bit of context, and now I got to go over here and try to find that in two or three or four or five other places. And maybe I don't even have access to the place that emitted from that the original problem started at.\nSo now, I need to page someone else or get them on Slack or Teams or whatever and have them go look in their two or three or four or five systems for the data. With OpenTelemetry, we're giving you all this as an interconnected interrelated stream, so your traces, your metrics, your logs, they're all connected to each other. They're all correlated with each other through the observability context that we provide.\nIf you put that into tools that can really make use of it, tools like Honeycomb, then it's a very transformative way of thinking about what does observability mean to me? How am I actually using this to debug incidents? How am I using this to do performance profiling?\nIt's an interesting part because it's like having the telemetry is great, but telemetry does not equal observability by itself, and just collecting a bunch of data doesn't give you any value. I think that's cool about Honeycomb, why I like being there is that I think we really do have a great way of looking at that data and helping you get value out of it.\nBut I also think that it really helps push the industry forward and helps say, \"Hey, it's not enough just to have everyone locked in and your agent or whatever to get the data. You actually have to do something with the data, and that's what's going to advance this field.\" More so than just focusing on collection or storage.\n\nEric Anderson:\nYou describe a narrative, which is interesting, that you could have this startup, for example, whose motivation is more like avoiding lock-in and maybe managing costs, adopting OpenTelemetry, instrumenting everywhere, and then if they choose something like Honeycomb, they end up in this world where actually we're doing observability better than we've ever done it before.\nI went into this journey with avoiding lock-in and lowering costs, and I ended up doing super advanced observability and getting a bunch of visibility in my systems, which is saving me time and money in different ways.\n\nAustin Parker:\nYeah, it's about prioritization. When you're at that new startup level with 10 people, you want to optimize for different things than when you get to 50 or a hundred or 200. That's the whole startup journey is just figuring out the right time to make different trade-offs, and I think OpenTelemetry gives you that flexibility regardless of your organization size, because even very large companies are going to go through those trade-offs.\nWe've seen this over the past couple years with the changes in macroeconomic conditions. A lot of people are doing a lot of belt tightening, and they're looking at every contract and they're looking at every penny of observability and monitoring spend and asking, \"What are we getting out of this?\"\nI want people to be able to not have to look at it and say, \"Well, if we lose this spend over here, then we literally don't know what's going on anymore because reliant on some proprietary agent.\" I think that's a bad corner to get backed into, and move to a point where they can say, \"Oh, well, we can be intentional about our choices in how we spend on observability and monitoring. We can be intentional about our investments in this.\n\"Can we do more custom telemetry? 
Can we do more in terms of sampling? Can we do more in terms of making really reactive responsive systems?\" But you have to have that telemetry data and it has to be open and it has to be vendor-agnostic, and that is when you get down to it. That's why OpenTelemetry is inevitable. At the heart of all of this is OpenTelemetry is solving a really big problem for a lot of people, which is why it's been as successful as it's been.\n\nEric Anderson:\nSo this successful project you mentioned, it has as many, I don't know if it was contributions or whatever metric it was relative to Kubernetes. Where do OTel people hang out? So it is part of the CNCF, which some people highly associate with Kubernetes, and so if you go to KubeCon, there's like an OTel Day.\n\nAustin Parker:\nSo we have a pretty broad community. I actually looked the other day and something like over the past five years, like three or 4,000 contribution, like different unique companies contributing, spanning everything from very small to Fortune 50. So not everyone is contributing an equal amount, obviously.\nThere's a core group of 10, 20, mostly companies that are involved in the space, your Splunks, your Honeycombs, your Lightsteps, that are doing a lot of the work, but everyone tends to come together online. We have weekly meetings for all of our SIGs. We have a lot of stuff gets done on GitHub, like most good open source projects.\nWe have Slack channels in the Cloud Native Computing Foundation Slack, and then twice a year at KubeCon in Europe and in North America, we try to have a lot of OpenTelemetry content at those. One thing we started doing at the end of last year in KubeCon in Chicago is we have this new thing called the OpenTelemetry Observatory, which is just a bigger sort of project booth. It's sponsored by Splunk, and it's a place to have happy people come together, whiteboard out stuff.\nThat's a real fun way for the community to see each other in person and get together and talk. We also have Observability Day at KubeCon. Actually, we're going to have another one here in mid-March in Paris, which is going to be actually been so successful. We've had to expand it to two tracks, so that'd be the first time that's happened.\nThere's just so many people that want to come and share their OpenTelemetry stories and their observability stories in the cloud native world that we had to get, well, it was more seats or more talks, and we went with more talks. So hopefully that'll go well, and we should also be having some specific community days this year as well. Hopefully, over the summer. The best way to find out about all this is to keep an eye on our website OpenTelemetry.io.\n\nEric Anderson:\nWell, I think you're in an envious position of you get paid to work on open source and a really cool project with an awesome team.\n\nAustin Parker:\nI'd like to think so.\n\nEric Anderson:\nThank you for joining us today.\n\nAustin Parker:\nThanks for having me. It was great to come on, a really fun conversation.\n\nEric Anderson:\nYou can subscribe to the podcast and check out our community Slack and Newsletter at Contributor.fyi. If you like the show, please leave a rating and review on Apple Podcasts, Spotify, or wherever you get your podcasts. Until next time, I'm Eric Anderson and this has been Contributor.","content_html":"

\nI've been an adjunct professor at a college before. I've done community theater, 40 billion other things. The idea of teaching people about this technology and helping them understand it and being able to be a communicator was really interesting to me. So, the idea of like, oh, there's a job where you can do that, cool, let me go do that, came very naturally.

\n\n

Eric Anderson:
\nAnd sometimes that's the role you have to adopt when you are the creator or early maintainer of an open source project. There's a certain level of evangelism that is perhaps thrust on you. The community kind of is like, how does this work? What do I do? And suddenly you're maintaining a community.

\n\n

Austin Parker:
\nI mean, nobody thinks about the website. If you're out there building the next great open source, whatever, it's really easy to get caught up on like, oh, I just need the tech, because there's a lot of that. There's a lot you have to go through, and it's not a quick process and it's not a clean process to actually build the technology.
\nBut as you get users and as this community grows around you, and if you have a good idea, you will grow a community. The difference between I think between successful projects and unsuccessful ones is how well they nurture that community. Those are the people that are going to take it and run with it.
\nThere's only so much you can do as a maintainer. There's only so much you can do as a founding member of a project versus what all the people that are going to come in with their own ideas and their own projects and their own success criteria too.
\nSo how do you both nurture that and how do you make sure that people that are coming in with their own ideas don't feel like they're just running into brick walls over and over trying to do something different than you want to? How do you protect your time as a maintainer and how do you protect the project roadmap?
\nThere's all these questions that there's no manual for. There really isn't even a formal mentorship program for. We don't really think about, we don't teach our open source maintainers this balancing act. So either you're fortunate enough to have that sort of evangelism bone in your body or you find those people and you let them run with it or you don't.
\nI do think this is one of the advantages of foundations though, like the CNCF, because they can help provide some of that community muscle and some of that organizing and they have people to do that part for you if you're not good at it.

\n\n

Eric Anderson:
\nSo in your case, the projects at hand here weren't necessarily OpenTelemetry at first, but OpenTracing, OpenCensus, although I think you were on the tracing side. The merger of these two projects is unique to me. Maybe this is common in opensource land and we don't see it, but how did that come together? And I guess we probably need to address where OpenTracing came from.

\n\n

Austin Parker:
\nI will point out generally, actually, this doesn't happen. There's a lot. The odds were against us. So for context, OpenTracing was a CNCF project that was run by a coalition of, well, at the time was this very nascent idea of distributed tracing as a observability concept. People were familiar with logs, people were familiar with metrics, but distributed tracing didn't have quite the cachet that it has today.
\nI think almost 10 years ago now, when this was coming up, it had been used at Google, it had been used at your Metas and your very large scale enterprises, software-based enterprises, for quite a while. It solves this really important problem of how do I understand a request path through a distributed system. And people have various forms of this using correlation, log correlation and whatnot, but distributed tracing is, "Hey, let's apply a model to it. Let's build an ecosystem around it."
\nSo OpenTracing was supported by Lightstep. They are now part of ServiceNow, but at the time was an independent company. Some engineers from Uber who worked on a project that still exists called Yeager, which is a trace visualizer, some Zipkin maintainers. Twitter pre acquisition was a huge user of distributed tracing. A lot of the people internally there built out a lot of the stuff that we still use today conceptually.
\nSo there were all these people and they were working on defining a open standard for just a tracing API with the idea of, hey, if we have a standardized API for this, then everyone can rely on that. And then different observability vendors can implement that API for their particular tool. Concomitantly with some of this, teams at Google and Microsoft were working on something called OpenCensus, which had very similar aims but slightly different implementations details.
\nSo OpenCensus included not just an API but a SDK to actually let you use all this. It had more affordances for a larger ecosystem of tools around it. Now, in the public eye, these were very similar projects. They both accomplished mostly the same thing.
\nThat was kind of silly. We would have open source developers or library maintainers come up and be like, "Well, I have users that want to add... They want tracing in my library, but I don't know what I should do. I don't know, should I use OpenTracing? Should I use OpenCensus? Because you two are not compatible with each other and there doesn't seem to be a real alignment in the community around which of these things is going to win."
\nAnd what we saw happening was people would basically say, "I don't want to make a decision. I'm just going to wait and one of these things is going to go away and the other is going to succeed," and that makes my decision for me. And there was a time, and this was around the end of 2018, October-ish, I want to say, where I got into a Twitter argument with some people about this specific topic of OpenTracing versus OpenCensus.
\nSome of us saying, "This is silly, why are we arguing about this?" Because we all want the same thing. I took that back to the other OpenTracing maintainers and to other kind of people in the community and we reached out to the CNCF and said, "Hey, can we get someone to... Let's figure this out. Let's get a mediator in here. Let's get a small group together to talk about this, figure out the feasibility, and if it is feasible, let's merge these things." Because it is wild that there's these two projects that both are basically doing the same thing and it's harming the overall observability community by there not being a single answer.

\n\n

Eric Anderson:
\nLet me get this straight, Austin, the creators probably had some vested interests in seeing their project continue and maybe be the winner. And so maybe both projects were inclined to not fight it out, but continue in hopes that they could end up on top.

\n\n

Austin Parker:
\nIt's very easy to overlook structural versus individual reward semantics when we talk about open source. So at the time, the good thing is that I would say neither of these projects were actually stunningly successful. They were both standing on their own two legs.
\nBut in the case of OpenTracing, there were decisions that had been made early on in the project that weren't really panning out. There were things that we knew. It's like if we could have done it all over again, what will we do differently? That's the point we were at. And I think on this OpenCensus side, they were seeing this similar, we're spending all this effort trying to go against this other thing that's mostly duplicative of what we're doing and we're wasting time on this.
\nSo there were a lot of systemic barriers I would say, to the idea of like, oh, we have all this invested already, why change? But I don't want to dismiss. In this specific case, it was really gown do like, "Hey, the right people were in the room."

\n\n

Eric Anderson:
\nThis idea of power brokers coming together and it's just awesome.

\n\n

Austin Parker:
\nI wouldn't even say power brokers. I mean the way I joked about it, there's maybe 50 to a hundred people in the world that really, really care about this stuff. And it just so happens that enough of them were part of these small groups that we were able to make progress.
\nBut in a lot of cases, it was like the in question, were able to see past the systemic incentives of the competition and do what was right for the community writ large, do what was right for the observability world, writ large. And I think looking back, because in 2023, we hit five years since the initial conversations, and I've spent a lot of time thinking back about it.
\nIt's been more successful than I could have ever imagined initially when this originally happened. The idea that we've gone from these two fairly niche things, duking it out metaphorically with each other to becoming the second-fastest growing project in the CNCF. We're almost as big as Kubernetes when you look in terms of contributions and pull request activities and commits and all this other stuff.
\nThat's massive over a very short amount of time. And I credit that to yes, those initial group of people, most of which those core contributors are still here with us and they're still involved in the project, but also to this really amazing community that we've been fortunate enough to build and how they have come along with us on this journey.

\n\n

Eric Anderson:
\nSo let's now talk about what has OpenTelemetry now become, and it plays a unique role actually in the ecosystem, an increasingly important one seems.

\n\n

Austin Parker:
\nI would agree. The idea behind OpenTelemetry is pretty fun, is actually fairly straightforward to explain. It is the idea that if you're running any kind of software system, any kind of computer system, cloud native, whatever, you need to understand what's going on in that system in order to fix bugs, in order to understand how it's performing in production, in order to make improvements, do whatever.
\nYou need data about it, you need telemetry data. And in order to really, as our systems have gotten more complex, the needs of that telemetry, that what we want that telemetry to do has also gotten more complex. And to really support modern and next generation cloud native workloads, we need a new way of thinking about that telemetry data. We need standards for it. We need standard ways to not only talk about it using our words, but standard ways to communicate it from cloud provider to cloud provider or from software to observability system.
\nWe need common nouns and verbs. We need common metadata on our telemetry so that any given HTTP server or any given cloud platform is going to speak the same language in terms of what is a host name, what is an IP address? And that is what OpenTelemetry does. It is effectively an open standards project for creating telemetry data for cloud native systems.
\nThe goal of OpenTelemetry is to make that telemetry a built-in feature of cloud native systems. So our vision of the future is that you'll install your express HTTP server or your React framework or whatever it is you're using, and you'll just get this really rich stream of telemetry data, metrics, logs, traces, whatever, that you can then transform and send to any backend, any front end that you want to visualize and understand that data to turn that telemetry into observability.

\n\n

Eric Anderson:
\nNow, I think about this as an investor in terms of business terms, for the longest time, this is a big market observability or metrics or monitoring, Splunk, Datadog, New Relic are big companies. If I understand historically, part of the technology and business strategy has been that you publish your own SDKs and other collecting code that gets embedded in people's applications.
\nSo switching off of New Relic or something requires changing this code, and OpenTelemetry maybe provides a world in which you can add OpenTelemetry instrument in your application and then use any kind of backend service.

\n\n

Austin Parker:
\nYeah, that's certainly the objective. And in a lot of cases, that's where we're at today. If you're running a Java or .Net application, there are drop-in agents and libraries that you can use to get your critical application telemetry data out of there and send it to over 40 or 50 different observability backends including Honeycomb.
\nWhat I think is really cool about this and something that you're starting to see more of is this is really increasing the amount of innovation in the observability sphere because traditionally if you wanted to make an observability tool, then you had to overcome that hurdle of how do I get the data? How do I get the data?
\nThat's one of the reasons that your Datadogs and Splunks and New Relics and all of these have been so effective at keeping their marketing growing is that they have all these integrations. But OpenTelemetry says, "Well, what if that's no longer a, 'You must be this tall to ride kind of barrier anymore'?"
\nSo now we're starting to see very much shoots and leaves. We're not seeing a ton of stuff yet, but we're seeing a lot of new entrances into the market that are exploring really radically new ways of thinking about these problems. A lot of them actually end up looking like Honeycomb. The idea is that we've been building for years around how you should think about observability and what your tools should look like.
\nWe're starting to see echoes of that in newer entrants, which is really interesting. I think it's a really great validation, honestly, about our strategy and about how we approach this problem, and I think it provides a very disruptive moment for existing players in this, because historically they've worked on this older model, where you install their agent.
\nYou're locked into their ecosystem and that's how they bill you and that's how all this stuff ends up working out, what happens when everyone is on open to telemetry and you can't just have people install your agent anymore? It's an interesting question.

\n\n

Eric Anderson:
\nSo I think historically people would choose their backend service and then they would apply the necessary agents and things. I'm curious about the behavior of large enterprises, are they now just adding OpenTelemetry and then deciding later which kind of monitoring services they work with?

\n\n

Austin Parker:
\nThat's a good question. I've had two interesting conversations about this pretty recently. One is from a very smart, very small startup, they're launching a SaaS application. And they were trying to figure out what do we do for observability?
\nThe conclusion they came to was, we are not using any of these existing players because of expense and cost and we don't want to be locked into that. What they decided was if we build around OpenTelemetry, then we preserve our optionality going forward. And for now, we can use what we get for free. Not for free, but what we get with our cloud provider.
\nBecause what you're seeing, our cloud providers are also standardizing on this, like Azure, Google Cloud and AM AWS are all adopting OpenTelemetry. So if you're building on those clouds with OpenTelemetry, your stuff is now compatible with their stuff. And as you grow, you have the option to be like, I need to graduate from this into something bigger, something better, and I can do that. That's what this small team, that was their conclusion right after looking at pretty exhaustively at both current large, mid and smaller entrance into the observability sphere.
\nThe other conversation I had was with the head of platform engineering at a very, very large financial services company. And the way they're thinking about OpenTelemetry is really the same way that they were thinking about Kubernetes several years ago, where they know what they have today and they know this massive, massive cost of monitoring and understanding their existing systems. But they also know that they are still going through a cloud transformation. They'll be doing it for a while.
\nAnd for that cloud transformation, for that new cloud platform they're building, Kubernetes is the center of it. They might not be running pure Kubernetes everywhere. They might be running some mixture of OpenShift and Elastic Kubernetes and Google Cloud and dah, dah, dah, dah. There's a lot of ways you can run Kubernetes, but the Kubernetes API, like the idea of Kubernetes, sits the center of this platform.
\nIn much the same way that Kubernetes is the center of the orchestration and the way you're running applications, OpenTelemetry is the way you're understanding those applications, monitoring and monitoring those clusters, all of the different things, plug into OpenTelemetry is the center of the universe. And then you have, again, options.
\nA lot of them are saying like, look, it's not, it's cost effective for us to roll our own observability stack. We have economies of scale that you don't, or we want to do a hybrid approach, where we have our data lake, our data warehouse over here, and then we have some stuff that goes into specific tools because it's important.
\nA really interesting example of this is Slack. I think Intuit also does some stuff like this, but they build through this system where they have all of their observability data in manage data sources that they manage, but then they can tail certain parts off of it to other platforms, other observability tools based on how important it is. OpenTelemetry is the center of that too, because not just creating the data, it's collecting it, doing data pipelines, so on and so forth.

\n\n

Eric Anderson:
\nSo how does OpenTelemetry work? If we go back to the old model where my vendor provided me a bunch of agents and SDKs, is there a risk that OpenTelemetry doesn't address all the ways of collecting?

\n\n

Austin Parker:
\nSo at a really fundamental level, OpenTelemetry provides this full tool chain for creating, exporting, collecting, and transforming telemetry data. So that includes an API that can be bundled as part of a library or a framework, an SDK that lets you create and export that telemetry, and a tool called the Collector, that's a Swiss army knife. It lets you get in telemetry from OpenTelemetry SDKs.
\nIt also lets you scrape logs that are in files or listen for stats D or Prometheus metrics, and then it transforms them and sends them somewhere else. So within that, you have this huge range of options. It's designed to work with what you have today. If you're using some combination of stats C metrics and logs, cool. You can drop the Open Symmetry collector in, set up some rules and now send that out to wherever.
\nIf you are using a fully proprietary stack, there are some agents in some ways to receive that data and transform it to OpenTelemetry, and it's all open source so you can go in and write your own too. We've seen a lot of that people coming in and saying like, "Oh, well I have data in this format," and if there isn't already a receiver, they'll come in and contribute that back so that everyone can make use of it.
\nThis is one of those things, where it's a rising tide that lifts all boats. Everyone benefits in the ecosystem from having more ways to get data in and translate it, because it makes that data more useful and it saves you as a developer, if you're a developer or you're an SRE, you don't have to think about, we can't change observability tools. Ours is too expensive. We're stuck with this forever.
\nIt's like, well, OpenTelemetry makes that migration really painless because you can just use the collector, take the stuff you already have and shove it to the new place you want it to go. Also, really good for people that are, what is much more common, especially places that have been around a little longer, is you've got all these legacy services.
\nSome of them are going to be emitting data in all sorts of different formats. You can go and you can write that yourself and plug it in really easily to translate that into a modern format and align it with your other new work so that you're getting the old stuff and the new stuff hand in hand and this braid of telemetry data rather than having a bunch of disconnected tools.
\nIt's one of the things that you see a lot, I think, in the monitoring world now is developers that are trying to... You're on call. Something breaks. It's like, I got to go to this over here to see the alert, and that gives me a little bit of context, and now I got to go over here and try to find that in two or three or four or five other places. And maybe I don't even have access to the place that emitted from that the original problem started at.
\nSo now, I need to page someone else or get them on Slack or Teams or whatever and have them go look in their two or three or four or five systems for the data. With OpenTelemetry, we're giving you all this as an interconnected interrelated stream, so your traces, your metrics, your logs, they're all connected to each other. They're all correlated with each other through the observability context that we provide.
\nIf you put that into tools that can really make use of it, tools like Honeycomb, then it's a very transformative way of thinking about what does observability mean to me? How am I actually using this to debug incidents? How am I using this to do performance profiling?
\nIt's an interesting part because it's like having the telemetry is great, but telemetry does not equal observability by itself, and just collecting a bunch of data doesn't give you any value. I think that's cool about Honeycomb, why I like being there is that I think we really do have a great way of looking at that data and helping you get value out of it.
\nBut I also think that it really helps push the industry forward and helps say, "Hey, it's not enough just to have everyone locked in and your agent or whatever to get the data. You actually have to do something with the data, and that's what's going to advance this field." More so than just focusing on collection or storage.

\n\n

Eric Anderson:
\nYou describe a narrative, which is interesting, that you could have this startup, for example, whose motivation is more like avoiding lock-in and maybe managing costs, adopting OpenTelemetry, instrumenting everywhere, and then if they choose something like Honeycomb, they end up in this world where actually we're doing observability better than we've ever done it before.
\nI went into this journey with avoiding lock-in and lowering costs, and I ended up doing super advanced observability and getting a bunch of visibility in my systems, which is saving me time and money in different ways.

\n\n

Austin Parker:
\nYeah, it's about prioritization. When you're at that new startup level with 10 people, you want to optimize for different things than when you get to 50 or a hundred or 200. That's the whole startup journey is just figuring out the right time to make different trade-offs, and I think OpenTelemetry gives you that flexibility regardless of your organization size, because even very large companies are going to go through those trade-offs.
\nWe've seen this over the past couple years with the changes in macroeconomic conditions. A lot of people are doing a lot of belt tightening, and they're looking at every contract and they're looking at every penny of observability and monitoring spend and asking, "What are we getting out of this?"
\nI want people to be able to not have to look at it and say, "Well, if we lose this spend over here, then we literally don't know what's going on anymore because reliant on some proprietary agent." I think that's a bad corner to get backed into, and move to a point where they can say, "Oh, well, we can be intentional about our choices in how we spend on observability and monitoring. We can be intentional about our investments in this.
\n"Can we do more custom telemetry? Can we do more in terms of sampling? Can we do more in terms of making really reactive responsive systems?" But you have to have that telemetry data and it has to be open and it has to be vendor-agnostic, and that is when you get down to it. That's why OpenTelemetry is inevitable. At the heart of all of this is OpenTelemetry is solving a really big problem for a lot of people, which is why it's been as successful as it's been.

\n\n

Eric Anderson:
\nSo this successful project you mentioned, it has as many, I don't know if it was contributions or whatever metric it was relative to Kubernetes. Where do OTel people hang out? So it is part of the CNCF, which some people highly associate with Kubernetes, and so if you go to KubeCon, there's like an OTel Day.

\n\n

Austin Parker:
\nSo we have a pretty broad community. I actually looked the other day and something like over the past five years, like three or 4,000 contribution, like different unique companies contributing, spanning everything from very small to Fortune 50. So not everyone is contributing an equal amount, obviously.
\nThere's a core group of 10, 20, mostly companies that are involved in the space, your Splunks, your Honeycombs, your Lightsteps, that are doing a lot of the work, but everyone tends to come together online. We have weekly meetings for all of our SIGs. We have a lot of stuff gets done on GitHub, like most good open source projects.
\nWe have Slack channels in the Cloud Native Computing Foundation Slack, and then twice a year at KubeCon in Europe and in North America, we try to have a lot of OpenTelemetry content at those. One thing we started doing at the end of last year in KubeCon in Chicago is we have this new thing called the OpenTelemetry Observatory, which is just a bigger sort of project booth. It's sponsored by Splunk, and it's a place to have happy people come together, whiteboard out stuff.
\nThat's a real fun way for the community to see each other in person and get together and talk. We also have Observability Day at KubeCon. Actually, we're going to have another one here in mid-March in Paris, which is going to be actually been so successful. We've had to expand it to two tracks, so that'd be the first time that's happened.
\nThere's just so many people that want to come and share their OpenTelemetry stories and their observability stories in the cloud native world that we had to get, well, it was more seats or more talks, and we went with more talks. So hopefully that'll go well, and we should also be having some specific community days this year as well. Hopefully, over the summer. The best way to find out about all this is to keep an eye on our website OpenTelemetry.io.

\n\n

Eric Anderson:
\nWell, I think you're in an envious position of you get paid to work on open source and a really cool project with an awesome team.

\n\n

Austin Parker:
\nI'd like to think so.

\n\n

Eric Anderson:
\nThank you for joining us today.

\n\n

Austin Parker:
\nThanks for having me. It was great to come on, a really fun conversation.

\n\n

Eric Anderson:
\nYou can subscribe to the podcast and check out our community Slack and Newsletter at Contributor.fyi. If you like the show, please leave a rating and review on Apple Podcasts, Spotify, or wherever you get your podcasts. Until next time, I'm Eric Anderson and this has been Contributor.

","summary":"","date_published":"2024-02-28T02:00:00.000-08:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/ec9b17e2-e507-4096-94a4-5e806f6c90f2.mp3","mime_type":"audio/mpeg","size_in_bytes":33684210,"duration_in_seconds":2101}]},{"id":"2c0ac6c4-2825-4646-8c83-cb84b6367c61","title":"Never Build Permissions Again: OPAL with Or Weis","url":"https://www.contributor.fyi/opal","content_text":"OPAL is an open-source administration layer for Policy Engines such as Open Policy Agent (OPA). OPAL provides the necessary infrastructure to load policy and data into multiple policy engines, ensuring they have the information they need to make decisions. Today, we’re talking to Or Weis (@OrWeis), co-creator of OPAL and co-founder of Permit, the end-to-end authorization platform that envisions a world where developers never have to build permissions again. \n\nContributor is looking for a community manager! If you want to know more, shoot us an email at eric@scalevp.com.\n\nSubscribe to Contributor on Substack for email notifications!\n\nIn this episode we discuss:\n\n\n History of Permit and OPAL\n\n The benefits of an open-foundation model rather than open-core\n\n RBAC vs ABAC vs ReBAC\n\n Why developers would prefer to not have to deal with authorization\n\n Or’s own podcast, Command+Shift+Left\n\n\n\nLinks:\n\n\n OPAL\n\n Permit\n\n Command+Shift+Left\n\n Terraform\n\n\n\nPeople mentioned:\n\n\n Asaf Cohen (@asafchn)\n\n Filip Grebowski (@developerfilip)\n\n\n\nOther episodes:\n\n\n Open Policy Agent with Torin Sandall\n\n Community Driven IaC: OpenTofu with Kuba Martin\n\n\n\nOr Weis:\nIf you build something that is valuable and you communicate that value in a clear way where people are searching for that value, it'll resonate with them and they'll interact with you.\n\nEric Anderson:\nThis is Contributor, a podcast telling the stories behind the best open source projects and the communities that make them. I'm Eric Anderson. I am excited to share that we have Or Weis on the show today who is one of the co-founders of Permit and OPAL. Permit being a company and OPAL being an open source project. Or, thanks for coming on the show.\n\nOr Weis:\nThanks for having me, Eric. I'm super excited to be here.\n\nEric Anderson:\nYou may or may not know at this point that by the time the show is released, we'll have made an investment in Permit, which I'm very excited about. Known Or for a long time, but this is the first time he's coming on Contributor. We will talk less about Permit and more about OPAL today. What is OPAL?\n\nOr Weis:\nOPAL stands for Open Policy Administration Layer and it's our open source project that has become the de facto way to manage policy engines at scale. So policy engines like Open Policy Agent, OPA or CDER from AWS, these are de facto way of managing and running policy as code, but it's not enough to have one engine. You need to run multiple ones as part of your software, and you need to scale them and load them with the right policy and data that they need in the right time. And that's actually a hard problem to solve that by yourself. So instead of solving it, you can use OPAL. And OPAL is inspired by how Netflix solved the problem. So they created a replicator pattern that replicates the policy and data into each of those policy engines. And that's what we've took as inspiration to build into OPAL itself.\n\nEric Anderson:\nSo we might be familiar with these policy engines. 
I think we've had OPA on the show a long time ago, the open policy agent folks. So you bring or OPAL brings other services that surround a policy engine like OPA. If you want a policy engine, you probably want these other services too.\n\nOr Weis:\nYeah. It's just the fundamentals. The policy engine by itself is useless unless it has the policy it needs and the data needs, the world picture of what's going on. So the list of users, the roles assigned to each of them, the quotas, geolocations, all these things you want to be using as part of your policy, you need to load them into the engine. You also need to keep them up to date as the data plan is changing. Like if you are using geolocation for example, your customers might be moving around. So you need to dynamically on-the-fly, update the data, and it can come from various data sources. And OPAL enables you to, through a lightweight Pub Sub channel, have each of your policy engines subscribe to updates for both policy and data. So for example, if you are maintaining a database or service for geolocation, you can track those changes and have them propagate into OPAL and through it into all of the policy engines that need that data.\n\nEric Anderson:\nSuper. So you're a policy delivery service for engines. I'm making up taglines that you may or may not espouse as we go here. Or, how did you get into this? What leads one to want to build such a thing? And tell us about yourself along the way.\n\nOr Weis:\nOkay, so maybe I'll take a quick step back and start at the beginning. So I'm an engineer and background. I started writing code at the age of five, but my career actually took off in the intelligence core in the IDF where I was an officer, developer, team lead, engineer, reverse engineer, yada yada, yada. Essentially a cliche of an Israeli entrepreneur. After my service, I worked in a startup called Intiguo where we built container technology before it was a thing, but with a truly horrible go-to-market, even worse than Dockers after they ruined the wrong go-to market. And then I co-founded a startup called Reactful that was acquired by Metadata Inc. I was a VP of R&D in a cybersecurity company catering to governments and like-minded agencies. Only did defensive projects and offensive ones. Super proud of it, especially in retrospect. And then between late 2016 and up until three years ago, a little over, I co-founded and ran CEO a company called Rookout, an effort dev tool company in the production debugging space that was later acquired by Dynatrace.\nAnd during my time working on Rookout, I ended up rebuilding the access control to our product five times when the company wasn't even three years old. That basically drove me nuts. Reflecting on it, I realized that I've been building this crap, pardon my French, for thousands of times throughout my career, and at no point did I want to. I got together with a good friend of mine and now my co-founder and CTO, Asaf. He at the time worked at Facebook now Meta and he worked on their internal developer tools and internal authorization and he saw that they have invested a team of 30 people for half a decade to build a level of access control that they have and they're still building on. 
So we quickly realized this is a huge problem now and it's only going to get worse as technology continues to scale out, become more distributed, more complex, and also have more smart components as part of it.\nSo we realized we want to solve this problem once and for all, and we want that you and other developers will never have to build permissions again. And that's why we decided to create the company that is now known as Permit. And the first thing that we wanted to do was to adopt the best practices. So we started with adopting OPA open policy agent and we wanted to have it run at scale for our customers. And that's where we ran into the first problem. How do we manage this thing at scale? It doesn't provide anything for it. You just get one engine and if you run it by itself, it's okay, but if you have hundreds, thousands, tens of thousands of instances of it really becomes quite a labor, some task to manage it on your own. And so we looked around and as I mentioned, we saw how Netflix used OPA.\nThey have a great talk as part of the CNCF on YouTube where they describe how they created an engine that replicates the needed data and policy into each of the OPAL instances. So we decided, okay, this is a good approach to go about this, but Netflix didn't open source their project, so we took it upon ourselves to open source it, then we called it OPAL. So it's basically necessity all the way down. Necessity from building companies, necessity into needing to build access control and necessity into, okay, we want to solve access control at scale and we want to adopt the best practices, but we also want to work in a way that is applicable for everyone.\n\nEric Anderson:\nAnd you mentioned earlier that part of that was adopting. On your journey, you interacted with OPA and OPAL sounds like OPA and so I see some resemblance there.\n\nOr Weis:\nThat was intentional though it's important to note that the A in OPA and A in OPAL are not the same one. One is agent, the ones in administration. So today OPA supports multiple policy engines, but at a time it was focused almost exclusively on OPA. And we knew from the start that we'll support multiple engines and that's why we built it to be extensible and dynamic enough. But starting with OPA, we felt like it's a win because it both resonated with the current community. It sounded nice, it described what it is and it sounded like a cool gem that is fun to have.\n\nEric Anderson:\nYes, it's very desirable gem. And what's maybe impressive though that we haven't acknowledged is that OPAL has done quite well by some standard. Big community, large companies are depending on OPAL. How did that come about or what was it like seeing that happen?\n\nOr Weis:\nSo yeah, OPAL is now the de facto way to run policy engines at scale. Tens of thousands of companies with names that are front to name-drop like Tesla, Zapier, Accenture, Microsoft, Cisco, Walmart, the NBA, a bunch of banks, a bunch of healthcare institutions, over 10 million docker pools and growing thousands of stars on GitHub and the thousands of engineers in our community, continuing to grow strong. We were very proud of OPAL and the journey that it took us on. We didn't really know or expect that it will grow to be this popular. We definitely hope, but we didn't know. We just felt it was the right way to have this component be open source.\nWe felt that it's a critical element that is missing as part of the ecosystem. 
We felt like having policy engines is important, but also being able to manage them is important and that needs to be open source. And we also felt that it is complimentary and not cannibalizing into what permit needs to be. I'm a huge fan of open source in general, but I'm not such a big fan of open core. So being able to go with a different angle here, which I like to call open foundation, where the commercial offering and the open source offering are not providing the same value proposition, but instead are foundations for one another.\n\nEric Anderson:\nSo permit and OPAL complement each other. They're different offerings, but the people who want OPAL probably also want Permit. And so building the open source project lends itself to helping you build a community for prospective customers.\n\nOr Weis:\nExactly. Though, depending on where you're on the stack, there's a fork in the road. So first of all, with authorization, you need it across the stack. You need physical layer authorization like locks on your doors and on your windows. And then you need network level authorization. So firewalls, zero trust networks, VPNs, then you move into infrastructure level authorization. So service to service access control and admission control and Kubernetes. And this is where OPA originally grew to fame as part of the CNCF and mainly gatekeeper and admission control to Kubernetes. And then you graduate in application level access control, which is basically everything else which I like to summarize is which users can interact with which other users through which features.\nAnd so if you're doing application level access control, you'll discover a lot of our customers find us this way. So they discover policy as code, they discover either Cedar or OPA, they discover OPAL and then they discover Permit and they often say, \"Oh, this actually reps everything I want in a nice bow. I can go into that.\" But we also have a lot of people in the community doing infrastructure level access control for example. And for them, Permit is while somewhat irrelevant solution, it's less of a relevant solution. So they might just end up working with OPA and OPAL vanilla without the additional sprinkles from Permit.\n\nEric Anderson:\nAnd maybe going back to this idea that you managed to get a bunch of awesome companies using OPAL, is there a trick to that adoption? A lot of us want to believe you just put it on GitHub and the people come.\n\nOr Weis:\nNo, it's not that simple, but there's a truth in the essence of it. If you build something that is valuable and you communicate that value in a clear way where people are searching for that value, it'll resonate with them and they'll interact with you. But it's not enough just to build it. You need to build it with the right supporting materials. It requires documentation, requires messaging, it requires marketing, it requires getting up on stage yourself and spreading the word. It requires interacting with the right people and it requires being embracing and open.\nSo when these people come in and they want to learn about the product, you need to have the patience to sit down with them and explain to them how this open source component works. And while Permit is very mature and has all the bells and whistles, OPAL being an open source project isn't as polished naturally. 
So having the community, having the passion, attention and willingness to help people take this a step forward and starting that feedback loop that will get more people to then help each other is a step that you have to take. It doesn't just materialize day one, you first of all need to be the community before the community emerges as a whole.\n\nEric Anderson:\nYou build something that has a lot of value, but then you have to communicate this value that seemed to be an important point that you were outlining. And you described various ways that you communicate value in a slack conversation or on stage or through content. Are there taglines? Are there messages that really resonate with people about authorization?\n\nOr Weis:\nI think the best one that we have people jive with is never build permissions again. In the end of the day, I think it's important to know that I think most of the players in the space don't necessarily get it end to end. Authorization is not something that developers want. It's not something sexy. It's not, \"Oh, I'm excited to build this authorization layer.\" No, it's a nuisance. It's something that is keeping you from building the core features of your product.\nAnd while yes, developers like building technology and using advanced tools and extrapolating on that and building amazing things, they want to build the amazing things that they care about. And this is not something that is unique to any company or any product. So I think we've never built permissions again, we say something very simple that speaks to the heart like, \"Yeah, we know you don't want to do this. And we ourselves as engineers, we don't want to do this either, but we want to do this once right for everyone so you can focus on the things you actually care about.\"\n\nEric Anderson:\nOr, take me into the product or the architecture some. So most people I think are familiar with very basic permissions. I have a user's table, maybe I already use some third party service in order to authenticate my users and maybe I have one or two roles, they're a user or maybe they can have an admin and perhaps that's just another, in my user's table, that's just another attribute. Are they an admin or not? And I feel like this is the very basic permissions that people start with. How do you build it in a way that you never build it again and how do you build it differently?\n\nOr Weis:\nSo it's about adopting right best practices. And what we try to do with permit is bake those best practices in. So you won't have to think about it, but still let's unpack them. Think one of the most important best practices is decoupling policy and code. The most common mistake people do is assume that the data models and code models that they have for their core application will be the same for the authorization layer, but they aren't, these are two very different tasks that will require different code flows and different data. And if you just merge them together, it's just part of the application code, you're going to have a bad time because as the requirements for both change, you'll have to change both constantly. So you're constantly generating more work and more friction for yourself, and every little change becomes instead of a quick update, becomes a multiple months project.\nSo I think decoupling policy and coder is the most important thing. Once you've decoupled your policy from your code, the question is now how do you manage it? 
What is the best way to manage policy and have it run in a good way, a performant way as part of your application? The best answer humanity is come with is policy as code. So in general, through infrastructure as code and other patterns like that, we recognize that complex things that we want to communicate and work on as a team. The best way to represent it and maintain it and work on it is as code. So we want to have a language or format that is dedicated to run as a policy. We want to manage it for the best practices of code, meaning you are able to run tests on it, you are able to do benchmarks, you're able to replicate scenarios and you're able to do code reviews, et cetera, et cetera.\nSo now we decoupled our policy from our code. We have code that represents our policy. We maintain Git we have versions and tests. Now we want to run it. So ideally in modern infrastructure, we want to run it as a microservice. So you want to have a microservice for authorization that lives alongside your software and is performant and has low latency. So you don't want to have it remote from your software because for every little query you have to go to that remote site and that will add a lot of latency. That's just the rules of physics. So you want to be able to run that microservice next to your software. That's where, by the way, where OPAL comes in. So now we have this microservice, and so how do I make sure that this microservice remains up to date? Because it needs to be independent in the field. It needs to be able to answer those queries as they come in and can't be part of a huge separate database.\nSo how do I load into it at the edge? How do I load the policy and data? And so that's where OPAL comes in with the second-best practice. Be event driven, which is a common best practice for cloud native applications and microservice applications specifically. So now we're event driven. We have decoupled policy from code and we have a dedicated microservice for authorization. Lastly, we need the right interfaces. We need to recognize that this is not just for developers. If you're building policies, if you're maintaining a product, at the end of the day, access control is really about connecting people and systems to what you've built. And so our organization building a product or a system, everyone's involved. So it's the developers, obviously it's DevOps obviously, but it's also the product managers, security, compliance, sales, professional services, support, everyone.\nSo if for every little change you want to make, you have to go back to the developers that will write code and go through a software development lifecycle. Everyone's having a bad time, especially the developers which you force into becoming a bottleneck for this annoying thing that they don't want to do. So it's about creating the right interfaces. We need to enable everyone around the table to be able to participate in the policy authoring and maintaining process. Each of them needs the right interface for that.\nSo if you're building this on your own, you can think, \"Okay, what do my product managers will need here? Maybe they just need a way to assign roles. Maybe they just need a way to set attributes. Maybe they need a way to maybe affect the policy a little more.\" What we've done with Permit is we provide you with a policy editor that I like to say a monkey can use or even a product manager if they're smart enough. By the way, ironically product managers are the ones that love that joke the most. 
","content_html":"

OPAL is an open-source administration layer for Policy Engines such as Open Policy Agent (OPA). OPAL provides the necessary infrastructure to load policy and data into multiple policy engines, ensuring they have the information they need to make decisions. Today, we’re talking to Or Weis (@OrWeis), co-creator of OPAL and co-founder of Permit, the end-to-end authorization platform that envisions a world where developers never have to build permissions again. 

\n\n

Contributor is looking for a community manager! If you want to know more, shoot us an email at eric@scalevp.com.

\n\n

Subscribe to Contributor on Substack for email notifications!

\n\n

In this episode we discuss:

\n\n\n\n

Links:

\n\n\n\n

People mentioned:

\n\n\n\n

Other episodes:

\n\n\n\n

Or Weis:
\nIf you build something that is valuable and you communicate that value in a clear way where people are searching for that value, it'll resonate with them and they'll interact with you.

\n\n

Eric Anderson:
\nThis is Contributor, a podcast telling the stories behind the best open source projects and the communities that make them. I'm Eric Anderson. I am excited to share that we have Or Weis on the show today who is one of the co-founders of Permit and OPAL. Permit being a company and OPAL being an open source project. Or, thanks for coming on the show.

\n\n

Or Weis:
\nThanks for having me, Eric. I'm super excited to be here.

\n\n

Eric Anderson:
\nYou may or may not know at this point that by the time the show is released, we'll have made an investment in Permit, which I'm very excited about. I've known Or for a long time, but this is the first time he's coming on Contributor. We will talk less about Permit and more about OPAL today. What is OPAL?

\n\n

Or Weis:
\nOPAL stands for Open Policy Administration Layer, and it's our open source project that has become the de facto way to manage policy engines at scale. Policy engines like Open Policy Agent, OPA, or Cedar from AWS are the de facto way of managing and running policy as code, but it's not enough to have one engine. You need to run multiple ones as part of your software, and you need to scale them and load them with the right policy and data they need at the right time. And that's actually a hard problem to solve by yourself. So instead of solving it, you can use OPAL. OPAL is inspired by how Netflix solved the problem. They created a replicator pattern that replicates the policy and data into each of those policy engines, and that's what we took as inspiration when building OPAL itself.

\n\n

Eric Anderson:
\nSo we might be familiar with these policy engines. I think we had OPA on the show a long time ago, the Open Policy Agent folks. So you bring, or OPAL brings, other services that surround a policy engine like OPA. If you want a policy engine, you probably want these other services too.

\n\n

Or Weis:
\nYeah. It's just the fundamentals. The policy engine by itself is useless unless it has the policy it needs and the data it needs, the world picture of what's going on. So the list of users, the roles assigned to each of them, the quotas, geolocations, all these things you want to be using as part of your policy, you need to load them into the engine. You also need to keep them up to date as the data plane is changing. If you are using geolocation, for example, your customers might be moving around, so you need to dynamically, on the fly, update the data, and it can come from various data sources. And OPAL enables you, through a lightweight pub/sub channel, to have each of your policy engines subscribe to updates for both policy and data. So for example, if you are maintaining a database or service for geolocation, you can track those changes and have them propagate into OPAL, and through it, into all of the policy engines that need that data.

\n\n
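To make that update flow concrete, here is a minimal sketch of triggering a data update through the OPAL server, which then fans it out to subscribed policy engines. It follows the general shape of OPAL's data-update API, but treat the port, endpoint, topic name, and source URL as assumptions to verify against your OPAL version.

```python
# Hedged sketch: push a data update into OPAL so that every policy engine
# subscribed to the topic refreshes its view of the world.
import requests

OPAL_SERVER = "http://localhost:7002"  # assumed default OPAL server address

update = {
    "entries": [
        {
            # where OPAL clients should fetch the fresh data from (hypothetical URL)
            "url": "https://example.com/api/geolocations",
            # only engines subscribed to this topic receive the update
            "topics": ["geolocation_data"],
            # where to write the data inside the engine's data document
            "dst_path": "/geolocations",
        }
    ],
    "reason": "customer locations changed",
}

resp = requests.post(f"{OPAL_SERVER}/data/config", json=update, timeout=5)
resp.raise_for_status()
print("update queued:", resp.status_code)
```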

Eric Anderson:
\nSuper. So you're a policy delivery service for engines. I'm making up taglines that you may or may not espouse as we go here. Or, how did you get into this? What leads one to want to build such a thing? And tell us about yourself along the way.

\n\n

Or Weis:
\nOkay, so maybe I'll take a quick step back and start at the beginning. So I'm an engineer by background. I started writing code at the age of five, but my career actually took off in the intelligence corps of the IDF, where I was an officer, developer, team lead, engineer, reverse engineer, yada yada yada. Essentially a cliche of an Israeli entrepreneur. After my service, I worked in a startup called Intiguo where we built container technology before it was a thing, but with a truly horrible go-to-market, even worse than Docker's. And then I co-founded a startup called Reactful that was acquired by Metadata Inc. I was a VP of R&D in a cybersecurity company catering to governments and like-minded agencies. Only did defensive projects, no offensive ones. Super proud of it, especially in retrospect. And then from late 2016 until a little over three years ago, I co-founded and ran as CEO a company called Rookout, a dev tool company in the production debugging space that was later acquired by Dynatrace.
\nAnd during my time working on Rookout, I ended up rebuilding the access control for our product five times when the company wasn't even three years old. That basically drove me nuts. Reflecting on it, I realized that I'd been building this crap, pardon my French, thousands of times throughout my career, and at no point did I want to. I got together with a good friend of mine, now my co-founder and CTO, Asaf. At the time he worked at Facebook, now Meta, on their internal developer tools and internal authorization, and he saw that they had invested a team of 30 people for half a decade to build the level of access control that they have, and they're still building on it. So we quickly realized this is a huge problem now, and it's only going to get worse as technology continues to scale out, become more distributed, more complex, and also have more smart components as part of it.
\nSo we realized we want to solve this problem once and for all, and we want you and other developers to never have to build permissions again. And that's why we decided to create the company that is now known as Permit. And the first thing that we wanted to do was to adopt the best practices. So we started with adopting OPA, Open Policy Agent, and we wanted to have it run at scale for our customers. And that's where we ran into the first problem. How do we manage this thing at scale? It doesn't provide anything for it. You just get one engine, and if you run it by itself, it's okay, but if you have hundreds, thousands, tens of thousands of instances, it really becomes quite a laborious task to manage on your own. And so we looked around and, as I mentioned, we saw how Netflix used OPA.
\nThey have a great talk as part of the CNCF on YouTube where they describe how they created an engine that replicates the needed data and policy into each of the OPA instances. So we decided, okay, this is a good approach to go about this, but Netflix didn't open source their project, so we took it upon ourselves to build and open source one, and we called it OPAL. So it's basically necessity all the way down. Necessity from building companies, necessity into needing to build access control, and necessity into, okay, we want to solve access control at scale and we want to adopt the best practices, but we also want to work in a way that is applicable for everyone.

\n\n

Eric Anderson:
\nAnd you mentioned earlier that part of that was adopting best practices. On your journey, you interacted with OPA, and OPAL sounds like OPA, so I see some resemblance there.

\n\n

Or Weis:
\nThat was intentional, though it's important to note that the A in OPA and the A in OPAL are not the same one. One is agent, the other is administration. So today OPAL supports multiple policy engines, but at the time it was focused almost exclusively on OPA. And we knew from the start that we'd support multiple engines, and that's why we built it to be extensible and dynamic enough. But starting with OPA, we felt the name was a win because it resonated with the current community, it sounded nice, it described what it is, and it sounded like a cool gem that is fun to have.

\n\n

Eric Anderson:
\nYes, it's a very desirable gem. And what's maybe impressive, though, and something we haven't acknowledged, is that OPAL has done quite well by any standard. Big community, large companies depending on OPAL. How did that come about, and what was it like seeing that happen?

\n\n

Or Weis:
\nSo yeah, OPAL is now the de facto way to run policy engines at scale. Tens of thousands of companies, with names that are fun to name-drop like Tesla, Zapier, Accenture, Microsoft, Cisco, Walmart, the NBA, a bunch of banks, a bunch of healthcare institutions. Over 10 million Docker pulls and growing, thousands of stars on GitHub, and thousands of engineers in our community, continuing to grow strong. We're very proud of OPAL and the journey that it took us on. We didn't really know or expect that it would grow to be this popular. We definitely hoped, but we didn't know. We just felt it was the right way to have this component be open source.
\nWe felt that it's a critical element that was missing from the ecosystem. We felt like having policy engines is important, but being able to manage them is important too, and that needs to be open source. And we also felt that it is complementary to, and not cannibalizing, what Permit needs to be. I'm a huge fan of open source in general, but I'm not such a big fan of open core. So we were able to go with a different angle here, which I like to call open foundation, where the commercial offering and the open source offering are not providing the same value proposition, but instead are foundations for one another.

\n\n

Eric Anderson:
\nSo Permit and OPAL complement each other. They're different offerings, but the people who want OPAL probably also want Permit. And so building the open source project lends itself to helping you build a community of prospective customers.

\n\n

Or Weis:
\nExactly. Though, depending on where you are in the stack, there's a fork in the road. So first of all, with authorization, you need it across the stack. You need physical-layer authorization, like locks on your doors and on your windows. And then you need network-level authorization: firewalls, zero-trust networks, VPNs. Then you move into infrastructure-level authorization, so service-to-service access control and admission control in Kubernetes. And this is where OPA originally grew to fame as part of the CNCF, mainly with Gatekeeper and admission control for Kubernetes. And then you graduate into application-level access control, which is basically everything else, and which I like to summarize as: which users can interact with which other users through which features.
\nAnd so if you're doing application-level access control, and a lot of our customers find us this way, they discover policy as code, they discover either Cedar or OPA, they discover OPAL, and then they discover Permit, and they often say, "Oh, this actually wraps everything I want in a nice bow. I can go with that." But we also have a lot of people in the community doing infrastructure-level access control, for example. And for them, Permit is a less relevant solution. So they might just end up working with OPA and OPAL vanilla, without the additional sprinkles from Permit.

\n\n

Eric Anderson:
\nAnd maybe going back to this idea that you managed to get a bunch of awesome companies using OPAL, is there a trick to that adoption? A lot of us want to believe you just put it on GitHub and the people come.

\n\n

Or Weis:
\nNo, it's not that simple, but there's truth in the essence of it. If you build something that is valuable and you communicate that value in a clear way where people are searching for that value, it'll resonate with them and they'll interact with you. But it's not enough just to build it. You need to build it with the right supporting materials. It requires documentation, it requires messaging, it requires marketing, it requires getting up on stage yourself and spreading the word. It requires interacting with the right people, and it requires being welcoming and open.
\nSo when these people come in and they want to learn about the product, you need to have the patience to sit down with them and explain to them how this open source component works. And while Permit is very mature and has all the bells and whistles, OPAL, being an open source project, naturally isn't as polished. So having the community, having the passion, attention and willingness to help people take this a step forward, and starting that feedback loop that will get more people to then help each other, is a step that you have to take. It doesn't just materialize on day one. You first of all need to be the community before the community emerges as a whole.

\n\n

Eric Anderson:
\nYou build something that has a lot of value, but then you have to communicate this value. That seemed to be an important point you were outlining. And you described various ways that you communicate value: in a Slack conversation, or on stage, or through content. Are there taglines? Are there messages that really resonate with people about authorization?

\n\n

Or Weis:
\nI think the best one, the one people jive with, is "never build permissions again." At the end of the day, I think it's important to note that most of the players in the space don't necessarily get it end to end. Authorization is not something that developers want. It's not something sexy. It's not, "Oh, I'm excited to build this authorization layer." No, it's a nuisance. It's something that is keeping you from building the core features of your product.
\nAnd while, yes, developers like building technology and using advanced tools and extrapolating on that and building amazing things, they want to build the amazing things that they care about. And this is not something that is unique to any company or any product. So I think with "never build permissions again" we say something very simple that speaks to the heart: "Yeah, we know you don't want to do this. And we ourselves as engineers, we don't want to do this either, but we want to do this right once, for everyone, so you can focus on the things you actually care about."

\n\n

Eric Anderson:
\nOr, take me into the product or the architecture some. So most people, I think, are familiar with very basic permissions. I have a users table, maybe I already use some third-party service to authenticate my users, and maybe I have one or two roles: they're a user, or maybe they can be an admin, and perhaps that's just another attribute in my users table. Are they an admin or not? And I feel like this is the very basic permissions setup that people start with. How do you build it in a way that you never build it again, and how do you build it differently?

\n\n

Or Weis:
\nSo it's about adopting the right best practices. And what we try to do with Permit is bake those best practices in so you won't have to think about them. But still, let's unpack them. I think one of the most important best practices is decoupling policy and code. The most common mistake people make is to assume that the data models and code models they have for their core application will be the same for the authorization layer. But they aren't; these are two very different tasks that require different code flows and different data. And if you just merge them together, if policy is just part of the application code, you're going to have a bad time, because as the requirements for both change, you'll have to change both constantly. So you're constantly generating more work and more friction for yourself, and every little change, instead of being a quick update, becomes a multi-month project.
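To make the decoupling concrete, here is a minimal sketch, an editor's illustration rather than Permit's SDK, contrasting policy baked into application code with a single decision point the application delegates to:

```python
# Coupled: the policy ("admins or owners may read") lives inside app code,
# so every policy change is an application change.
def show_report_coupled(user: dict, report: dict) -> dict:
    if user.get("role") == "admin" or report.get("owner") == user.get("id"):
        return report
    raise PermissionError("not allowed")

# Decoupled: one narrow question the app asks everywhere; how the answer
# is produced can change without touching application code.
def is_allowed(user: dict, action: str, resource: dict) -> bool:
    # Stub so the sketch runs standalone; in a real system this call goes
    # out to a policy engine such as OPA (see the sidecar sketch below).
    if user.get("role") == "admin":
        return True
    return action == "read" and resource.get("owner") == user.get("id")

def show_report(user: dict, report: dict) -> dict:
    if not is_allowed(user, "read", report):
        raise PermissionError("not allowed")
    return report

print(show_report({"id": "u1", "role": "viewer"}, {"owner": "u1", "body": "q3 numbers"}))
```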
\nSo I think decoupling policy and code is the most important thing. Once you've decoupled your policy from your code, the question is now how do you manage it? What is the best way to manage policy and have it run in a good way, a performant way, as part of your application? The best answer humanity has come up with is policy as code. So in general, through infrastructure as code and other patterns like that, we've recognized that for complex things we want to communicate and work on as a team, the best way to represent them, maintain them, and work on them is as code. So we want to have a language or format that is dedicated to running as policy. We want to manage it with the best practices of code, meaning you are able to run tests on it, you are able to do benchmarks, you're able to replicate scenarios, and you're able to do code reviews, et cetera, et cetera.
\nSo now we've decoupled our policy from our code. We have code that represents our policy. We maintain it in Git, we have versions and tests. Now we want to run it. So ideally, in modern infrastructure, we want to run it as a microservice. You want to have a microservice for authorization that lives alongside your software and is performant and has low latency. You don't want it to be remote from your software, because for every little query you'd have to go to that remote site, and that will add a lot of latency. That's just the rules of physics. So you want to be able to run that microservice next to your software. That's, by the way, where OPAL comes in. So now we have this microservice, and how do I make sure that it remains up to date? Because it needs to be independent in the field. It needs to be able to answer those queries as they come in, and it can't be part of a huge separate database.
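A hedged sketch of that sidecar pattern: an OPA instance running next to the application on localhost, answering decisions over its REST Data API. Port 8181 and the /v1/data path are OPA's documented defaults; the app.authz package name and the policy shown in the comment are assumptions made for this sketch.

```python
# Query a local OPA sidecar for an authorization decision.
#
# Rego policy assumed to be loaded into OPA (shown here as a comment to
# keep the sketch in one language):
#   package app.authz
#   default allow = false
#   allow { input.user.role == "admin" }
import requests

OPA_URL = "http://localhost:8181/v1/data/app/authz/allow"

def is_allowed(user: dict, action: str, resource: dict) -> bool:
    payload = {"input": {"user": user, "action": action, "resource": resource}}
    resp = requests.post(OPA_URL, json=payload, timeout=1)  # local hop: low latency
    resp.raise_for_status()
    # OPA answers {"result": true|false}; an undefined result means deny.
    return resp.json().get("result", False)

print(is_allowed({"id": "u1", "role": "admin"}, "read", {"type": "report"}))
```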
\nSo how do I load the policy and data into it at the edge? And that's where OPAL comes in, with the second best practice: be event-driven, which is a common best practice for cloud-native applications and microservice applications specifically. So now we're event-driven, we have decoupled policy from code, and we have a dedicated microservice for authorization. Lastly, we need the right interfaces. We need to recognize that this is not just for developers. If you're building policies, if you're maintaining a product, at the end of the day access control is really about connecting people and systems to what you've built. And in an organization building a product or a system, everyone's involved. It's the developers, obviously, it's DevOps, obviously, but it's also the product managers, security, compliance, sales, professional services, support, everyone.
\nSo if for every little change you want to make, you have to go back to the developers who will write code and go through a software development lifecycle, everyone's having a bad time, especially the developers, whom you force into becoming a bottleneck for this annoying thing that they don't want to do. So it's about creating the right interfaces. We need to enable everyone around the table to participate in the policy authoring and maintenance process, and each of them needs the right interface for that.
\nSo if you're building this on your own, you can think, "Okay, what will my product managers need here? Maybe they just need a way to assign roles. Maybe they just need a way to set attributes. Maybe they need a way to affect the policy a little more." What we've done with Permit is provide you with a policy editor that, I like to say, a monkey can use, or even a product manager if they're smart enough. By the way, ironically, product managers are the ones that love that joke the most. And what it does is generate policy as code for you. So a product manager can work with it and generate the policy, and sales and professional services can come in and generate the policy, and it'll all get distilled into that Git repository that we described before. And the developers can still load code directly, so everyone gets the interface that is suitable for them.

\n\n

Eric Anderson:
\nIt sounds like, and I'll try to use my analogy from earlier: in my application I might be tempted to say something like, "If the user is an admin, or if the user is not an admin, display this list with this information." Instead I just say, "Display this list for all those who have permission," and I outsource to the permission system to tell me what information to display. And then the policy can be applied independent of the code; the code isn't encoding the logic of who gets to see what, the permissioning system is.

\n\n

Or Weis:
\nYeah, thank you. You also want to do that filtering across different services, and you want to have a unified policy that affects all of them. So once you have those policy engines and you can load each of them with the right policy and data, you can keep everything in sync without having to constantly run around managing it. And it's important to note that with policy engines, it's not just about binary decisions. It's not just show this or don't show this. It's also: give me a filtered subset of the data, or even do something called partial evaluation. With partial evaluation, you ask the policy engine to spit out an abstract syntax tree, basically a set of conditions that you can then compile into SQL or another query that you can pass to your underlying database. So you can filter at the database level itself, but it'll still be driven by the primary policy that you've defined together as a team of stakeholders.

\n\n
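A hedged sketch of that partial-evaluation flow, using OPA's Compile API (a real OPA endpoint, though the package path and the choice of unknowns here are assumptions). The engine returns the residual conditions that still depend on the unknown rows; the final translation into SQL is left as a comment, since real integrations use purpose-built translators:

```python
# Ask OPA which conditions remain when the rows themselves are unknown.
import requests

OPA_COMPILE_URL = "http://localhost:8181/v1/compile"

request_body = {
    "query": "data.app.authz.allow == true",
    "input": {"user": {"id": "u1", "role": "viewer"}},
    # declare what the engine should treat as unknown at evaluation time
    "unknowns": ["input.resource"],
}

resp = requests.post(OPA_COMPILE_URL, json=request_body, timeout=5)
resp.raise_for_status()
residual = resp.json()["result"]
# `residual` holds condition sets such as: input.resource.owner == "u1".
# A translator would then emit, e.g.: SELECT * FROM reports WHERE owner = 'u1'
print(residual)
```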

Eric Anderson:
\nMaybe moving away from the project for a bit. Or, you mentioned earlier you have opinions around open core versus what you called open foundation. Could you tell us more about that, or any other thoughts about what developers want to experience from an open source project or a software vendor selling to them?

\n\n

Or Weis:
\nYeah. First of all, I think it's important to note at this point that our entire world is open source. A cute anecdote: as part of the investment round we just did, we were asked to list all of our open source components. Just the direct dependencies, that's tens of thousands. And if you extend it to transitive dependencies, that's, I think, almost in the hundreds of thousands. Unless you dive into it, you're not aware of how much we're immersed in open source. Every little thing we interact with has open source in it. So I think, first of all, it's clear open source has won. If someone was still debating this, it has won, and it's part of our present and definitely a key part of our future, and it's only going to become more and more dramatic. Now the question becomes how we manage open source and how we align it with our business.
\nSo open core used to be the default answer here. You create an open source project and then you upsell it: you provide the enterprise value as a commercial offering, you provide the support or professional services. This has grown amazing companies like Red Hat, MongoDB and others, but that happened a few decades back. And as we are running towards the singularity, things are speeding up, and the speed of things today is far higher than it was before. So now, when you're building an open source project, by the time it matures and you can commercialize it, other companies will commercialize it before you. So for example, look at Elastic. Elastic built an entire ecosystem with Elasticsearch and Kibana, et cetera. And by the time Elastic matured with the open source project and created the category, other companies like AWS, Coralogix, and Logz.io stepped in and basically commercialized a SaaS version of it faster than Elastic themselves.
\nAnd then they started fighting over licensing, and that infuriated the entire ecosystem and pushed Elastic to the side while they were the crown owners of this thing. And that forced them to basically pivot into a security company. You can see a similar example with Docker. Docker exploded as an open source project. It grew and grew and grew, and it pushed aside commercializing itself. And when they had no more choice and had to commercialize it, they ended up making the mistake of trying to take something back. Once you put it out, you can never put the genie back into the bottle. That's it. It's not yours anymore. And that was also apparent with Elastic. Once Elastic tried to change the license, AWS just forked it and created their own open source project from Elastic again.

\n\n

Eric Anderson:
\nI think I'll just jump in here to point out that Terraform is an interesting situation, because prior to them, the people that were changing licenses, Elastic and others, cited the cloud providers as the competitive set. Terraform, I think, was explicit that "we're worried about other startups commercializing this," and it was those other startups who forked. We had OpenTofu on the show recently, Spacelift and others. And so I think you're right. You're not just worried about some incumbent eventually commercializing you; peer startup companies will latch onto your project and commercialize it as well.

\n\n

Or Weis:
\nIt doesn't matter who or what. It's the ecosystem itself. The ecosystem itself is evolving faster than you are able to react as a single company. And ironically, the open source project becomes an encumbrance to you, because you're wasting, well, you're investing energy in growing the open source project, so you can't invest that energy in moving fast with it through the market. It's an irony, but it's a fact. And yes, HashiCorp with Terraform is a terrific recent example. They realized that they were unable to commercialize it in the shape it was in because of the changes in the ecosystem, and there was also a worry that they would lose control of it altogether. So a lot of people push back on this and say, "So wait a minute, are you saying we can't do open source anymore, that there isn't a way to commercialize it?"
\nAnd I'm like, "No, hold your horses." You need to just think about it in advance. You need to realize that this motions are going to happen. So if you are going to be in a situation where you're cannibalizing your own market with your open source projects or you're enabling other players to eat on your own market through that, you're going to have a bad time. But if you create enough separation and you plan on symbiosis between your commercial offering and your open source offering, they can grow together and you'll never run into this problem. And one model I suggest for this is open foundations and symbiosis is one is based on top of the other. There are probably other symbiotic models that we can think of, but this one is the one that I found the most easy and natural to approach. And yes, it requires more thinking, it requires more design, but it's not that big of a deal. And I think if we start to apply more of this logic into open source projects we'll get better open source projects and we'll get better companies at the same time.

\n\n

Eric Anderson:
\nFantastic. Or, this is also an exciting time for the company, not just new funding, but you've got a bunch of things you just launched. Maybe you can tell us what's new around Permit.

\n\n

Or Weis:
\nYeah, so we recently launched new offerings both for Permit Elements and for more policy models. So maybe a few words on policy models. Everyone's familiar with RBAC, role-based access control. It's the bread and butter of the ecosystem. But people are not as familiar with ABAC and ReBAC, attribute-based access control and relationship-based access control. As applications are becoming more complex and having more advanced patterns, RBAC isn't enough anymore, and so more and more companies are finding themselves needing these more advanced models. And what we try to do with Permit is make it so you don't have to understand these models. You don't have to actually get into all the academia and knowledge and modeling scenarios. You can just work with simplified tools that will enable you to quickly shift from RBAC to ABAC and ReBAC and back and forth. We realize that software needs to be very malleable today.
\nIt's about abstracting this and allowing you to move quickly. And one of the recent models, so we had RBAC and ABAC from way back, very early on, and now we added ReBAC: the ability to build basically a graph of hierarchies that connects everything. But unlike other vendors that tell you to literally draw the entire graph and model all the different edges and nodes, where you need to be a computer scientist to figure out how you're going to navigate this graph, with Permit, again, we want a monkey to be able to work with this. So with Permit, we just guide you in connecting the dots. We apply a concept called role derivation. So for example, let's say we have a customer that wants to manage farms. So they have an operation with a set of farms, and then they have fields for each farm.
\nSo with Permit, you can just create those dots and connect them. So you say, "I have operations, I have farms, and I have fields." And you say, "A field goes in a farm, and a farm goes in an operation." And now you can assign roles for each. So you can be, for example, an owner of an operation, an owner of a farm, an owner of a field; you just stick on those labels. And then you can apply role derivation. You can say: if a field is part of a farm and you are the farm owner, you automatically become an owner of each of the fields. And you can say: if you are an operation owner, you automatically become an owner of each farm that is part of that operation. And this cascades, so from the operation, you now own all the fields.
\nYou're putting on basic labels, connecting dots, and the graph emerges for you without you having to think about it all on day one. And you can do all of it with a very simplified UI. We feel that it is our responsibility at Permit to take all of these complex ideas, with all the performance challenges, all the modeling challenges, all the intricate elements of how you connect scalable data and scalable policies, and just make it easy for you, so you don't have to think about it. And this is what we launched now with ReBAC support. So our ReBAC support is as simple as our ABAC support, which is as simple as our RBAC support. So this is one thing we're very excited about.
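As a purely hypothetical illustration of role derivation (the concept Or describes, not Permit's configuration format), the cascade reduces to a walk up the parent links of the resource graph:

```python
# child -> parent links: fields sit in farms, farms sit in an operation
PARENTS = {"field:1": "farm:1", "field:2": "farm:1", "farm:1": "op:1"}

# direct role assignments: (user, resource) -> role
ASSIGNMENTS = {("alice", "op:1"): "owner"}

def effective_role(user: str, resource: str) -> str | None:
    """Walk up the hierarchy until a direct assignment is found."""
    node = resource
    while node is not None:
        role = ASSIGNMENTS.get((user, node))
        if role is not None:
            return role  # role derived from this ancestor
        node = PARENTS.get(node)
    return None

# alice owns the operation, so ownership cascades down to every field in it
print(effective_role("alice", "field:2"))  # -> "owner"
```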
\nAnd the other one is Permit Elements. So we provide interfaces not just for you as a stakeholder managing the policy, but also for your end customers. So think about being a customer, being a user using an app. You're constantly doing access control: you are inviting other users, you are assigning roles to them, you're giving them time-based access. You are impersonating them and viewing the system as if you're them, trying to see their perspective. You are accessing the system with additional privileges in emergency cases. And this list just goes on and on and on. All of these are UI interactions that you already expect to see in products, but why are we constantly rebuilding them from scratch?
\nSo we decided with Permit Elements to put an end to that. When we say never build permissions again, it's the entire package. So you get a user management screen, you get audit logs, you get approval flows, you get all of these ready to be embedded into your software, again, so you can focus on what's unique to your product. And now approval flows are part of Permit Elements, which, again, we are very excited about. We've constantly had customers asking us, "Oh, I need to have this flow in my product, and it's very complex to build. There's also cryptography on the tokens that need to be sent. Can you handle this for me? It's a nuisance." And we're excited to take it off everyone's table and make sure it runs right.

\n\n

Eric Anderson:
\nAs you spoke there, Or, I wanted to revisit the permissions space a little bit more. When I talk to developers of SaaS products, some of them are like you in your Rookout experience, and they're like, "Man, I hate permissions. This was a big pain." And others are like, "Oh, we added a couple roles and haven't really thought about it again." Certainly most developers listening to a show like this have experienced AWS IAM. They can imagine a very complex application having a very complex permissioning system. And they're like, "Definitely, I'm sure you have customers who are feeling a ton of permission pain." But I also note that there was a time before Auth0 when everybody built their own authentication system and didn't think they needed one from a vendor. Is there a world in which everybody decides it's just normal to pull in an outsourced third-party permissioning system on day one of a new app? And what's going to change in people's minds? What will they realize in order for that world to exist?

\n\n

Or Weis:
\nFirst of all, I'd start by saying that it's okay not to do so. If you just have a basic application now and you feel this is not a priority for you, that is great. You should focus on your priorities and what's right for you at this moment in time. But you do need to realize that as your application evolves, it's a matter of time until you face more complex authorization requirements. It's not a matter of if, it's a question of when. And this question also meets companies earlier and earlier, because what we expect from software today, that bar is also constantly rising. The compliance and privacy requirements that you have are constantly being pushed upwards. Just look at both the European Union's new cyber requirements and the Biden administration's requirements. Look at how more and more companies are trying to be SOC 2- and GDPR-compliant at day one.
\nAnd all these compliance requirements are 80 to 90% access control mechanics that you need to meet. So you either manage them manually or you bake them into what you're building. And also the patterns in the applications themselves: the way we interact with other people, the way our features interact with data, they're constantly becoming more complex. So the bar for basic access control is constantly rising as well. Ten years ago you could have done not even roles, just two roles: you have the developers, or admins, and everyone else is just a user, and maybe you have some access control list. That world is gone. Almost every application today will end up needing RBAC within months, and a year from now, most applications will run into RBAC and ABAC and maybe even ReBAC within months, and this period of time will continue to become shorter and shorter and shorter, especially as AI systems proliferate in the space. Just think about the speed and complexity with which they interact with other software.
\nSo it's really a question of time, both in terms of the growth of your software and the growth of the entire complexity of the software market. And I think I don't need to convince people in general: with Permit, all of our customers, without exception, are inbound. When I talk to customers, I rarely pitch the product. They come to me after they've decided they want the product, they've already played with it, and they come with specific questions. So I never tell people, "Oh, you have to use this on day one." Use this on the day that is right for you, and just plan ahead. Think about the best practices and design your software accordingly as you're building it. If you're implementing on your own, design it in a way that you don't have to reshuffle the entire deck to upgrade to new features; it's not that hard to do.
\nAnd when you do come to us and you want to use Permit, one of the key things that we try to do is make the migration, or the adoption, easy enough, and we combine two key requirements into the design. One is to have it gradual. So you can apply Permit on a single function, single route, single middleware, single microservice, single reverse proxy, single API gateway, and you can choose how wide of a blanket you want to start with. And the other is to be able to do this fast. So we have SDKs for every language, and we have plugins for most reverse proxies and API gateways.
\nSo you can move in quickly and apply it to the subset that you want, with speed. As long as you don't hard-fuse your policy into your code so much that you'll have to clean it out later, you can have a good time. And what I recommend a lot of people do: even if your microservice for authorization is just a function that returns true, just take it out and have it as a separate service, and it'll make your life a lot easier later. And the cost is basically negligible. So small steps now, big payouts later, and we don't need to go crazy about it. But yes, this is coming for everyone, and we can take it at our own pace.

\n\n
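Or's closing advice lends itself to a tiny sketch: even an authorization service that always returns true, once extracted, gives you a seam to swap in a real policy engine later without touching the callers. The port and handler here are the editor's choices:

```python
# A deliberately trivial standalone authorization service.
from http.server import BaseHTTPRequestHandler, HTTPServer
import json

class AuthzHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Today: allow everything. Tomorrow: replace this body with a call
        # to a real policy engine; callers never change.
        body = json.dumps({"allow": True}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("localhost", 9000), AuthzHandler).serve_forever()
```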

Eric Anderson:
\nOr, anything we didn't cover today that you'd want to cover?

\n\n

Or Weis:
\nAnother item that comes to mind is a podcast that I run with our dev advocate Philip, also known as Developer Philip on YouTube. We have a podcast called Command+Shift+Left, which is one of the most fun things I get to do as part of this job. It's a developer-focused podcast with a unique format. It's not your usual interview format; instead, you have four developer leaders coming in with developer-oriented facts, and then we just laugh and debate and throw in additional facts. And it's very lighthearted, it's very fun. And we constantly find ourselves getting into, are we living in a simulation or not? Somehow all of our episodes derail into that. But I really enjoy doing that podcast because, as I said, it's very lighthearted, very fun, and I bring in a lot of funny people. For me, it's just hanging out with friends. And I get the sense that people also enjoy listening to the podcast and just hanging out with us as well.

\n\n

Eric Anderson:
\nSo that's Command+Shift+Left.

\n\n

Or Weis:
\nCommand+Shift+Left, and you can find it on most social networks with the tag JustShiftLeft, and you'll find it at command-shift-left.com. And it's on Spotify, Google Podcasts, Apple Podcasts, et cetera.

\n\n

Eric Anderson:
\nOr, thank you for coming on today. And also thank you, you and your team, for OPAL. This was a gift to humanity. Maybe that's not what you planned to give to humanity when your day is done, but at least you've given that.

\n\n

Or Weis:
\nI think it's a good start, and it whets my appetite to do even more and to find good alignment points for good open source.

\n\n

Eric Anderson:
\nYou can subscribe to the podcast and check out our community Slack and newsletter at contributor.fyi. If you like the show, please leave a rating and review on Apple Podcasts, Spotify, or wherever you get your podcasts. Until next time, I'm Eric Anderson, and this has been Contributor.

","summary":"","date_published":"2024-02-15T05:00:00.000-08:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/2c0ac6c4-2825-4646-8c83-cb84b6367c61.mp3","mime_type":"audio/mpeg","size_in_bytes":35837954,"duration_in_seconds":2235}]},{"id":"8eeb1575-43d0-4d3c-8d89-553c9ccbd8b4","title":"Oxygen Deprivation: FerretDB with Peter Farkas","url":"https://www.contributor.fyi/ferretdb","content_text":"FerretDB enables users to run MongoDB applications on existing Postgres infrastructure. Peter Farkas (@FarkasP), co-founder and CEO of FerretDB, explains the need for an open source interface for document databases. Peter also discusses the licensing change of MongoDB and the uncertainty it created for users. He emphasizes the importance of open standards and collaboration among MongoDB alternatives to provide users with choice and interoperability. \n\nContributor is looking for a community manager! If you want to know more, shoot us an email at eric@scalevp.com.\n\nSubscribe to Contributor on Substack for email notifications!\n\nIn this episode we discuss:\n\n\n The epic mountain adventure that inspired FerretDB\n\n Why commercial open-source can be additive rather than extractive\n\n How compatibility and open standards drives innovation and competition\n\n PDFs as an example of corporation-supported standards\n\n Three tenets for building a successful open source project\n\n\n\nLinks:\n\n\n FerretDB\n\n Percona\n\n\n\nPeople:\n\n\n Peter Zaitsev (@PeterZaitsev)\n\n\n\nPeter Farkas:\nOpen-source is about getting rid of vendor lock-in, about giving choice to the user. And we want to make that happen by creating the open standard and by collaborating.\n\nEric Anderson:\nThis is Contributor, a podcast telling the stories behind the best open-source projects and the communities that make them. I'm Eric Anderson. Peter Farkas is with us today. Peter is the co-founder and CEO at FerretDB and the creator of the project by the same name. Peter, welcome.\n\nPeter Farkas:\nPleasure to be here, Eric. Thank you so much for the invitation.\n\nEric Anderson:\nBefore we went on air here, we were just recollecting that you have quite a history in databases, running database companies.\n\nPeter Farkas:\nYeah. I think the reason I like databases, especially open-source databases is because it's needed for everything. You name a technology, and you will need a database for it. And it's just so great to work with these technologies to enable bigger things to happen. And the reason why open-source is important in my life is because I strongly believe the databases you use should be open-source. Should be based on an open standard like SQL, MySQL, Postgres, and all the other derivatives. And this is why we started FerretDB as well. Before FerretDB, I worked for Percona, which is probably the most well-known open-source database consultancy firm. I learned a lot there. Actually, my co-founder, Peter Zaitsev is the founder of Percona as well. And we have some other Percona people at FerretDB. Then I went on to found Altinity, which was a much different company than it is today. And worked at Cloudera as well, a bit with Big Data and Hadoop. And then here I am at FerretDB now.\n\nEric Anderson:\nFerretDB, as I understand it, is Mongo on Postgres. How do you describe it?\n\nPeter Farkas:\nI think that's a very good summary, a one-liner summary. We basically turn Postgres into MongoDB, a MongoDB compatible database. 
So how you can imagine that is if you have an existing Postgres infrastructure and you have a MongoDB application, MongoDB is no longer open-source. And with FerretDB you can use your existing Postgres solution to run your MongoDB applications as well. And it's not just Postgres, we also support SQLite, SAP HANA and other backends. So it's possible to turn other databases into a MongoDB compatible database as well.\n\nEric Anderson:\nI wasn't aware of that last part. And you already alluded to a couple reasons why you might be interested in doing this. One was MongoDB is no longer open-source. And the second was you may have an existing investment, you could interpret investment in many ways there. But an existing focus on either Postgres, SQLite or Hana, is that right?\n\nPeter Farkas:\nSo after MongoDB went proprietary in 2018, MongoDB was adopted by a number of companies and even governments who had a policy that they could only use open-source software in their technology stack. And with the license change on MongoDB side, they found themselves in this impossible situation where they were already using MongoDB. And it was no longer open-source, but there was no alternative to it. And these users, these companies, these governments were looking to find alternatives and they were looking at Postgres. They were looking at some other solutions. But what they found is that there's no solution which would not require them to rewrite their entire application. And with FerretDB, you can skip all of that because you can just turn these relational databases into a MongoDB compatible database. And so we started with Postgres because we do believe in Postgres. We think this is where users are gravitating towards for a reason.\nAnd we ended up supporting other database backends like SQLite and SAP HANA as well because there was demand from the community. Turns out that MongoDB is no longer able to serve use cases where there is an embedded application. For example, on a networking appliance, which uses MongoDB, but it's not practical to turn that into Postgres because of resource constraints. So that's why we decided to support SQLite as well. And now in some network appliances, as I mentioned, we are able to replace the last open-source version of MongoDB, which they still run because they still have to do that. Now, we can turn those appliances into fully open-source solution again with FerretDB and SQLite. And with SAP HANA, so SAP HANA is a very interesting example because SAP just decided to build compatibility into FerretDB. So they contributed as open-source contributors. And they're still building out the compatibility for SAP HANA into FerretDB, which is a great thing to see because that confirms our suspicion that there's a need for an open-source interface for many database backends.\n\nEric Anderson:\nIs there a community fork of MongoDB since the licensing change? We had the folks behind OpenTofu on the podcast a month or two ago. That was a licensing change followed by a very quick community fork that seemed to get enough critical mass. In some ways, FerretDB represents that community fork from Mongo?\n\nPeter Farkas:\nWell, it represents the community's desire to have an alternative to MongoDB, but it's not a fork of MongoDB. So we are not using any of the code from the last MongoDB open-source release, simply because it would be a massive undertaking. Also, it would be pretty late. So we started FerretDB three years after the license changed. So much of that code is still. 
But it's an interesting example that you brought with OpenTofu. So the HashiCorp story, it created a much louder uproar in the open-source community. Partially because when MongoDB came out with the Server Side Public License, they stated that it's an open-source license. They even attempted the open-source initiative to certify the SSPL as an open-source license. So there was a large amount of confusion and there's a large amount of confusion even today, whether the SSPL is an open-source license or not.\nSo I think MongoDB with introducing that confusion managed to avoid the forking of MongoDB because it was not clear whether the SSPL license is going to be regarded as an open-source license or not. With HashiCorp, this was much clearer from the get-go. It was clear that the community needed to do something and that an alternative is needed. Back then with MongoDB was not as clear. And by the time SSPL was really I guess considered not an open-source license, it was already late for the community to get the right amount of momentum to do a fork. That's just my private opinion on the matter. It's rather interesting how different the two events were.\n\nEric Anderson:\nI think it's a good opinion. HashiCorp is now the 10th or something notable project of late to do this. And we've had some practice on how to respond maybe to these as a community. Whereas with Mongo, I think it was like, \"What's going on? What is this?\" And maybe the ambiguity not only affected developers. But you've talked Peter, I believe at how even today some legal and large corporate entities feel like it's unclear how much liability they're exposed to operating Mongo. Tell us more about that.\n\nPeter Farkas:\nThat's right. So we talked to large enterprise users on SSPL before and after founding FerretDB. We tried to understand where enterprise companies are in terms of their perception on SSPL. And the overwhelming feedback was that their legal teams are unsure where the boundaries are when it comes to what the SSPL allows and what the restrictions are. So the ambiguity of the SSPL license confuses large enterprises as well, which we believe, I mean, I don't think it's a resource problem on their side. It's more like yes, the license itself is so ambiguous that it's indeed hard to tell what is allowed and what is not allowed.\n\nEric Anderson:\nYeah. And so part of that's the language. There's certain lines in there that are just maybe ambiguous as to how they should be interpreted. And then two would be the amount of history on judges and cases clarifying or interpreting that language. I would imagine there's just not a lot of times that those things have been challenged.\n\nPeter Farkas:\nYeah. And just to give an example here. So the SSPL license... And I'm not going to quote the legalese verbatim, I'm not a lawyer. But essentially, what it says is that you are allowed to provide MongoDB as a service if you added enough value on top of it. That it's fundamentally different from just a database which walks and talks like MongoDB. Now, how do you define the value there, the amount of value which would be enough for you to add to be able to run MongoDB as a service? What does providing MongoDB as a service really mean? This is not defined in the license and that's where most of the problem is.\n\nEric Anderson:\nSo three years after the license changed, you woke up and decided it was time something happened.\n\nPeter Farkas:\nWell, it's a rather crazy story. 
We went to this epic adventure to the Himalayas to K2 base camp. And I think the idea of FerretDB was a result of the right amount of oxygen deprivation and cold, I guess. We talked a lot about MongoDB, taking the fact that MongoDB is one of the default databases one would use next to Postgres, next to MySQL and next to some other mainstream databases. But the only one which is not open-source. And that's rather weird because usually open-source databases are favored by users. Not just because of the need to avoid risks or license fees, but also because it's much easier to learn an open-source technology compared to a proprietary tech. So we talked about how MongoDB was still able to avoid being forked after all these years. And we tried to understand why. And FerretDB was started with the mission that this needs to be changed, that things need to go back to how they started, which is open-source.\nAnd we also think that the word of document databases would need a similar open standard what SQL has. It's the same or very similar story as when IBM came up with the concept of the relational database. Then IBM came up with the concept of SQL, then it dominated that market for a decade until alternatives started popping up and SQL became an open standard. And we all see that today, SQL is the definition of commodity because it's everywhere. It's taken for granted that yes, if there's a database you can interface with it using SQL. But that is a result of work and vendors coming together and the creation of the open standard. And this needs to happen with document databases and particularly with MongoDB as well. So that is the mission of FerretDB. That's why we exist because we want to change the industry and expand the market the same way as how SQL did back then in the '80s and '90s.\n\nEric Anderson:\nIt wasn't clear that after the NoSQL enthusiasm that we would be back to being excited about Postgres and other SQL databases today.\n\nPeter Farkas:\nI remember the NoSQL craze, and I think part of it was due to a big misunderstanding. I still worked at Percona. It was 2014 when MongoDB really started coming up on our radar. We were a MySQL company, so we had nothing to do with MongoDB. But I do remember that most voices were all about NoSQL is going to kill relational. And that's a huge misunderstanding because it's not about that. NoSQL is a good tool for many use cases. It makes certain things easier in certain situations, but there's no such thing as the one database which is good for everything. Not even Postgres. There's a good reason why there are things outside of Postgres. There is a good reason why there are many flavors of Postgres because they are all better at something which the user particularly cares about in that specific use case. So hearing that NoSQL or MongoDB is going to change everything by killing relational, I think that was a bit of a nonsense and a result of some misunderstanding.\nWhat is happening today is that the two approaches are converging. So you see a lot of relational databases such as Postgres implementing document related capabilities. And at the same time document databases such as MongoDB started implementing or implemented SQL interface for BI workloads. And there's Yugabyte, for example, which provides database as a service which is capable of running document and relational workloads. 
And that is where NoSQL belongs right next to relational, right next to what we already had because there's a need for both.\n\nEric Anderson:\nAnd Ferret fits in that vision because I can have my SQL, my Postgres and then run Ferret right alongside it for my document DB use cases.\n\nPeter Farkas:\nRight on. I just sold it to you.\n\nEric Anderson:\nDone. The history of your career that we talked about at the beginning, you described two different models. One was the consultancy and then I don't know if a product company is the right other one. But you mentioned how Percona was different and then Altinity started out a certain way and then kind of changed. Help us understand because I think in the world of open-source database companies, to an outsider, it might not be clear that Percona or Altinity or these other models exist.\n\nPeter Farkas:\nThe easiest, not saying it's easy, but the easiest way to monetize open-source, first of all, why do you need to monetize open-source? Just to go back to the complete root of the problem here. The reason you want to monetize open-source is because at least my belief is that an open-source project has a much better chance to survive if there is a company behind at least some of it. A large contributor, which nurtures the open-source project and executes on a business strategy which provides the resources. Not just for the business itself, but for the open-source project as well to thrive and grow. I think that's a good thing.\n\nEric Anderson:\nSo commercial open-source isn't merely extractive. It's not just taxing the open-source system. It is actually additive in the sense that it brings life and energy to the open-source ecosystem.\n\nPeter Farkas:\nI believe so. Because if you take a look at some examples where there were two or three open-source contributors keeping a project alive and suddenly there was a critical bug and there was no one around to fix it. And that resulted in losing trust in the project itself. That's a great example of how expecting everyone to work for free indefinitely and also provide 24/7 support for said technology and project is probably not realistic.\n\nEric Anderson:\nWe've had guests on here describing how even after his open-source project was successful, he wasn't sure of the end game. He was like, \"There's no real way to hand this off to somebody else and everybody just expects me to maintain it. And I can't do this forever.\" So commercial open-source gives perpetuity to open-source. That's the first fundamental premise.\n\nPeter Farkas:\nYeah, I think it's always confusing because open-source is regarded as something which can be used for free by everyone. But in reality, for a large user, let's take Apple. If you pick up your iPhone and go to the legal section in the about menu, you will see that your iPhone or iOS is based on 100 different open-source projects and open standards. And someone needs to maintain those. And probably Apple needs assurances that that technology is going to exist even a couple of years later as well. Which brings up the question, how can you increase the level of trust in your open-source project? And that is through ironically monetizing through providing services for it. So if you provide services for your open-source project with your company, then you can provide the necessary amount of assurance to your users that they will have someone who is going to be able to come and fix if something happens. 
If there's a bug or a missing feature or simply just ongoing maintenance of the code.\nAnd this is what companies like Percona or Databricks or Cloudera or others recognized. If you take Cloudera, they are a company built on Hadoop, which is another free and open-source technology. But most of the users of Hadoop would not be able to take the risk of just using Hadoop without a company like Cloudera, which provides 24/7 support for said software. So it's mutually beneficial for everyone involved. And then there are the hobbies and the smaller users who also benefit from this relationship because they get a strong open-source project as a result, which stands on firm foundation.\n\nEric Anderson:\nAnd in that context, where does FerretDB fit, Peter?\n\nPeter Farkas:\nSo FerretDB was not started with having a business in mind. We wanted to solve a problem. The problem was, \"Hey, what should we do with this situation where we care about databases, we care about open-source?\" And most users using document databases still believe that MongoDB is open-source and still use a proprietary software probably without even knowing it. So we started FerretDB with the intention that we are going to disrupt the current situation where MongoDB is the only company which can provide and can develop MongoDB itself. We were pretty successful with catching the attention of the community. We were pretty successful with making people interested in the problem. And what we need to do now is we need to step up as a company as well, which provides services and as a service solutions to make sure that the project itself is going to be sustainable.\nSo that's where we are now. What is more important is there are other alternatives of MongoDB. AWS is DocumentDB or Microsoft's Azure, Cosmos DB for MongoDB. We actually working really hard on bringing all of the alternatives together to work on creating an open standard. To make sure that these products will not be merely alternatives to MongoDB. That they will be MongoDB compatible but not as alternatives for MongoDB, but as implementations of the open standard or the eventual open standard. Meaning that MongoDB alternatives will not have to run after MongoDB or be driven by MongoDB Inc's priorities. That's what we are working on.\n\nEric Anderson:\nThat's curious. I was at Google working on BigQuery and related Big Data SQL things at one point. And originally, the BigQuery was not ANSI standard SQL. It was a variance which suited the kind of workloads we expected in BigQuery. But that was always a request of the community, of users. And over time Google has now supported some kind of ANSI standard of SQL. And so you would like to work with Microsoft and AWS and others and agree on what does Mongo compatible mean? And compatible may not even be the word in the future. Maybe like OSI standard, ANSI standard, some kind of standard.\n\nPeter Farkas:\nExactly, exactly.\n\nEric Anderson:\nAnd maybe even the MongoDB company variance may or may not live up to that standard.\n\nPeter Farkas:\nExactly. This is our big goal. This is our desire. And this is something we put a lot of effort in to make sure that this cooperation is going to be a successful one because this is the key to innovation. This is the key to competition, healthy competition on this market. Right now, there's no competition whatsoever because the solutions MongoDB itself and its alternatives are not compatible with each other. 
So you can't just look at your MongoDB Atlas invoice, be unhappy with the cost and go elsewhere. You can't. You're logged in. There's no opportunity for you to remedy that situation without having to touch your application. Of course, you may be lucky, you may be able to migrate. But as soon as you used even one of the advanced MongoDB features, you're locked into MongoDB Atlas. This is not what open-source is about.\nSo MongoDB still calls itself open-source in many of its documentation and marketing materials. But open-source is about getting rid of vendor lock-in, about giving choice to the user. And we want to make that happen by creating the open standard and by collaborating not just with the big cloud providers, but we are hoping to collaborate with MongoDB as well. They are also needed in this discussion. It's their interest as well.\n\nEric Anderson:\nAnd maybe I can develop the value of a standard further because I think there's a void lock-in and there's some economic reasons. But there's all this whole tooling world where you support SQL suddenly there's like... In the world of at least Big Data and there's visualization solutions that are all SQL centric. There's code writing, clients. As long as you stick to a standard, there's a whole plethora of interoperability that emerges. And you're saying we could bring that to the document universe?\n\nPeter Farkas:\nYeah. It does not exist today. And with the open standard and with the collaboration between all of these different alternatives and MongoDB itself. Hopefully, we are looking at a massive expansion of opportunities and increase in interoperability and basically the same thing what we have with SQL. But just to depart a little bit from the world of databases, there is a very simple example on how an open standard can help and expand a given technology. Adobe PDF, we take it so much for granted that if you go to your airline and you want to get your boarding pass, you just click on the download button, you get a PDF or anywhere else with any vendor across many different unrelated software. There are things like PDF, which are common. And you can just print it, you can export it, you can edit it. No one used PDF before 2006 when it was not an open standard.\nIt was actually a technology with a dwindling popularity, it was non-existent. And only when vendors came together and Adobe also allowed that to happen to a certain degree, PDF became this universal tool. And the use of the technology itself skyrocketed. And it's just unbelievable how popular it is today. Same with if we go back to databases again, SQL. The moment it became an open standard, MySQL, Postgres, and all the others could implement the open standard. And we just take it so much for granted, and there's such a vibrant and amazing amount of innovation in that area. And we believe that the same exact thing is going to happen with document databases if we succeed and we will succeed with creating an open standard out of it. That's the essence of our vision.\n\nEric Anderson:\nThat's awesome. And history has shown you can have some success. You've got 8,000 GitHub stars in this big growing community. How did that come about? Any tips you can give for us, Peter, if I were to start an open-source project to make it as successful as yours?\n\nPeter Farkas:\nWell, I don't want to make it look like that all the success we had is a result of precisely calculated set of actions. 
But if I want to reflect on the success we had so far, I think the most important is to address a real problem. And for that, you probably need to either sell your vision even if you don't have your product yet. In our case, we created a tech demo and explained why the world needs FerretDB. And we got a massive amount of positive response on that. We also got a lot of feedback, we needed to address that. And as soon as the community saw that we are a team which works with the community and which listens to the feedback and addresses the questions, they were more and more likely to work with us. And I think that is what we did in order to get this amount of attention from the community.\nAnd it's not just about stars. What we are most proud of is contributions. So I think that community contributions are a lot more important than stars. I think that's the real tool or real metric you can use to measure your success and whether you are doing the right things. Because as soon as you see external contributions, that's when you can be sure that someone is trying to scratch their own itch by improving your code. And that is the sign of real interest. It's very easy to star a project. It takes no commitment whatsoever. But contributing, that's a whole other level, and that's what we are really, really taking seriously in terms of a metric.\n\nEric Anderson:\nWhat's the state of the project today? You've described that you support Postgres and SQLite. This HANA project is happening. As people dive in, what should they expect out of FerretDB today and what should they expect in the future? What are you working towards?\n\nPeter Farkas:\nYeah, so we've been working on FerretDB for two years now. The expectations towards FerretDB being a MongoDB alternative, most users expect that it is going to be the exact same thing. Unfortunately, that's not the case. So MongoDB has a lot of advanced features, a lot of features, which none of the alternatives implemented. Simply because we have this saying that 85% of MongoDB workloads use maybe 25% of MongoDB features. So that's what we are aiming for, to provide these core set of features, which most MongoDB users can utilize to migrate away from MongoDB Atlas if they want to. On the other hand, we are also in need to address performance, the question of performance. So right now, we are about half as performant in most cases compared to MongoDB Atlas. This is true to other MongoDB alternatives as well. And while most use cases are not really affected by this, this difference is not sustainable.\nSo we want to create our own Postgres extension, which addresses some of these performance issues. But we also need to introduce other tweaks as well to get to where we want in terms of performance. So all in all, I can't say that we would set expectations, right? If we would say that \"Hey, whatever workload you have, just migrate to FerretDB because you're going to have the same experience.\" That's far from reality. But we are onboarding more and more users and their use cases. And we are developing FerretDB along the way.\n\nEric Anderson:\nThe FerretDB projects actually even got more going for it than I realized coming into the show. This idea of a document standard is really interesting. And you've convinced me that your efforts here not only can build an interesting business but can really help advance the open-source community. So thank you for what you're doing for all of us. It's a gift to humanity.\n\nPeter Farkas:\nWell, that's probably an exaggeration. 
But we'd like to think that we are changing the database space for the better by re-enabling the users of document databases to have a choice. I think that's what we do. It's far from revolutionizing healthcare or AI. But as I said earlier, I like the database space because it serves as the foundation of amazing tech such as AI, such as anything else you can think of. And it would make me proud if we could see five years from now that FerretDB disrupted this space in a way that users became better off than they were with MongoDB. That would make me very happy, and that's what we are marching towards.\n\nEric Anderson:\nYou can subscribe to the podcast and check out our community Slack and newsletter at contributor.fyi. If you like the show, please leave a rating and review on Apple Podcasts, Spotify, or wherever you get your podcasts. Until next time, I'm Eric Anderson and this has been Contributor.","content_html":"

FerretDB enables users to run MongoDB applications on existing Postgres infrastructure. Peter Farkas (@FarkasP), co-founder and CEO of FerretDB, explains the need for an open source interface for document databases. Peter also discusses the licensing change of MongoDB and the uncertainty it created for users. He emphasizes the importance of open standards and collaboration among MongoDB alternatives to provide users with choice and interoperability. 

\n\n

Contributor is looking for a community manager! If you want to know more, shoot us an email at eric@scalevp.com.

\n\n

Subscribe to Contributor on Substack for email notifications!

\n\n

In this episode we discuss:

\n\n\n\n

Links:

\n\n\n\n

People:

\n\n\n\n

Peter Farkas:
\nOpen-source is about getting rid of vendor lock-in, about giving choice to the user. And we want to make that happen by creating the open standard and by collaborating.

\n\n

Eric Anderson:
\nThis is Contributor, a podcast telling the stories behind the best open-source projects and the communities that make them. I'm Eric Anderson. Peter Farkas is with us today. Peter is the co-founder and CEO at FerretDB and the creator of the project by the same name. Peter, welcome.

\n\n

Peter Farkas:
\nPleasure to be here, Eric. Thank you so much for the invitation.

\n\n

Eric Anderson:
\nBefore we went on air here, we were just recollecting that you have quite a history in databases, running database companies.

\n\n

Peter Farkas:
\nYeah. I think the reason I like databases, especially open-source databases is because it's needed for everything. You name a technology, and you will need a database for it. And it's just so great to work with these technologies to enable bigger things to happen. And the reason why open-source is important in my life is because I strongly believe the databases you use should be open-source. Should be based on an open standard like SQL, MySQL, Postgres, and all the other derivatives. And this is why we started FerretDB as well. Before FerretDB, I worked for Percona, which is probably the most well-known open-source database consultancy firm. I learned a lot there. Actually, my co-founder, Peter Zaitsev is the founder of Percona as well. And we have some other Percona people at FerretDB. Then I went on to found Altinity, which was a much different company than it is today. And worked at Cloudera as well, a bit with Big Data and Hadoop. And then here I am at FerretDB now.

\n\n

Eric Anderson:
\nFerretDB, as I understand it, is Mongo on Postgres. How do you describe it?

\n\n

Peter Farkas:
\nI think that's a very good one-liner summary. We basically turn Postgres into a MongoDB-compatible database. The way to picture it: you have an existing Postgres infrastructure and a MongoDB application, but MongoDB is no longer open-source. With FerretDB, you can use your existing Postgres deployment to run your MongoDB applications as well. And it's not just Postgres; we also support SQLite, SAP HANA and other backends. So it's possible to turn other databases into a MongoDB-compatible database as well.
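To make that one-liner concrete: the application keeps its stock MongoDB driver, and only the connection string changes, because FerretDB speaks the MongoDB wire protocol and translates operations into SQL against Postgres. A minimal sketch in Go with the official mongo-driver; the host ferretdb.internal and the database and collection names are hypothetical stand-ins:

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// The only change from a stock MongoDB setup: the URI points at a
	// FerretDB instance (hypothetical host), which stores data in Postgres.
	client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://ferretdb.internal:27017/"))
	if err != nil {
		log.Fatal(err)
	}
	defer client.Disconnect(ctx)

	// Ordinary document operations; FerretDB translates them to SQL.
	orders := client.Database("shop").Collection("orders")
	if _, err := orders.InsertOne(ctx, bson.M{"item": "ferret food", "qty": 2}); err != nil {
		log.Fatal(err)
	}

	var doc bson.M
	if err := orders.FindOne(ctx, bson.M{"item": "ferret food"}).Decode(&doc); err != nil {
		log.Fatal(err)
	}
	fmt.Println(doc)
}
```

Nothing in this snippet is FerretDB-specific; the same code runs unchanged against MongoDB itself, which is what makes the swap possible.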

\n\n

Eric Anderson:
\nI wasn't aware of that last part. And you already alluded to a couple of reasons why you might be interested in doing this. One was that MongoDB is no longer open-source. And the second was that you may have an existing investment, and you could interpret investment in many ways there. But an existing focus on either Postgres, SQLite or HANA, is that right?

\n\n

Peter Farkas:
\nSo before MongoDB went proprietary in 2018, it had been adopted by a number of companies and even governments who had a policy that they could only use open-source software in their technology stack. And with the license change on MongoDB's side, they found themselves in this impossible situation where they were already using MongoDB, it was no longer open-source, but there was no alternative to it. These users, these companies, these governments were looking for alternatives. They were looking at Postgres. They were looking at some other solutions. But what they found is that there was no solution which would not require them to rewrite their entire application. With FerretDB, you can skip all of that, because you can just turn these relational databases into a MongoDB-compatible database. And so we started with Postgres because we do believe in Postgres. We think this is where users are gravitating for a reason.
\nAnd we ended up supporting other database backends like SQLite and SAP HANA as well, because there was demand from the community. It turns out that MongoDB is no longer able to serve use cases where there is an embedded application. For example, a networking appliance which uses MongoDB, where it's not practical to switch to Postgres because of resource constraints. That's why we decided to support SQLite as well. Some of those network appliances still run the last open-source version of MongoDB because they have no other option; now we can turn those appliances into a fully open-source solution again with FerretDB and SQLite. And SAP HANA is a very interesting example, because SAP itself decided to build the compatibility into FerretDB. They contributed as open-source contributors, and they're still building out the SAP HANA compatibility in FerretDB. That's a great thing to see, because it confirms our suspicion that there's a need for an open-source interface for many database backends.

\n\n

Eric Anderson:
\nIs there a community fork of MongoDB since the licensing change? We had the folks behind OpenTofu on the podcast a month or two ago. That was a licensing change followed by a very quick community fork that seemed to get enough critical mass. In some ways, FerretDB represents that community fork from Mongo?

\n\n

Peter Farkas:
\nWell, it represents the community's desire to have an alternative to MongoDB, but it's not a fork of MongoDB. We are not using any of the code from the last MongoDB open-source release, simply because it would be a massive undertaking. Also, it would be pretty late: we started FerretDB three years after the license changed, so much of that code would be outdated by now. But it's an interesting example that you brought up with OpenTofu. The HashiCorp story created a much louder uproar in the open-source community, partially because when MongoDB came out with the Server Side Public License, they stated that it's an open-source license. They even petitioned the Open Source Initiative to certify the SSPL as an open-source license. So there was a large amount of confusion, and there's a large amount of confusion even today, about whether the SSPL is an open-source license or not.
\nSo I think MongoDB, by introducing that confusion, managed to avoid being forked, because it was not clear whether the SSPL was going to be regarded as an open-source license or not. With HashiCorp, this was much clearer from the get-go. It was clear that the community needed to do something and that an alternative was needed. Back then with MongoDB, it was not as clear. And by the time the SSPL was really, I guess, considered not an open-source license, it was already too late for the community to get the right amount of momentum to do a fork. That's just my private opinion on the matter. It's rather interesting how different the two events were.

\n\n

Eric Anderson:
\nI think it's a good opinion. HashiCorp is now the 10th or so notable project of late to do this, and we've had some practice on how to respond to these as a community. Whereas with Mongo, I think it was like, "What's going on? What is this?" And maybe the ambiguity not only affected developers. You've talked, Peter, I believe, about how even today the legal teams at some large corporate entities feel like it's unclear how much liability they're exposed to operating Mongo. Tell us more about that.

\n\n

Peter Farkas:
\nThat's right. So we talked to large enterprise users about the SSPL before and after founding FerretDB. We tried to understand where enterprise companies are in terms of their perception of the SSPL. And the overwhelming feedback was that their legal teams are unsure where the boundaries are when it comes to what the SSPL allows and what the restrictions are. So the ambiguity of the SSPL confuses large enterprises as well. And I don't think it's a resource problem on their side. It's more that the license itself is so ambiguous that it's indeed hard to tell what is allowed and what is not allowed.

\n\n

Eric Anderson:
\nYeah. And so part of that's the language. There are certain lines in there that are just ambiguous as to how they should be interpreted. And the second part would be the lack of history of judges and cases clarifying or interpreting that language. I would imagine there are just not a lot of times that those things have been challenged.

\n\n

Peter Farkas:
\nYeah. And just to give an example here. The SSPL, and I'm not going to quote the legalese verbatim, I'm not a lawyer, but essentially what it says is that you are allowed to provide MongoDB as a service if you added enough value on top of it, so that it's fundamentally different from just a database which walks and talks like MongoDB. Now, how do you define the value there, the amount of value which would be enough for you to add to be able to run MongoDB as a service? What does providing MongoDB as a service really mean? This is not defined in the license, and that's where most of the problem is.

\n\n

Eric Anderson:
\nSo three years after the license changed, you woke up and decided it was time something happened.

\n\n

Peter Farkas:
\nWell, it's a rather crazy story. We went on this epic adventure to the Himalayas, to K2 base camp. And I think the idea of FerretDB was a result of the right amount of oxygen deprivation and cold, I guess. We talked a lot about MongoDB, given that MongoDB is one of the default databases one would use next to Postgres, MySQL and some other mainstream databases, but the only one which is not open-source. And that's rather weird, because usually open-source databases are favored by users. Not just because of the need to avoid risks or license fees, but also because it's much easier to learn an open-source technology compared to proprietary tech. So we talked about how MongoDB was still able to avoid being forked after all these years, and we tried to understand why. And FerretDB was started with the mission that this needs to change, that things need to go back to how they started, which is open-source.
\nAnd we also think that the world of document databases needs an open standard similar to what SQL has. It's the same or a very similar story as when IBM came up with the concept of the relational database. IBM came up with SQL, and it dominated that market for a decade until alternatives started popping up and SQL became an open standard. And we all see that today: SQL is the definition of a commodity because it's everywhere. It's taken for granted that if there's a database, you can interface with it using SQL. But that is the result of work, of vendors coming together, and of the creation of the open standard. And this needs to happen with document databases, and particularly with MongoDB, as well. So that is the mission of FerretDB. That's why we exist: we want to change the industry and expand the market the same way SQL did back in the '80s and '90s.

\n\n

Eric Anderson:
\nIt wasn't clear, after the NoSQL enthusiasm, that we would be back to being excited about Postgres and other SQL databases today.

\n\n

Peter Farkas:
\nI remember the NoSQL craze, and I think part of it was due to a big misunderstanding. I was still working at Percona. It was 2014 when MongoDB really started coming up on our radar. We were a MySQL company, so we had nothing to do with MongoDB. But I do remember that most voices were all about how NoSQL was going to kill relational. And that's a huge misunderstanding, because it's not about that. NoSQL is a good tool for many use cases. It makes certain things easier in certain situations, but there's no such thing as the one database which is good for everything. Not even Postgres. There's a good reason why there are things outside of Postgres. There is a good reason why there are many flavors of Postgres: they are all better at something which the user particularly cares about in that specific use case. So hearing that NoSQL or MongoDB was going to change everything by killing relational, I think that was a bit of nonsense and the result of some misunderstanding.
\nWhat is happening today is that the two approaches are converging. You see a lot of relational databases such as Postgres implementing document-related capabilities. And at the same time, document databases such as MongoDB have implemented a SQL interface for BI workloads. And there's Yugabyte, for example, which provides a database as a service capable of running document and relational workloads. And that is where NoSQL belongs: right next to relational, right next to what we already had, because there's a need for both.
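The Postgres side of that convergence is visible in its jsonb column type, which stores and queries schemaless documents inside an ordinary relational table; this is the kind of capability a layer like FerretDB can build on. A minimal sketch in Go, assuming a local Postgres and the github.com/lib/pq driver; the orders table and connection string are hypothetical:

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/lib/pq" // Postgres driver, registered via side effect
)

func main() {
	db, err := sql.Open("postgres", "postgres://localhost/shop?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// A relational table holding schemaless documents in a jsonb column.
	if _, err := db.Exec(`CREATE TABLE IF NOT EXISTS orders (doc jsonb)`); err != nil {
		log.Fatal(err)
	}
	if _, err := db.Exec(`INSERT INTO orders (doc) VALUES ($1)`,
		`{"item": "ferret food", "qty": 2}`); err != nil {
		log.Fatal(err)
	}

	// Query by a field inside the document, MongoDB-style, but in SQL.
	var item string
	err = db.QueryRow(`SELECT doc->>'item' FROM orders WHERE doc @> '{"qty": 2}'`).Scan(&item)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(item) // "ferret food"
}
```

The @> containment operator used in the lookup is backed by GIN index support in Postgres, so document-style queries against a jsonb column can stay efficient at scale.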

\n\n

Eric Anderson:
\nAnd Ferret fits in that vision because I can have my SQL, my Postgres and then run Ferret right alongside it for my document DB use cases.

\n\n

Peter Farkas:
\nRight on. I just sold it to you.

\n\n

Eric Anderson:
\nDone. In the history of your career that we talked about at the beginning, you described two different models. One was the consultancy, and I don't know if a product company is the right name for the other one. But you mentioned how Percona was different, and then Altinity started out a certain way and then kind of changed. Help us understand, because I think in the world of open-source database companies, to an outsider, it might not be clear that Percona or Altinity or these other models exist.

\n\n

Peter Farkas:
\nThe easiest, not saying it's easy, but the easiest way to monetize open-source... well, first of all, why do you need to monetize open-source at all? Just to go back to the complete root of the problem here. The reason you want to monetize open-source is because, at least in my belief, an open-source project has a much better chance to survive if there is a company behind at least some of it. A large contributor which nurtures the open-source project and executes on a business strategy that provides the resources, not just for the business itself, but for the open-source project as well, to thrive and grow. I think that's a good thing.

\n\n

Eric Anderson:
\nSo commercial open-source isn't merely extractive. It's not just taxing the open-source system. It is actually additive in the sense that it brings life and energy to the open-source ecosystem.

\n\n

Peter Farkas:
\nI believe so. Because if you take a look at some examples where there were two or three open-source contributors keeping a project alive, and suddenly there was a critical bug and there was no one around to fix it, that resulted in losing trust in the project itself. That's a great example of how expecting everyone to work for free indefinitely, and also provide 24/7 support for said technology and project, is probably not realistic.

\n\n

Eric Anderson:
\nWe've had a guest on here describing how even after his open-source project was successful, he wasn't sure of the end game. He was like, "There's no real way to hand this off to somebody else, and everybody just expects me to maintain it. And I can't do this forever." So commercial open-source gives perpetuity to open-source. That's the first fundamental premise.

\n\n

Peter Farkas:
\nYeah, I think it's always confusing, because open-source is regarded as something which can be used for free by everyone. But in reality, take a large user like Apple. If you pick up your iPhone and go to the legal section in the About menu, you will see that iOS is based on a hundred different open-source projects and open standards. And someone needs to maintain those. And Apple probably needs assurances that that technology is going to exist a couple of years later as well. Which brings up the question: how can you increase the level of trust in your open-source project? And that, ironically, is through monetizing, through providing services for it. If you provide services for your open-source project with your company, then you can provide the necessary assurance to your users that they will have someone who is going to be able to come and fix things if something happens, whether there's a bug or a missing feature, or simply for the ongoing maintenance of the code.
\nAnd this is what companies like Percona or Databricks or Cloudera recognized. If you take Cloudera, they are a company built on Hadoop, which is another free and open-source technology. But most of the users of Hadoop would not be able to take the risk of just using Hadoop without a company like Cloudera, which provides 24/7 support for said software. So it's mutually beneficial for everyone involved. And then there are the hobbyists and the smaller users, who also benefit from this relationship, because they get a strong open-source project as a result, one which stands on a firm foundation.

\n\n

Eric Anderson:
\nAnd in that context, where does FerretDB fit, Peter?

\n\n

Peter Farkas:
\nSo FerretDB was not started with a business in mind. We wanted to solve a problem. The problem was, "Hey, what should we do with this situation, where we care about databases and we care about open-source?" Most users of document databases still believe that MongoDB is open-source, and still use proprietary software probably without even knowing it. So we started FerretDB with the intention of disrupting the current situation, where MongoDB is the only company which can provide and develop MongoDB itself. We were pretty successful at catching the attention of the community. We were pretty successful at making people interested in the problem. And what we need to do now is step up as a company as well, providing services and as-a-service solutions, to make sure that the project itself is going to be sustainable.
\nSo that's where we are now. What is more important is that there are other alternatives to MongoDB: AWS's DocumentDB, or Microsoft's Azure Cosmos DB for MongoDB. We are actually working really hard on bringing all of the alternatives together to work on creating an open standard. To make sure that these products will not be merely alternatives to MongoDB, that they will be MongoDB-compatible not as alternatives to MongoDB, but as implementations of the open standard, or the eventual open standard. Meaning that MongoDB alternatives will not have to run after MongoDB or be driven by MongoDB Inc.'s priorities. That's what we are working on.

\n\n

Eric Anderson:
\nThat's curious. I was at Google working on BigQuery and related Big Data SQL things at one point. And originally, BigQuery was not ANSI-standard SQL. It was a variant which suited the kind of workloads we expected in BigQuery. But standard SQL was always a request of the community, of users, and over time Google has come to support a kind of ANSI-standard SQL. And so you would like to work with Microsoft and AWS and others and agree on what Mongo-compatible means? And compatible may not even be the word in the future. Maybe like an OSI standard, an ANSI standard, some kind of standard.

\n\n

Peter Farkas:
\nExactly, exactly.

\n\n

Eric Anderson:
\nAnd maybe even the MongoDB company's variant may or may not live up to that standard.

\n\n

Peter Farkas:
\nExactly. This is our big goal. This is our desire. And this is something we put a lot of effort into, to make sure that this cooperation is going to be a successful one, because this is the key to innovation. This is the key to healthy competition in this market. Right now, there's no competition whatsoever, because the solutions, MongoDB itself and its alternatives, are not compatible with each other. So you can't just look at your MongoDB Atlas invoice, be unhappy with the cost and go elsewhere. You can't. You're locked in. There's no opportunity for you to remedy that situation without having to touch your application. Of course, you may be lucky, you may be able to migrate. But as soon as you have used even one of the advanced MongoDB features, you're locked into MongoDB Atlas. This is not what open-source is about.
\nSo MongoDB still calls itself open-source in much of its documentation and marketing material. But open-source is about getting rid of vendor lock-in, about giving choice to the user. And we want to make that happen by creating the open standard and by collaborating, not just with the big cloud providers; we are hoping to collaborate with MongoDB as well. They are also needed in this discussion. It's in their interest as well.

\n\n

Eric Anderson:
\nAnd maybe I can develop the value of a standard further, because I think there's avoiding lock-in, and there are some economic reasons. But there's also this whole tooling world: once you support SQL, suddenly, at least in the world of Big Data, there are visualization solutions that are all SQL-centric. There are code-writing tools, clients. As long as you stick to a standard, a whole plethora of interoperability emerges. And you're saying we could bring that to the document universe?

\n\n

Peter Farkas:
\nYeah. It does not exist today. And with the open standard, and with the collaboration between all of these different alternatives and MongoDB itself, hopefully we are looking at a massive expansion of opportunities and an increase in interoperability, basically the same thing we have with SQL. But just to depart a little bit from the world of databases, there is a very simple example of how an open standard can help expand a given technology: Adobe PDF. We take it so much for granted that if you go to your airline and you want to get your boarding pass, you just click on the download button and you get a PDF, and the same anywhere else, with any vendor, across many different unrelated pieces of software. There are things like PDF which are common. You can just print it, you can export it, you can edit it. But PDF was nowhere near this universal before it became an open standard.
\nIt was actually a technology with dwindling popularity. Only when vendors came together, and Adobe also allowed that to happen to a certain degree, did PDF become this universal tool. The use of the technology itself skyrocketed, and it's just unbelievable how popular it is today. Same, if we go back to databases again, with SQL. The moment it became an open standard, MySQL, Postgres, and all the others could implement it. We take it so much for granted, and there's such a vibrant and amazing amount of innovation in that area. And we believe that the same exact thing is going to happen with document databases if we succeed, and we will succeed, in creating an open standard out of this. That's the essence of our vision.

\n\n

Eric Anderson:
\nThat's awesome. And history has shown you can have some success. You've got 8,000 GitHub stars in this big growing community. How did that come about? Any tips you can give for us, Peter, if I were to start an open-source project to make it as successful as yours?

\n\n

Peter Farkas:
\nWell, I don't want to make it look like all the success we've had is the result of a precisely calculated set of actions. But if I want to reflect on the success we've had so far, I think the most important thing is to address a real problem. And for that, you probably need to sell your vision even if you don't have your product yet. In our case, we created a tech demo and explained why the world needs FerretDB. And we got a massive amount of positive response to that. We also got a lot of feedback, and we needed to address it. And as soon as the community saw that we are a team which works with the community, which listens to the feedback and addresses the questions, they were more and more likely to work with us. I think that is what we did in order to get this amount of attention from the community.
\nAnd it's not just about stars. What we are most proud of is contributions. So I think that community contributions are a lot more important than stars. I think that's the real tool or real metric you can use to measure your success and whether you are doing the right things. Because as soon as you see external contributions, that's when you can be sure that someone is trying to scratch their own itch by improving your code. And that is the sign of real interest. It's very easy to star a project. It takes no commitment whatsoever. But contributing, that's a whole other level, and that's what we are really, really taking seriously in terms of a metric.

\n\n

Eric Anderson:
\nWhat's the state of the project today? You've described that you support Postgres and SQLite. This HANA project is happening. As people dive in, what should they expect out of FerretDB today and what should they expect in the future? What are you working towards?

\n\n

Peter Farkas:
\nYeah, so we've been working on FerretDB for two years now. As for expectations towards FerretDB as a MongoDB alternative, most users expect that it is going to be the exact same thing. Unfortunately, that's not the case. MongoDB has a lot of advanced features which none of the alternatives have implemented. But we have this saying that 85% of MongoDB workloads use maybe 25% of MongoDB features. So that's what we are aiming for: to provide the core set of features which most MongoDB users can utilize to migrate away from MongoDB Atlas if they want to. On the other hand, we also need to address the question of performance. Right now, we are about half as performant in most cases compared to MongoDB Atlas. This is true of other MongoDB alternatives as well. And while most use cases are not really affected by this, the difference is not sustainable.
\nSo we want to create our own Postgres extension, which addresses some of these performance issues. But we also need to introduce other tweaks to get to where we want to be in terms of performance. So all in all, we can't set expectations like, "Hey, whatever workload you have, just migrate to FerretDB, because you're going to have the same experience." That's far from reality. But we are onboarding more and more users and their use cases, and we are developing FerretDB along the way.
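One practical consequence of that feature gap: before migrating, it is worth pointing a throwaway client at FerretDB and probing the exact operations your workload uses. A hedged sketch in Go with the official MongoDB driver; the URI is hypothetical, and the $group aggregation stands in for whatever advanced feature your application depends on:

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	client, err := mongo.Connect(ctx, options.Client().ApplyURI("mongodb://ferretdb.internal:27017/"))
	if err != nil {
		log.Fatal(err)
	}
	defer client.Disconnect(ctx)

	// buildInfo is a standard MongoDB command; FerretDB answers it too,
	// so this confirms which server you are actually talking to.
	var info bson.M
	if err := client.Database("admin").RunCommand(ctx, bson.D{{Key: "buildInfo", Value: 1}}).Decode(&info); err != nil {
		log.Fatal(err)
	}
	fmt.Println("server version:", info["version"])

	// Probe an operation your workload needs; an unsupported feature
	// surfaces here as a command error rather than in production.
	coll := client.Database("probe").Collection("orders")
	cursor, err := coll.Aggregate(ctx, mongo.Pipeline{
		bson.D{{Key: "$group", Value: bson.D{
			{Key: "_id", Value: "$item"},
			{Key: "total", Value: bson.D{{Key: "$sum", Value: "$qty"}}},
		}}},
	})
	if err != nil {
		log.Printf("aggregation not supported here: %v", err)
		return
	}
	defer cursor.Close(ctx)
	fmt.Println("aggregation supported")
}
```

This is only a smoke test, not a compatibility guarantee; the FerretDB documentation is the authoritative source for which operators each backend supports.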

\n\n

Eric Anderson:
\nThe FerretDB project actually has even more going for it than I realized coming into the show. This idea of a document standard is really interesting. And you've convinced me that your efforts here not only can build an interesting business but can really help advance the open-source community. So thank you for what you're doing for all of us. It's a gift to humanity.

\n\n

Peter Farkas:
\nWell, that's probably an exaggeration. But we'd like to think that we are changing the database space for the better by re-enabling the users of document databases to have a choice. I think that's what we do. It's far from revolutionizing healthcare or AI. But as I said earlier, I like the database space because it serves as the foundation of amazing tech such as AI, such as anything else you can think of. And it would make me proud if we could see five years from now that FerretDB disrupted this space in a way that users became better off than they were with MongoDB. That would make me very happy, and that's what we are marching towards.

\n\n

Eric Anderson:
\nYou can subscribe to the podcast and check out our community Slack and newsletter at contributor.fyi. If you like the show, please leave a rating and review on Apple Podcasts, Spotify, or wherever you get your podcasts. Until next time, I'm Eric Anderson and this has been Contributor.

","summary":"","date_published":"2024-01-31T02:00:00.000-08:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/8eeb1575-43d0-4d3c-8d89-553c9ccbd8b4.mp3","mime_type":"audio/mpeg","size_in_bytes":32638477,"duration_in_seconds":2035}]},{"id":"f1ab48ba-9845-4a76-8936-c8b64a5640cc","title":"The Duke of SQLite: Litestream with Ben Johnson","url":"https://www.contributor.fyi/litestream","content_text":"Ben Johnson (@benbjohnson) is the creator of Litestream and LiteFS, two open-source disaster recovery solution for SQLite. Litestream is designed to provide continuous backups for SQLite databases by streaming incremental changes, allowing for easy data recovery in the event of a server crash. LiteFS, on the other hand, is built on LiteStream but uses transactional control to focus on replication and high availability. Join us as Ben discusses the challenges and trade-offs of open source contributions and the future of databases.\n\nContributor is looking for a community manager! If you want to know more, shoot us an email at eric@scalevp.com.\n\nSubscribe to Contributor on Substack for email notifications!\n\nIn this episode we discuss:\n\n\n The history of how Ben got involved in SQLite development out of “spite”\n\n How Litestream “works on a fluke”\n\n Different use cases for Litestream vs LiteFS\n\n Why fully open contributions isn’t always Ben’s style\n\n The greater server-side SQLite landscape\n\n\n\nLinks:\n\n\n Litestream\n\n LiteFS\n\n Fly.io\n\n BoltDB \n\n\n\nPeople mentioned:\n\n\n Philip O’Toole (@general_order24)\n\n\n\nOther episodes:\n\n\n The Social Miracle: rqlite with Philip O’Toole\n\n The Big Fork: libSQL with Glauber Costa\n\n\n\nBen Johnson:\nI got in the habit of just releasing stuff. I talk about it, put it out there. I like the open source side where it is just a community of people trying weird stuff, seeing what sticks.\n\nEric Anderson:\nThis is Contributor, a podcast telling the stories behind the best open source projects and the communities that make them. I'm Eric Anderson. Welcome Ben Johnson to the show. We are excited to continue our SQLite frenzy with, I don't know, the King of SQLite, but certainly one of the royal family here. We did just record rqlite and then we did Turso a month ago or more.\n\nBen Johnson:\nIt'll just be a SQLite podcast going forward.\n\nEric Anderson:\nI don't know whether to call this a Litestream episode or a LiteFS episode, but it is a SQLite episode. Usually I have you start by telling us what the projects are, an elevator pitch, so to speak, to ground the conversation. You could start with one or the other or however you want to do that.\n\nBen Johnson:\nSure. Litestream probably makes the most sense. That's the first one I started on. So Litestream, the idea of it is it's meant to be disaster recovery essentially for SQLite. So usually when you run SQLite, it just runs in a single box and you could do backups every hour, but really what you want is you don't want to lose almost any data.\nSo what Litestream does is it takes advantage of the fact that S3 is super cheap to upload to, but expensive to download from, and it just has a streaming replication every second to S3 of incremental changes. Then if your server crashes or just blows up, then you can actually just download up to a second or two from your disaster essentially and recover all your data. 
That's the elevator pitch for Litestream.\n\nEric Anderson:\nThere's a million ways we could take that intro, but just to noodle on one thing: SQLite, I think for a lot of people, they imagine it in the browser or on a local device. Is that the typical place where people are backing up their SQLite from, or is this meant for server-side SQLite?\n\nBen Johnson:\nYeah, I really want to make server-side SQLite work. So my background, the last maybe decade or so, was just in databases. I run another one, a database called BoltDB. It's a pretty popular Go key-value store. I really tried to make an application stack essentially on Bolt, to see if I could make a key-value store work as the main storage. It worked in a lot of ways, but you really start to miss schemas and indexes and those nice little things you get from SQL databases.\nSo the next step was looking at what is out there for a SQL-based database, and really SQLite is the main one out there; it just runs on everything and is rock solid. So I transitioned over to that. Actually, for the Bolt stack, I tried to make it Bolt, Go and React, which I think had just come out at the time, or was recent. So I was going to make the Burger Stack, which I thought was a little play on words, BGR. Anyway, it never took off, but the marketing was on point.\n\nEric Anderson:\nYeah, I don't know what makes a stack acronym take off, because there have been a lot of good ones of late that haven't quite found escape velocity. So that makes sense for why you want something that's not a key-value store, so you want SQLite. But maybe now do the opposite comparison. Why do server-side SQLite and not Postgres or MySQL?\n\nBen Johnson:\nYeah, by all means. I use Postgres and MySQL, and I used to be an Oracle DBA way back in the day, and I have nothing against the client-server model. I think it works in a lot of instances, and there are pros and cons to it, but once you start developing against a local SQLite database, it is just wicked fast. You don't have the N+1 query problem so much. With a client-server database, you might have one SQL query that pulls a list of orders, and then for each order you need to do a separate query for its items, and you start to explode the number of queries, and the latency between the server and the client just blows up.\nYou have to keep doing all these round trips. And when you have SQLite, everything is just right next to it in process, and it's just orders of magnitude faster for that kind of thing. And I feel like we're getting to that place where we have these servers that are just super beefy and cheap and have fast storage, and the speed of light is your limiting factor when you're going between your application server and your database. So that's the ideal fit.\n\nEric Anderson:\nSuper. And I'd like to get into your background and the background of the project. Because we talked about those prior episodes, maybe it's worth flagging that apparently you've worked with Philip not only in SQLite open source land, but you were at the same company together, and you've already alluded to the fact that you've got a database background. So take us through the small world that we live in.\n\nBen Johnson:\nSure, no problem.\n\nEric Anderson:\nAnd why are you working on this?\n\nBen Johnson:\nSure. Yeah, it's weird. I actually started as an Oracle DBA maybe 20 years ago, and then I've done Perl, I've done JavaScript applications. I've done the gamut of everything. 
And about 10 years ago I was working at this company that did behavioral analysis, so we basically took log data and all kinds of interesting stuff from big companies and tried to ingest all their logs. And they had this really terribly slow way of doing it, where they'd ingest it into a SQL Server database and then they'd run queries, and it took a week to process data. And, what was that MapReduce thing that came out?\nHadoop had recently come out and I was like, \"Hey, you guys should try Hadoop.\" And I was the data visualization guy at the time, I didn't even do databases. I was like, \"I think this would be way faster.\" And they're like, \"No, we're not going to do that.\" So after I left that company, I just had this little tickle in my head where, just out of spite, I wanted to see if I could make it faster, what they were doing. So then you just go down this rabbit hole, and when you start making things faster, there's no end to where you go.\nSo I started with, I was doing some Redis stuff and then I was like, \"Ah, it's not fast enough.\" So I'd go a little further down and I'd start writing... You just slowly start writing your own database, even though you don't need to. So I actually ended up writing this proof of concept of a behavioral analytics database that even had its own query language and parser and all kinds of stuff, but would actually write to disk and then query stuff super fast.\n\nEric Anderson:\nIt's interesting, the life decisions we make. That this one little curiosity set the path for the career in a sense.\n\nBen Johnson:\nIt's like spite-driven development. So yeah, I started going down that road and it was really just a side project at first. And the actual funny thing about that was that I was doing a talk at a local database distributed systems meetup, and I gave this example of, \"Hey, if you're, say, Shopify,\" and they were a new company at the time, I was like, \"Hey, you could analyze how people flow through and where they drop off.\" It was funnel analysis early on back then, and you could see where people would drop off at different stages and what they would do instead.\nAnd it was an interesting way to visualize all this stuff. And two weeks later I get an email from somebody at Shopify. They're like, \"Hey, I saw this video of yours.\" And I ended up going out there and I ended up working with them for about a year and a half until they IPO'ed. And then yeah, I switched over to working with Influx Data for a bit. That's where I met Philip O'Toole, who does the rqlite distributed SQLite implementation.\nAnd yeah, we actually didn't even discuss the SQLite stuff much until recently. We were just both on our own divergent paths and just met in the middle. But yeah, he's a great guy. I like all the stuff he's doing. Definitely a different approach, but I like that. Everybody has their own take on how to do this really specific weird problem of distributed SQLite. I think it's healthy for the ecosystem. 
And if someone else comes by and they have some solution that's 10 times better than Litestream or LiteFS, more power to them, I'll switch over and use it, that'd be great. Let somebody else solve this problem, where we don't have to spin up a database server and all these other things, and we can just deploy our code and not worry about it.\n\nEric Anderson:\nSo helpful background. You're clearly curious about something, it led to a distributed data processing career and particularly maybe something about SQLite databases, but I don't know that we've gotten to Litestream quite yet. Where are you the day of the first commit?\n\nBen Johnson:\nOh, sure. So Litestream, there was more to it than I knew. SQLite seemed like the way to go if you wanted a fast embedded database. I hate all the complexity going on with all the stacks you have to set up: Postgres, and then maybe you need some caching using Redis or memcached or whatever in your stack. And just all the different components and having to set them up, and one breaks and debugging all that's such a nightmare, that for the majority of small to medium-sized projects, I feel like they can get by on SQLite running on a half decent sized machine somewhere out there. So I wanted to figure out that problem.\nIt seemed really interesting to me and I realized that SQLite was the thing I wanted to figure out. That seemed to be the main tool, but actually figuring out how to hook into it was the big problem. They don't really give you a great API, and I wanted to make it something where you don't have to custom compile anything. You just plop this thing in and it just works. There's minimal configuration. I didn't want the application to actually know about the underlying ops side of it, where it could be running against a regular SQLite database or it could be running against this thing that's also uploading to S3.\nYou shouldn't know that from the application side. So it was really more just a bunch of iterations. And even for a while I was looking at rewriting SQLite in Go, just more to understand how SQLite works. I tend to rewrite projects to understand them. And so I was doing that for a bit, and then just one day reading through some of the docs... Litestream, it works on a fluke almost. So the way that SQLite works is that it does this thing called a write-ahead log, or there's a mode that it has where it'll essentially write all your changes to a separate file called the write-ahead log, and then it just appends onto that file over and over again.\nEach new page it writes, it appends onto there, and then eventually it gets too big and it needs to do a thing called checkpointing, which is where it takes all those changes and then copies them back over to the main database and it restarts that log. But the thing is with SQLite, it can't actually do that checkpointing process until it doesn't have any transactions going on. So Litestream essentially hooks on and does a read-only transaction, so it's like a long-living read-only transaction.\nSo we can look at the data behind the scenes as it's getting written in, and there's some checks and things to prevent it from missing data in there. So it doesn't actually use an approved SQLite API, but it goes through SQLite channels and all that stuff, the way you're supposed to use SQLite. 
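A minimal sketch of the trick as Ben describes it: put the database in WAL mode, hold a long-lived read transaction so SQLite can't checkpoint and reset the log, then read the -wal file as it grows. This only illustrates the idea; Litestream's real bookkeeping around WAL frames and checkpoints is considerably more involved.

package main

import (
	"database/sql"
	"fmt"
	"log"
	"os"

	_ "github.com/mattn/go-sqlite3"
)

func main() {
	db, err := sql.Open("sqlite3", "app.db")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// New pages now append to app.db-wal instead of the main file.
	if _, err := db.Exec(`PRAGMA journal_mode = WAL`); err != nil {
		log.Fatal(err)
	}

	// A deferred transaction takes a read lock on its first read and holds
	// it until rollback, which blocks a full checkpoint from resetting the WAL.
	tx, err := db.Begin()
	if err != nil {
		log.Fatal(err)
	}
	defer tx.Rollback()
	var n int
	if err := tx.QueryRow(`SELECT COUNT(*) FROM sqlite_master`).Scan(&n); err != nil {
		log.Fatal(err)
	}

	// While the lock is held the WAL only grows, so its bytes can be read
	// (and shipped somewhere like S3) without racing the checkpointer.
	wal, err := os.ReadFile("app.db-wal")
	if err != nil {
		log.Printf("no WAL file yet (no writes since the last checkpoint?): %v", err)
		return
	}
	fmt.Printf("WAL currently holds %d bytes of pending pages\n", len(wal))
}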
I just stumbled upon it one day and I tried it out and it worked. It was great.\n\nEric Anderson:\nSo SQLite's famous for being a single file. My database is just a file. And so this write-ahead log is a second file that's maybe hidden somewhere that's temporary.\n\nBen Johnson:\nYeah, essentially. And it's a little bit of a misnomer. There's actually four different files you can have for a SQLite database, but yeah, essentially you can think of it as a single file.\n\nEric Anderson:\nSo you create Litestream. Did you have an ambition? You were excited about making this thing real.\n\nBen Johnson:\nI didn't really think anybody was going to take it seriously.\n\nEric Anderson:\nSo how do you go about launching an open source project? Was this your first, and what does launching Litestream look like to you?\n\nBen Johnson:\nNo, it's not my first. I had some decent success with BoltDB.\n\nEric Anderson:\nThat's right, I'm sorry.\n\nBen Johnson:\nYeah, that one got pretty popular, and then I eventually archived that one, and the etcd folks, who were at CoreOS at the time before they got acquired by Red Hat, I think they eventually took it over. But yeah, I think I've gone up and down through a couple different projects. I feel like I've started a ton of projects and a couple of them worked. So you can find just a graveyard of repos on my GitHub and a couple that actually worked out. So I think I got in the habit of just releasing stuff. I talk about it, put it out there, and I've always just had an interest in trying things out.\nI like the open source side where it's just a community of people trying weird stuff, seeing what sticks. There's a thing called symbolic execution in computer science where it's used for testing, or you can use it for a lot of things. You can actually write out a program and then generate test cases for it. You basically make these math equations for each branch as it goes through your program and you feed them into what's called an SMT solver, and then it can spit out inputs that would exercise all the different branches. They were super weird nerdy projects.\nAnd I spent probably six months doing a port of one of those, it's a C program called [inaudible 00:13:19]. I ported it over to Go and used Go's SSA format. I spent tons of time on it and then it just went nowhere. No one cared. Probably the most advanced project I've worked on, and it just fell with a thud. So you really never know, honestly. I think the more accessible the project is, that helps a lot. People love SQLite, so in hindsight, I think that was a natural thing. People want to see things that they already love get additional support.\n\nEric Anderson:\nAnd was Litestream ever meant to be a startup? Eventually you made your way into Fly, I think in relation maybe to your Litestream work.\n\nBen Johnson:\nSure. Fly actually purchased Litestream as a project, which is unique. And then I came on as well with that, obviously. And it wasn't ever meant to be a startup or anything. I had some thoughts of creating a service that would make it easier for people to continuously stream backups or use SQLite in some way. But I didn't envision it as the next billion-dollar startup thing. I've tried doing startups and things in the past and they failed miserably. They never get anywhere off the ground. 
So I've realized that's not my forte.\nAnd so yeah, there's no big ambitions, but I think with Fly, they want to make it so people can run their applications easily across multiple regions and whatnot, and just make things really fast for end users. And that was what I liked about SQLite and where I was going after Litestream. I was trying to add some replication to Litestream itself, and that worked, but it eventually got forked off into a separate project. We needed to rework a lot of things to make it work, and that's what ended up being LiteFS essentially.\n\nEric Anderson:\nSo Fly could have forked Litestream presumably, but then that would just confuse the world, like why and what. So by buying the project, they get your endorsement and you get to keep working on it. And there's one community and everybody's happy.\n\nBen Johnson:\nYeah, and they do a great job. Honestly, one of the reasons I came on is because they do support a lot of open source creators, and they're not even very flashy about it honestly, but there's a lot of projects out there that they're the highest donors to. So I like that side of it. But yeah, I don't think they had an interest in taking it over or trying to be some big name around it. I think they just saw it as helping to enable their users to run stuff better. So I think that was more the ethos. And honestly, unless you have experience doing database C stuff, it's not much fun to be on this really low-level piece.\nWhat we wanted to do with LiteFS was... so Litestream hooked in through the regular SQLite transactional hooks, sort of. And then LiteFS takes a different approach, where we really wanted to make it feel like you're writing to just a local SQLite database, but you wanted to be able to have those writes automatically replicated instantly to your other nodes and have that all work seamlessly. And you have a lot of things around ensuring consistency. You don't want a separate process to die and then come back and then get inconsistent from where the data's at.\nSo we actually built it as a userland file system, which is a weird approach, but it can essentially intercept writes and basically check them, see where a transaction begins and ends, and it can package those writes into a file that we then ship to the different replicas in real time. So it's much more strict in terms of consistency of the actual file contents and whatnot. Litestream did what it could with the API it had, but we really just wanted a little more control around it. So that's where LiteFS ended up going. 
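Not LiteFS's actual code, but a toy sketch of the interception idea: wrap a file handle and observe every write's offset and length. A userland (FUSE) file system like LiteFS sees the same stream of writes coming from SQLite, and the transaction boundaries it detects are what let it package writes into files to ship to replicas.

package main

import (
	"fmt"
	"log"
	"os"
)

// loggingFile wraps an *os.File and records each positional write,
// the raw signal a userland file system layer gets to work with.
type loggingFile struct {
	f *os.File
}

func (lf *loggingFile) WriteAt(p []byte, off int64) (int, error) {
	fmt.Printf("intercepted write: %d bytes at offset %d\n", len(p), off)
	return lf.f.WriteAt(p, off)
}

func main() {
	f, err := os.OpenFile("demo.db", os.O_RDWR|os.O_CREATE, 0o644)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	lf := &loggingFile{f: f}
	if _, err := lf.WriteAt([]byte("page bytes would go here"), 0); err != nil {
		log.Fatal(err)
	}
}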
So it's really more of a replication and high availability tool than Litestream is.\n\nEric Anderson:\nAnd I shouldn't go here, too much of a noob, but I'm going to. The way you described this other API that LiteFS is using felt reminiscent of people talking about using the Postgres wire protocol. This isn't a wire protocol that you're...?\n\nBen Johnson:\nNo, it's not a wire protocol. Although I did just make a project where you can actually interface with SQLite databases over the Postgres wire protocol.\n\nEric Anderson:\nOh gosh.\n\nBen Johnson:\nIt sort of works. That's pretty hacky. Don't use that.\n\nEric Anderson:\nWhat's the state of LiteFS today? It's a database you can go run on Fly?\n\nBen Johnson:\nWe run it internally in production, and essentially you set it up and it's great if you need... A lot of times your biggest overhead in terms of latency is just geographic. So if you have people that are around the world, especially the latency from the US, I don't want to say here, to Europe... but the US to Europe is usually about a hundred milliseconds in the ballpark. And from the US to Asia, around a quarter of a second, 250 milliseconds.\nSo if you can actually place data over there that's right next to your application, you can just get so much faster response times from your web apps. So that was the impetus of what we wanted to do, and we're trying to make it as dead simple as possible. So there's not a lot of thinking about it; you don't need to be a distributed systems expert to run this thing. That tends to be a problem in the distributed systems world. It's like everyone needs a PhD to run anything.\n\nEric Anderson:\nSo I mentioned to you, we had Turso on a couple months back, they were describing libSQL mostly from the open source angle. And I should ask you about libSQL and the SQLite world later, but they also do some edge SQLite things. Could you help us understand the differences or similarities between the two approaches?\n\nBen Johnson:\nSure, no problem. This is no knock towards Turso. I think there's trade-offs for both. And honestly, sometimes I feel like people in the SQLite community should make some drama. I don't know, but we're all friends. Everyone's like, \"Oh, that's cool,\" and people borrow from each other and it's fun. But yeah, Turso, the way they've done it is they're still essentially a client-server tool where you're connecting up to another server. It's usually at an edge location. It's not in process, like LiteFS would be. They do have an option where you can actually embed replicas, like read-only replicas, locally with your application.\nThat's a newer feature, I believe. But yeah, they've taken a different approach in how they hook into SQLite. They actually just forked off the project and they made a couple changes around the write-ahead log and how they can hook into it, and some different application hooks. So really it's not a pure SQLite approach. And that's fine. It depends on what's important. I like to reduce the amount of friction people go through. If you already have SQLite running, you don't have to install different libraries or do any tweaks around that. You just plug in this thing and it should hopefully work.\nBut again, it's a different approach. I think there's definitely benefits. It's a managed service. We can't really do a managed service with LiteFS because it's really running in the same process, more or less, as the application itself. 
So a managed service doesn't make a huge amount of sense, whereas with them, they can connect up and they do the management of the actual servers themselves. So there's some ease of use and ease of maintenance that they get out of that.\n\nEric Anderson:\nYeah, I think you're close to clarifying. You answered questions that I didn't realize I had with that. So LiteFS is not a managed service because you end up with just SQLite files on local disk.\n\nBen Johnson:\nIt's essentially like a file system that lives on the same node as your application, or on the same server.\n\nEric Anderson:\nSo I have an app server, and instead of having my database over the network elsewhere, there's just files on my app server.\n\nBen Johnson:\nFiles on your app server. And then it has a little staging area where you can hold almost like diffs of your... Every time you do a transaction, you're getting this transactional file, and that can get shipped out to the other replicas, but sometimes other replicas can lag behind. They can get disconnected. So when they come back up, you want to have the most recent set of those transactions. So there's that staging area for those as well, if that makes sense. Maybe that's getting too far in the weeds for database replication.\n\nEric Anderson:\nYeah, that's fine. In the edge world, there's some of these serverless app servers where they just get spun up. I'm speaking the Fly language a little bit, but poorly. Quickly process your requests and then get spun down. I believe that's how it works to some degree. So in there, my database then is just active for a moment. The files are there on that same little execution node?\n\nBen Johnson:\nYeah, files are there. Right now LiteFS needs to have at least the primary nodes running all the time. They don't do well with auto-stopping, which you can have for replicas elsewhere, where they can come up and pull in the latest data and stay up for a little bit and then shut down when they're not used.\nAnd we started doing some work on what's called a virtual file system, which is a concept inside SQLite where they actually abstract out the file system, because they run on Windows and they run on Unix, and there's a layer in there. We've actually built a version that works with our... We have a managed service for just disaster recovery stuff with LiteFS called LiteFS Cloud. It'll actually work with that, and it'll pull down pages and it's transactionally aware within there. So you can run it on something like [inaudible 00:22:49], but it's still alpha/beta.\n\nEric Anderson:\nSo I put a pin in the libSQL open source question earlier, but just on SQLite: it's apparently a unique project in terms of outside contributions.\n\nBen Johnson:\nThey don't allow it.\n\nEric Anderson:\nWhat are your thoughts on that as someone who probably considered... There was probably a point where you're like, \"If I could just make a contribution, this would be a lot easier.\"\n\nBen Johnson:\nI can see where they're coming from. I disallowed contributions. I've done that on a lot of projects personally. I think my contribution policy got on Hacker News one time. And I was expecting people just to dump on it, but everyone was like, \"Oh, I can understand that.\"\n\nEric Anderson:\nMakes sense.\n\nBen Johnson:\nIt's just draining after a while sometimes. Especially with the projects I do, I really like having an idea of what the end goal is. 
With Bolt, I got a lot of shit because I essentially called it done at one point, where I wasn't going to make any more feature changes. It was, this is it, and we'll do bug fixes and whatnot. But I really wanted a simple project where you don't have to worry about constantly learning what's being added or different things.\nAnd those features coming in might bring in bugs for other things. I wanted something stable, and I had an idea of what the end goal was. So when you have outside contributions, people always want to add more stuff. No one's coming in trying to add more test cases or something. It's always some really big feature, like, \"Oh, I think this web server belongs in your key-value store,\" or something like... It's maybe not that crazy, but trying to constrain the vision while accepting outside contributions is tough.\n\nEric Anderson:\nYou had this contribution policy that was basically like, \"Don't try,\" and it was on Hacker News.\n\nBen Johnson:\nYeah, it was like F off. And I think people understand for the most part. Some people get a little grumpy about it, but by and large, I think people understand and appreciate that.\n\nEric Anderson:\nSo libSQL, is that... You answered this earlier, that since you've been able to make Litestream work with SQLite, then it's nice to be able to not have to switch libraries.\n\nBen Johnson:\nThere are certainly things I'd probably like to tweak in SQLite, and I put those up there to see if they'd be open to changing those things. But I think by and large it works well. I like the benevolent dictator for life model of open source. It tends to be that one or two people are the main contributors to most open source projects, and they just have a sense of where the thing is going, or where it should go and what the constraints are; they have the project in their head.\nAnd trying to open that up to just everyone, I think there's pros and cons for sure. I just think it's a lot to put on people, especially for projects where you're not necessarily making money off of it or it's not your full-time job. It's just this fun thing you like to work on, and then you just keep having people come in and trying to change it around. So more power to them if they're opening it up, by all means. It's more of a mental health thing, I think, for me.\n\nEric Anderson:\nSo what's the future for databases? I think a lot of people are building apps that are more or less the same and they reach for the database they're familiar with. Might they benefit from just starting with LiteFS? Is that the right use case for LiteFS? Just most new CRUD projects?\n\nBen Johnson:\nHonestly, I would even step back a little bit. I think you can just start with SQLite, honestly, you don't have to use my stuff at first. I think if you start growing and you realize, \"Oh, I want to have continuous backups...\" On our Litestream page, we actually have alternatives like, \"Hey, here's how you can just set up hourly cron backups. You don't even have to use Litestream.\"\n\nEric Anderson:\nBefore you go on further. So I should just have SQLite server-side sitting on my app server as we talked about. And then once I'm like, \"Oh, I feel a little uncomfortable, I should probably have backups,\" then I can Litestream it to get backups and then... Sorry, pick up where you were.\n\nBen Johnson:\nOh yeah, Litestream it, as you say. I haven't used it as a verb, but that's good. But you can continuously back up your data to S3, and it has some nice protections around it. 
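For the "you don't even have to use Litestream" stage, here is a sketch of a do-it-yourself snapshot using SQLite's VACUUM INTO (available since SQLite 3.27), which writes a consistent copy of a live database to a new file; run it from cron or a ticker and you have the hourly-backup baseline Ben mentions. This is one possible approach, not necessarily what the Litestream docs recommend.

package main

import (
	"database/sql"
	"fmt"
	"log"
	"time"

	_ "github.com/mattn/go-sqlite3"
)

func main() {
	db, err := sql.Open("sqlite3", "app.db")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// VACUUM INTO produces a consistent snapshot even while writers are active.
	// The filename is program-generated here, so string-building the SQL is safe.
	dest := fmt.Sprintf("backup-%s.db", time.Now().UTC().Format("2006-01-02T15-04-05"))
	if _, err := db.Exec(fmt.Sprintf(`VACUUM INTO '%s'`, dest)); err != nil {
		log.Fatal(err)
	}
	log.Printf("snapshot written to %s; upload it to S3 or rotate the last N copies", dest)
}

When that stops being enough, the upgrade path in the episode is running Litestream alongside the app, roughly litestream replicate app.db s3://bucket/app.db in its CLI, with the bucket name being your own.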
And then if you continue to grow, or maybe you have users on a different side of the world and you're like, \"Hey, this is slow over here, it would be great if I could just have a replica and replicate my data,\" you can use LiteFS. I think at each stage I really wanted to make it so you don't have to jump in and commit to this new project from day one. It's really more like an evolution: you have SQLite, \"Oh, here, you can tack this thing on when you're ready for it,\" and then you can tack the next thing on when you're ready for it. And just slowly move towards that end goal.\n\nEric Anderson:\nAnother enthusiasm for SQLite is around Local First, and I want a SQLite instance on my device, and then maybe it can talk to a SQLite in the cloud. And now I've got the benefits of Local First and a server-side database. Is that a Ben Johnson use case?\n\nBen Johnson:\nNo, it's not. So there's two separate worlds. So LiteFS and Litestream both do what's called physical replication. It basically copies the exact pages, like the bytes in the pages, across the network to somewhere else, and the replicas can recreate your original database from those, rather than through change sets. And then you have the Local First stuff. So CR-SQLite is one that's pretty popular, where it's actually CRDTs over SQLite. CRDT is conflict... Or no, I always-\n\nEric Anderson:\nResolution.\n\nBen Johnson:\nYeah, data type something. Anyway-\n\nEric Anderson:\nI think so.\n\nBen Johnson:\nIt's the worst marketing name in history, but it's a way to have people changing the data in two different locations and have it sync, and you can figure out how they merge together. So that's really a separate world of SQLite. I think CR-SQLite is probably the most popular as far as I know. I know there's another one called [inaudible 00:28:22].\nI don't know if they're still doing it. But yeah, so there's been different approaches to that. And I think Local First makes sense for certain projects, but I think there's a lot of mental... You really need to understand how conflicts work and how all that stuff works, and whether it's actually beneficial for your use case. I think there's a lot of overhead to it.\n\nEric Anderson:\nYeah, it feels like for most use cases, most applications, it's a nice-to-have feature and the implementation is quite a lift.\n\nBen Johnson:\nYes. I would agree with that.\n\nEric Anderson:\nAnd most people are like, \"You know what? Maybe I don't need to have that actually.\" But for some apps that's a critical thing and they make it work. I don't know.\n\nBen Johnson:\nAnd again, more power to them. Eventually consistent stuff generally, most of the time, is just hard to do. It's really hard to do well. You get a lot of edge cases and weird bugs. That's my warning to people. Anything eventually consistent.\n\nEric Anderson:\nI think we've covered most of what I wanted to cover. Ben, anything that you feel like could be interesting that we haven't covered?\n\nBen Johnson:\nI think there's a lot still around SQLite usability on the server side which can be improved. I'd love to see people work on that. I started a little hack of... I saw somebody else make a hack of this, and I made my own hack of this, of connecting to SQLite over SSH, so there's no database server running on the node.\nBut when you connect over SSH, you can run a program over there on the other side and then communicate over standard in and standard out with it. So it's essentially connecting out to your SQLite program on your server and then doing queries against that. 
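A sketch of the SSH hack Ben just described, assuming passwordless ssh to a host named db-host with the sqlite3 CLI installed there (both assumptions, not details from the episode). The query travels out over standard in and the rows come back over standard out, with no database server listening anywhere.

package main

import (
	"bytes"
	"fmt"
	"log"
	"os/exec"
	"strings"
)

func main() {
	// Run the sqlite3 CLI on the remote machine; ssh wires up stdin/stdout for us.
	cmd := exec.Command("ssh", "db-host", "sqlite3", "/data/app.db")
	cmd.Stdin = strings.NewReader("SELECT name FROM sqlite_master WHERE type='table';\n")

	var out bytes.Buffer
	cmd.Stdout = &out
	if err := cmd.Run(); err != nil {
		log.Fatal(err)
	}
	fmt.Print(out.String()) // one table name per line
}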
But again, there's no official API for doing that within SQLite, like the SQLite CLI. So I'd love to see more around that.\n\nEric Anderson:\nYou're saying there's an admin interface, like I can communicate with my database through the app server, but if I want to configure it or do some other admin-type things.\n\nBen Johnson:\nOr if you just want to do some backend queries, I think that'd be great to see. But yeah, I think usability, honestly, is the biggest hurdle. I use a lot of CLI tools, so I don't mind SSHing into another node, but some people dislike that, so I think that can be a hurdle for sure. So if there were more GUI, more usable interfaces for more people, I think that would help a lot.\n\nEric Anderson:\nWhat is the community like around server-side SQLite? Do you find that there's a bunch of people who are kind of, \"I've moved on from MySQL and Postgres. I just do, just as we described, I start with my server-side SQLite and then I start doing backups, and then I do this conflict resolution,\" or whatever it is, and they've moved on. They don't need Postgres or MySQL anymore?\n\nBen Johnson:\nI don't think they're mutually exclusive by any means. And honestly, when you look at companies or businesses, you have your application, like the main one, and then you have 20 or 30 of these ancillary side applications. This thing runs and does whatever, has this little UI on the side. It's not part of your main application that has to be up and running all the time. And those can tolerate, if you have to restart the server, you don't mind people not connecting to it for a second or two. So I think there's a lot of different use cases out there.\nAnd I think SQLite can be great in a lot of different ones, but as far as starting out, I think a lot of people feel more comfortable using SQLite. And honestly, for a long time it was just like the toy database that people made fun of you for if you used SQLite. But honestly, it does so much in there. You can do JSON processing, you can do full-text search, you can do all kinds of stuff that's just built in. So I would say it does 90% of what you typically need Postgres for, and then that extra 10%, a lot of times you can just get from your local application language.\n\nEric Anderson:\nAnd you're pointing out that in a world of microservices, for example, you might have a bunch of small servers who do little jobs and they need a little persistent storage, and so you slap some SQLite on them and you're good to go.\n\nBen Johnson:\nPretty much. As long as people feel like they can use it, and if it works for the use case, I think that's great.\n\nEric Anderson:\nIs there a place where SQLiters gather? When you say it would be great if somebody did this SSH thing, who talks about that and where do they talk about it?\n\nBen Johnson:\nI would say Twitter used to be a place. It's fluctuated recently, I feel like, with the changeover to X and all that. I think whenever there's a SQLite post on Hacker News, people gather there, and people love SQLite, honestly. And then there is a SQLite subreddit, it's not super active. There's a SQLite forum that the SQLite folks run themselves, but I wouldn't say it's a lot of server-side SQLite people. So it's a mix. I think Twitter was the main place for a long time.\n\nEric Anderson:\nAnd Ben, what's the future hold for you? More SQLite, or what's scratching at you? What's the current curiosity that could lead to the next decade of interest for you?\n\nBen Johnson:\nOh man. 
I don't know about the next decade, but I think just making application development simpler is kind of what I like to focus on. And there's a lot of work that goes into just small changes a lot of times. So I think it's just getting through that and making it...\nAt the end of the day, I would love it if LiteFS was really just a little checkbox you click and suddenly your data is replicated and there's nothing else you have to know about it. And making those configuration defaults or whatever just more natural and easy to use goes a long way. So yeah, I think just making it an easier tool to use, really.\n\nEric Anderson:\nSuper. Well, Ben, I'm excited that you got curious about this years ago, and for all you've given to the community, and I appreciate your time today.\n\nBen Johnson:\nCool. Yeah, thanks for having me on, Eric. Appreciate it.\n\nEric Anderson:\nYou can subscribe to the podcast and check out our community Slack and Newsletter at contributor.fyi. If you like the show, please leave a rating and review on Apple Podcasts, Spotify, or wherever you get your podcasts. Until next time, I'm Eric Anderson and this has been Contributor.","content_html":"

Ben Johnson (@benbjohnson) is the creator of Litestream and LiteFS, two open-source disaster recovery solution for SQLite. Litestream is designed to provide continuous backups for SQLite databases by streaming incremental changes, allowing for easy data recovery in the event of a server crash. LiteFS, on the other hand, is built on LiteStream but uses transactional control to focus on replication and high availability. Join us as Ben discusses the challenges and trade-offs of open source contributions and the future of databases.

\n\n

Contributor is looking for a community manager! If you want to know more, shoot us an email at eric@scalevp.com.

\n\n

Subscribe to Contributor on Substack for email notifications!

\n\n

In this episode we discuss:

\n\n\n\n

Links:

\n\n\n\n

People mentioned:

\n\n\n\n

Other episodes:

\n\n\n\n

Ben Johnson:
\nI got in the habit of just releasing stuff. I talk about it, put it out there. I like the open source side where it is just a community of people trying weird stuff, seeing what sticks.

\n\n

Eric Anderson:
\nThis is Contributor, a podcast telling the stories behind the best open source projects and the communities that make them. I'm Eric Anderson. Welcome Ben Johnson to the show. We are excited to continue our SQLite frenzy with, I don't know, the King of SQLite, but certainly one of the royal family here. We did just record rqlite and then we did Turso a month ago or more.

\n\n

Ben Johnson:
\nIt'll just be a SQLite podcast going forward.

\n\n

Eric Anderson:
\nI don't know whether to call this a Litestream episode or a LiteFS episode, but it is a SQLite episode. Usually I have you start by telling us what the projects are, an elevator pitch, so to speak, to ground the conversation. You could start with one or the other or however you want to do that.

\n\n

Ben Johnson:
\nSure. Litestream probably makes the most sense. That's the first one I started on. So Litestream, the idea of it is it's meant to be disaster recovery essentially for SQLite. So usually when you run SQLite, it just runs in a single box and you could do backups every hour, but really what you want is you don't want to lose almost any data.
\nSo what Litestream does is it takes advantage of the fact that S3 is super cheap to upload to, but expensive to download from, and it just has a streaming replication every second to S3 of incremental changes. Then if your server crashes or just blows up, then you can actually just download up to a second or two from your disaster essentially and recover all your data. That's the elevator pitch for Litestream.

\n\n

Eric Anderson:
\nThere's a million ways we could take that intro, but just to noodle on one thing, SQLite, I think for a lot of people they imagine it in the browser or on a local device. Is that the typical place where people are backing up their SQLite from, or is this meant for a server side SQLite?

\n\n

Ben Johnson:
\nYeah, I really want to make server side SQLite work. So my background, the last maybe decade or so was just in databases. I run another one, a database called BoltDB. It's a pretty popular Go keyvalue store. I really tried to make an application stack essentially on Bolt to see if I could make a key value store work for a main storage. It worked in a lot of ways, but you really start to miss schemas and indexes and those nice little things you get from SQL databases.
\nSo the next step is looking at what is out there for a SQL based database, and really SQLite is the main one out there, just runs on everything and is rock solid. So I transitioned over to that. Actually for the Bolt stack, I tried to make it Bolt, Go and I think it was React I think had just come out at the time, or it was recent. So I was going to make the Burger Stack, which I thought was a little play on words, BGR. Anyway, it never took off, but the marketing was on point.

\n\n

Eric Anderson:
\nYeah, I don't know what makes a stack acronym take off because there's been a lot of good ones of late that haven't quite found escape velocity. So that makes sense for why you want something that's not a key value store, so you want SQLite. But maybe now do the opposite comparison. Why do service side SQLite and not Postgres or MySQL?

\n\n

Ben Johnson:
\nYeah, by all means. I use Postgres and MySQL, and I used to be an Oracle DBA way back in the day, and I have nothing against the client server model. I think it works in a lot of instances and there's pros to it and cons, but once you start developing against a local SQLite database, it is just wicked fast to do. You don't have the concept of N+1 query so much. You might have one SQL query that'll pull a list of orders and then for each order you need to do a separate query for each of your items, and you start to explode the number of queries and just the latency between the server and the client just blows up.
\nYou have to keep doing all these round trips. And when you have SQLite, everything is just right next to it in process and it's just orders of magnitude faster for those for optimizing that kind of thing. And I feel like we're getting to that place where we have these servers that are just super beefy and cheap and have fast storage and the speed of light is like your limiting factor when you're going between your application server and your database. So that's the ideal fit.

\n\n

Eric Anderson:
\nSuper. And I'd like to get into your background and the background of the project just because we had talked about these prior episodes, maybe it's worth flashing that apparently you've worked with Philip not only in SQLite open source land, but you're in the same company together and you've already alluded the fact that you've got a database background. So take us through the small world that we live in.

\n\n

Ben Johnson:
\nSure, no problem.

\n\n

Eric Anderson:
\nAnd why are you working on this?

\n\n

Ben Johnson:
\nSure. Yeah, it's weird. I actually started as an Oracle DBA maybe 20 years ago, and then I've done Pearl, I've done JavaScript applications. I've done the gamut of everything. And about 10 years ago I was working at this company that did behavioral analysis, so basically had to log data and all kinds of interesting stuff from big companies which took and tried to ingest all their logs. And they had this really terribly slow way of doing it, where they'd ingest it into a SQL server database and then they'd run queries and it took a week to process data and what was that map reduced thing that came out?
\nHadoop had recently come out and I was like, "Hey, you guys should try Hadoop." And I was the data visualization guy at the time, I didn't even do databases. I was like, "I think this would be way faster." And they're like, "No, we're not going to do that." So after I left that company just had this little tickle in my head where just out of spite, I wanted to see if I could make it faster, what they were doing. So then you just go down this rabbit hole and when you start making things faster, there's no end to where you go.
\nSo I started with, I was doing some Redis stuff and then I was like, "Ah, it's not fast enough." So I'd go a little further down and I'd start writing... You just slowly start writing your own database, you don't even need to. So I actually ended up writing this proof of concept of a behavioral analytics database, even had its own query language and parser and all kinds of stuff, but would actually write to disc and then query stuff super fast.

\n\n

Eric Anderson:
\nIt's interesting, the life decisions we make. That this one little curiosity set the path for the career in a sense.

\n\n

Ben Johnson:
\nIt's like spike driven development. So yeah, I started going down that road and it was really just a side project at first. And the actual funny thing about that was that I was doing a talk at a local database distributed systems meetup, and I gave this example of, "Hey, if you're, say Shopify," and they were a new company at the time, I was like, "Hey, you could analyze how people flow through and where they drop off." It was funnel analysis early on back then, and you could see where people would drop off at different stages and what they would do instead.
\nAnd it was an interesting way to visualize all this stuff. And two weeks later I get an email from somebody at Shopify. They're like, "Hey, I saw this video of yours." And I ended up going out there and I ended up working with them for about a year and a half until they IPO'ed. And then yeah, I switched over to working with Influx Data for a bit. That's where I met Philip O'Toole who does the SQL distributed SQLite implementation.
\nAnd yeah, we actually don't even discuss the SQLite stuff much until recently. We're just on both our own divergent paths and just met in the middle. But yeah, he's a great guy. I like all the stuff he's doing. Definitely a different approach, but I like that. Everybody has their own take on how to do this really specific weird problem of distributed SQLite. I think it's healthy for the ecosystem.

\n\n

Eric Anderson:
\nThere's got to be a part of you that gets a little bit of buzz out of like, "Oh, other people think this is interesting and want to solve it." But...

\n\n

Ben Johnson:
\nFor sure. I think the worst thing is being the only project in the space. Does anyone care about this problem besides me? So it's great to see competition. And honestly, my end goal would be to make software development easier for people in whatever way that looks. And if someone else comes by and they have some solution that's 10 times better than Litestream or LiteFS, more power to them, I'll switch over and use it, that'd be great. Somebody else solve this problem, where we don't have to spin up a database server and all these other things, we can just deploy out code and not worry about it.

\n\n

Eric Anderson:
\nSo helpful background. You're clearly curious about something, it led to a distributed data processing career and particularly maybe something about SQLite databases, but I don't know that we've gotten to Litestream quite yet. Where are you the day of the first commit thing?

\n\n

Ben Johnson:
\nOh, sure. So Litestream, there was more than I knew. SQLite seemed like the way to go. If you wanted a fast embedded database, I hate all the complexity going on with all the stacks I get to set up of Postgres and then maybe need some caching using Redis or whatever in your stack, MIM cache. And just all the different components and having to set them up, and one breaks and just debugging all that's such a nightmare that for the majority of small to medium-sized projects, I feel like they can get by on SQLite running on a half decent sized machine somewhere out there. So I wanted to figure out that problem.
\nIt seemed really interesting to me and I realized that SQLite was the thing I wanted figure out. That seemed to be the main tool, but actually figuring out how to hook into that was the big problem. They don't really give you a great API and I wanted to make it something where you don't have to custom compile anything. You just plop this thing in and it just works. There's minimal configuration. I didn't want the application to actually know about the underlying ops side of it where it could be running against a regular SQLite database or it could be running against this thing that's also uploading to S3.
\nYou should know that from the application side. So it was really more just a bunch of iterations. And even for a while I was looking at rewriting SQLite and Go just more to understand how SQLite works. I tend to rewrite projects to understand them. And so I was doing that for a bit and just one day reading through some of the docs. Litestream, it works on a fluke almost. So the way that SQLite works is that it does this thing called write ahead log or there's a mode that it has where it'll essentially write all your changes to a separate file called the write ahead log, and then it just depends onto that file over and over again.
\nEach new page it writes, it depends onto there, and then eventually it gets too big and it needs to do a thing called checkpointing, which is where it takes all those changes and then copies them back over to the main database and it restarts that log. But the thing is with SQLite, it can't actually do that checkpointing process until it doesn't have any transactions going on. So Litestream essentially hooks on and does a read only transaction, so it's like a long living read only transaction.
\nSo we can look at the data behind the scenes as it's getting written in, and there's some checks and things to prevent it from missing data in there. So it doesn't actually use an approved SQLite API, but it goes through SQLite channels and all that stuff, the way you're supposed to use SQLite.

\n\n

Eric Anderson:
\nYou found a programming interface, just not the one they intended you to use.

\n\n

Ben Johnson:
\nIt's not official necessarily, but just how their logs work and all that. It's just stumbled upon it one day and I tried it out and it worked. It was great.

\n\n

Eric Anderson:
\nSo SQLite's famous for being a single file. My database is just a file. And so this write ahead log is a second file that's maybe hidden somewhere that's temporary.

\n\n

Ben Johnson:
\nYeah, essentially. And it's a little bit of a misnomer. There's actually four different files you can have SQLite database, but yeah, essentially you can think of it as a single file.

\n\n

Eric Anderson:
\nSo you create Litestream. Did you have an ambition? You were excited about making this thing real.

\n\n

Ben Johnson:
\nI didn't really think anybody was going to take it seriously.

\n\n

Eric Anderson:
\nSo how do you go about launching an open source project? Was this your first and what does launching SQLite look like to you?

\n\n

Ben Johnson:
\nNo, it's not my first. I had some decent success with the BoltDB.

\n\n

Eric Anderson:
\nThat's right, I'm sorry.

\n\n

Ben Johnson:
\nYeah, that one got pretty popular and then I eventually archived that one and SED who was Core OS at the time before they got acquired by Linux, or no by Red Hat, I think they eventually took it over. But yeah, I think I've gone up and down through a couple different projects. I feel like I've started a ton of projects and a couple of them worked. So you can find it's just a graveyard of repos on my GitHub and a couple that actually worked out. So I think it's, I got in the habit of just releasing stuff. I talk about it, put it out there, and I've always just had an interest in trying things out.
\nI like the open source side where it's just a community of people trying weird stuff, seeing what sticks. There's a thing called symbolic execution in computer science where it's used for testing or you can use it for a lot of things. You can actually write out a program and then you can actually generate test cases for it based on... You basically make these math equations for each branch as it goes through your program and you feed it into what's called an SMT solver. And then they can spit out inputs that would solve all the different branches and they were super weird nerdy projects.
\nAnd I spent probably six months doing a port of one of those, it's a C program called [inaudible 00:13:19]. I ported over to Go and used Go's SSA format. I spent tons of time on it and then it just went nowhere. No one cared. Probably the most advanced project I've worked on, and it just fell with a thud. So you really never know honestly. I think the more accessible the project is, I think that helps a lot. People love SQLite, so in hindsight, I think that was a natural thing. People want to see things that they already love get additional support.

\n\n

Eric Anderson:
\nAnd was Litestream ever to be a startup? Eventually you made your way into Fly, I think in relation maybe to your Litestream work.

\n\n

Ben Johnson:
\nSure. Actually Fly actually purchased Litestream as a project, which is unique. And then I came on as well with that obviously. And it wasn't ever meant to be a startup or anything. I think I had some thoughts of I wanted to create a service where people could make it easier for people just to be able to continuously stream back up or use SQLite's in some way. But I didn't envision it as the next billion-dollar startup thing. I've tried doing startups and things in the past and they fail miserably. They never get anywhere off the ground. So I've realized that's not my forte.
\nAnd so yeah, there's no big ambitions, but I think with Fly, they'll want to make things where people can run their application easily across multiple regions and whatnot, and just make things really fast for end users. And that was what I liked about SQLite and where I was going after Litestream, I was trying to make some replication on Litestream itself, and that worked, but it eventually got forked off into a separate project. We needed to rework a lot of things to make it work, and that's what ended up being LiteFS essentially.

\n\n

Eric Anderson:
\nSo Fly could have forked Litestream presumably, but then that would just confuse the world, like why and what. So by buying the project, they get your endorsement and you get to keep working on it. And there's one community and everybody's happy

\n\n

Ben Johnson:
\nYeah, and they do a great job. Honestly, one of the reasons I came on is because they do support a lot of open source creators and they're not even very flashy about it honestly, but there's a lot of projects out there that they're the highest donors too. So I like that side of it. But yeah, I don't think they had an interest in taking it over or trying to be some big name around it. I think they just saw it as helping to enable their users to run stuff better. So I think that was more the ethos. And honestly, unless you have experience and doing database C stuff, it's not much fun to be on this really low level piece.
\nWhat we wanted to do with the LiteFS was, so Litestream hooked in through the regular SQLite transactional hooks-ish sort of. And then LiteFS takes a different approach where we really wanted to make it feel like you're writing just a local Litestream, but you wanted to be able to have those rights automatically replicated instantly to your other nodes and have that all work seamlessly. And you have a lot of things around ensuring consistency. You don't want a separate process to die and then come back and then it gets inconsistent from where the data's at.
\nSo we actually built it as a user land file system, which is a weird approach, but it can essentially intercept rights and basically check them, see where a transaction begins and ends, and it can package those rights into a file that we then shipped to the different replicas in real time. So it's much more strict in terms of consistency of the actual file contents, whatnot. Litestream did what it could with the API it had, but we really just wanted a little more control around it. So that's where LiteFS ended up going.

\n\n

Eric Anderson:
\nShould LiteFS be then thought of as a next generation?

\n\n

Ben Johnson:
\nI think it's really more like a separate thing. Yeah, Litestream is really great if you just have a single node and you want disaster recovery. That's really the aim of that. And then the two most requested things and Litestream were people wanted to do replicas. So essentially they had a primary and then writes immediately go out to other nodes and then they could query off of those as well.
\nAnd then the other one was having failover. So if the primary dies it fails over automatically to another node, and that was the goal with LiteFS. So it has a few more moving parts than Litestream, Litestream is pretty dead simple. And then LiteFS, it has some hooks around being able to switch what the primary is and then hook into the other replicas. So it's really more of a replication and high availability tool than Litestream is.

\n\n

Eric Anderson:
\nAnd I shouldn't go here, too much of a noob, but I'm going to. The way you described this other API, that LiteFS is using, felt reminiscent of people talking about using the Postgres wire protocol. This isn't a wire protocol that you're?

\n\n

Ben Johnson:
\nNo, it's not a wire protocol. Although I just make a project that actually you can interface with SQLite databases over the Postgres wire protocol.

\n\n

Eric Anderson:
\nOh gosh.

\n\n

Ben Johnson:
\nIt sort of works. That's pretty hacky. Don't use that.

\n\n

Eric Anderson:
\nWhat's the state of LiteFS today? It's a database you can go run on Fly?

\n\n

Ben Johnson:
\nWe run it internally in production and essentially you set it up and it's great if you need... A lot of times your biggest overhead in terms of latency is just geographic. So if you have people that are around the world, especially a latency from US, I don't want to say here to Europe, but the US to Europe is usually about a hundred milliseconds is the ballpark. And from the US to Asia around a quarter of a second, 250 milliseconds.
\nSo if you can actually place data over there that's right next to your application, you can just get so much faster response times from your web apps. So that was the impetus of what we wanted to do and we're trying to make it as dead simple as possible. So there's not a lot of thinking about, you don't need to be a distributed systems expert to run this thing. That tends to be a problem in the distributed systems world. It's like everyone needs a PhD to run anything.

\n\n

Eric Anderson:
\nSo I mentioned to you, we had Turso on a couple months back, they were describing libSQL mostly from the open source angle. And I should ask you about SQL and the SQLite world later, but they also do some edge SQLite things. Could you help us understand the differences or similarities between the two approaches?

\n\n

Ben Johnson:
\nSure, no problem. This is no knock towards Turso. I think there's trade-offs for both. And honestly, sometimes I feel like people in the SQLite community should make some drama. I don't know, but we're all friends. Everyone like, "Oh, that's cool," and people borrow from each other and it's fun. But yeah, Turso, the way they've done it is they're still essentially a client server tool where you're connecting up to another server. It's usually at an edge location. It's not in process, like LiteFS would be. They do have an option where you can actually embed replicas, like read only replicas locally with your application.
\nThat's a newer feature I believe. But yeah, they've taken a different approach in how they hook into SQLite. They actually just forked off the project and they made a couple changes around the write ahead log and how they can hook into it in some different application hooks. So really it's not a pure SQLite approach. And that's fine. It depends on what's important. I like to reduce the amount of friction people go through. If you already have SQLite running, you don't have to install different libraries or do any tweaks around that. You just plug in this thing and it should hopefully work.
\nBut again, it's a different approach. I think there's definitely benefits. It's a managed service. We can't really do a managed service with LiteFS because they're really running in the same process more or less as the application itself. So a managed service doesn't make a huge amount of sense, whereas with them, they can connect up, they do the management of the actual servers themselves. So there's some ease of use and ease of maintenance that they get out of that.

\n\n

Eric Anderson:
\nYeah, I think you're close to clarifying. You answered questions that I didn't realize I had with that. So LiteFS is not a managed service because you end up just being SQLite files on local disk.

\n\n

Ben Johnson:
\nIt's essentially like a file system that lives on the same node as your application or on the same server.

\n\n

Eric Anderson:
\nSo I have an app server and instead of having my database over the network elsewhere, there's just files on my app server.

\n\n

Ben Johnson:
\nFiles on your app server. And then it has a little staging area where you can hold almost like diffs of your... Every time you do a transaction, you're getting this transactional file and that can get shipped out to the other replicas, but sometimes other replicas can lag behind. They can get disconnected. So when they come back up, you want to have some most recent set of those transactions. So there's that staging area for those as well, if that makes sense. Maybe that's getting too far in the weeds for database replication.

\n\n

Eric Anderson:
\nYeah, that's fine. In the edge world, there's some of these serverless app servers where they just get spun up, quickly process your requests, and then get spun down. I'm speaking the Fly language a little bit, but poorly. I believe that's how it works to some degree. So in there, my database then is just active for a moment. The files are there on that same little execution node?

\n\n

Ben Johnson:
\nYeah, files are there. Right now LiteFS works where it needs to have at least the primary node running all the time. Primaries don't do well with auto-stopping, which you can have for replicas elsewhere, where they can come up and pull in the latest data and stay up for a little bit and then shut down when they're not used.
\nAnd we started doing some work on what's called a virtual file system, which is a concept inside SQLite where they actually abstract out the file system, because they run on Windows and they run on Unix, and there's a layer in there. We've actually built a version that works with our... We have a managed service for just disaster recovery stuff with LiteFS called LiteFS Cloud. It'll actually work with that and it'll pull down pages, and it's transactionally aware within there. So you can run it on something like [inaudible 00:22:49], but it's still alpha/beta.
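For readers who haven't met the VFS concept: SQLite routes all of its file I/O through a pluggable virtual file system layer, and you can ask for a registered VFS by name when opening a database. A minimal sketch using the rusqlite crate and the stock "unix" VFS that ships with Unix builds of SQLite (both assumptions for illustration); a custom VFS like the one described above would be registered under its own name.

```rust
// Sketch of opening a database through a named SQLite VFS.
// Assumes the `rusqlite` crate; "unix" is the default VFS on Unix builds.
use rusqlite::{Connection, OpenFlags, Result};

fn main() -> Result<()> {
    let conn = Connection::open_with_flags_and_vfs(
        "app.db",
        OpenFlags::SQLITE_OPEN_READ_WRITE | OpenFlags::SQLITE_OPEN_CREATE,
        "unix", // name of a registered VFS; SQLite dispatches all file I/O through it
    )?;
    let journal_mode: String =
        conn.query_row("PRAGMA journal_mode = WAL", [], |row| row.get(0))?;
    println!("journal mode: {journal_mode}");
    Ok(())
}
```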

\n\n

Eric Anderson:
\nSo I put a pause earlier on the SQLite open source libSQL question, but maybe just on SQLite: it's apparently a unique project in terms of outside contributions.

\n\n

Ben Johnson:
\nThey don't allow it.

\n\n

Eric Anderson:
\nWhat are your thoughts on that as someone who probably considered... There was probably a point where you're like, "If I could just make a contribution, this would be a lot easier."

\n\n

Ben Johnson:
\nI can see where they're coming from. I disallowed contributions. I've done that on a lot of projects personally. I think my contribution policy got on Hacker News one time. And I was expecting people just to dump on it, but everyone was like, "Oh, I can understand that."

\n\n

Eric Anderson:
\nMakes sense.

\n\n

Ben Johnson:
\nIt's just draining after a while sometimes. Especially with the projects I do, I really like having an idea of what the end goal is. With Bolt, I get a lot of shit for it; I essentially called it done at one point where I wasn't going to make any more feature changes. It was, this is it, and we'll do bug fixes and whatnot. But I really wanted a simple project where you don't have to worry about constantly learning what's being added or different things.
\nAnd those features coming in might bring in bugs for other things. I wanted a staple and I had an idea of what the end goal was. So when you have outside contributions, people always want to add more stuff. No one's coming in trying to add more test cases or something. It's always some really big feature like, "Oh, I think this web server belongs in your key value store," or something like... It's maybe not that crazy, but trying to constrain the vision while accepting outside contributions is tough.

\n\n

Eric Anderson:
\nYou had this contribution policy that was basically like, "Don't try," and it was on Hacker News.

\n\n

Ben Johnson:
\nYeah, it was like F off. And I think people understand for the most part. Some people get a little grumpy about it, but by and large, I think people understand and appreciate that.

\n\n

Eric Anderson:
\nSo libSQL, is that... You answered this earlier: since you've been able to make Litestream work with SQLite, it's nice not to have to switch libraries.

\n\n

Ben Johnson:
\nThere are certainly things I'd probably like to tweak in SQLite, and I put that up there to see if they'd be open to changing those things. But I think by and large it works well. I like the actual benevolent dictator for life model of open source. It tends to be like one person or two people are the main contributors to most open source projects, and they have a sense of where the thing is going or where it should go and what the constraints are, other projects, in their head.
\nAnd trying to open that up to just everyone, I think there's pros and cons for sure. I just think it's a lot to put on people, especially for projects where you're not necessarily making money off of it or it's not your full-time job. It's just like this fun thing you like to work on, and then you just keep having people come in and trying to change it around. So more power to them if they're opening it up, by all means. It's more of a mental health thing, I think, for me.

\n\n

Eric Anderson:
\nSo what's the future for databases? I think a lot of people are building apps that are more or less the same and they reach for the database they're familiar with. Might they benefit from just starting with LiteFS? Is that the right use case for LiteFS? Just most new CRUD projects.

\n\n

Ben Johnson:
\nHonestly, I would even step back a little bit. I think you can just start with SQLite honestly, you don't have to use my stuff at first. I think if you start growing and you realize, "Oh, I want to have continuous backups..." On our Litestream page, we actually have alternatives like, "Hey, here's how you can just set up hourly cron backups. You don't even have to use Litestream."
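As a concrete picture of that do-it-yourself tier, SQLite's VACUUM INTO can snapshot a live database into a fresh file, and a cron job just needs to run something like the following periodically. A sketch assuming the rusqlite crate and illustrative file paths:

```rust
// Sketch of a do-it-yourself SQLite backup, the kind of thing an hourly cron
// job could run before you ever reach for Litestream.
// The `rusqlite` crate and the paths here are assumptions for illustration.
use rusqlite::{Connection, Result};

fn main() -> Result<()> {
    std::fs::create_dir_all("backups").expect("create backup directory");
    let conn = Connection::open("app.db")?;
    let stamp = std::time::SystemTime::now()
        .duration_since(std::time::UNIX_EPOCH)
        .expect("system clock before 1970")
        .as_secs();
    // VACUUM INTO (SQLite 3.27+) writes a consistent snapshot of the live
    // database into a fresh file without taking the database offline.
    // The path is app-controlled here, so inlining it is fine for a sketch.
    conn.execute_batch(&format!("VACUUM INTO 'backups/app-{stamp}.db'"))?;
    Ok(())
}
```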

\n\n

Eric Anderson:
\nBefore you go on further. So I should just have SQLite server-side, sitting on my app server as we talked about. And then once I'm like, "Oh, I feel a little uncomfortable, I should probably have backups," then I can Litestream it to get backups and then... Sorry, pick up where you were.

\n\n

Ben Johnson:
\nOh yeah, Litestream it, as you say. I haven't used it as a verb, but that's good. But you can continuously back your data up to S3, and it has some nice protections around it. And then if you continue to grow, or maybe you have users in a different side of the world and you're like, "Hey, this is slow over here, it would be great if I could just have a replica and replicate my data."
\nYou can use LiteFS. I think at each stage I really wanted to make it so you don't have to jump in and commit to this new project from day one. It's really more like an evolution of, you have SQLite, "Oh, here, you can tack this thing on when you're ready for it," and then you can tack the next thing on when you're ready for it. And just slowly move towards that end goal.

\n\n

Eric Anderson:
\nAnother enthusiasm for SQLite is around Local First, and I want a SQLite instance on my device, and then maybe it can talk to a SQLite in the cloud. And now I've got the benefits of Local First and a server-side database. Is that a Ben Johnson use case?

\n\n

Ben Johnson:
\nNo, it's not. So there's two separate worlds. So LiteFS and Litestream both do what's called physical replication. It basically copies the exact pages, like the bytes in the pages, across the network to somewhere else, and they can recreate your original databases through change sets. And then you have the Local First stuff. So CR-SQLite is one that's pretty popular, where it's actually CRDTs over SQLite, CRDT is conflict... Or no, I always-

\n\n

Eric Anderson:
\nResolution.

\n\n

Ben Johnson:
\nYeah, data type something. Anyway-

\n\n

Eric Anderson:
\nI think so.

\n\n

Ben Johnson:
\nIt's the worst marketing name in history, but it's a way to have people changing the data in two different locations and have it sync, and you can figure out how they merge together. So that's really a separate world of SQLite. I think CR-SQLite is probably the most popular as far as I know. I know there's another one called [inaudible 00:28:22].
\nI don't know if they're still doing it. But yeah, so there's been different approaches to that. And I think Local First makes sense for certain projects, but I think there's a lot of mental... You really need to understand how conflicts work and how all that stuff works and whether it's actually beneficial for your use case. I think there's a lot of overhead to it.
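To give a flavor of what "figure out how they merge" means, here is a toy last-writer-wins register, one of the simplest CRDTs: both replicas accept writes independently, and merge deterministically picks a winner. This is an illustrative sketch of the general idea only, not how CR-SQLite actually works.

```rust
// Toy last-writer-wins (LWW) register: one of the simplest CRDTs.
// Illustrates the merge idea only; CR-SQLite's real machinery is more involved.
#[derive(Clone, Debug, PartialEq)]
struct LwwRegister {
    value: String,
    timestamp: u64, // logical clock; ties broken by replica id
    replica: u32,
}

impl LwwRegister {
    fn set(&mut self, value: &str, timestamp: u64) {
        self.value = value.to_string();
        self.timestamp = timestamp;
    }

    // Merge is commutative, associative, and idempotent, so replicas
    // converge no matter what order updates arrive in.
    fn merge(&mut self, other: &LwwRegister) {
        if (other.timestamp, other.replica) > (self.timestamp, self.replica) {
            *self = other.clone();
        }
    }
}

fn main() {
    let mut a = LwwRegister { value: "draft".into(), timestamp: 1, replica: 1 };
    let mut b = a.clone();
    b.replica = 2;

    a.set("edited on laptop", 2); // offline edit on replica A
    b.set("edited on phone", 3);  // later offline edit on replica B

    a.merge(&b);
    b.merge(&a);
    assert_eq!(a, b); // both replicas converge to the later edit
    println!("converged to: {}", a.value);
}
```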

\n\n

Eric Anderson:
\nYeah, it feels like for most use cases, most applications, it's a nice-to-have feature and the implementation is quite a lift.

\n\n

Ben Johnson:
\nYes. I would agree with that.

\n\n

Eric Anderson:
\nAnd most people are like, "You know what? Maybe I don't need to have that actually." But for some apps that's a critical thing and they make it work. I don't know.

\n\n

Ben Johnson:
\nAnd again, more power to them. Eventually consistent stuff generally, most of the time, is just hard to do. It's really hard to do well. You get a lot of edge cases and weird bugs. That's my warning to people. Anything eventually consistent.

\n\n

Eric Anderson:
\nI think we've covered most of what I wanted to cover. Ben, anything that you feel like could be interesting that we haven't covered?

\n\n

Ben Johnson:
\nI think there's a lot still around SQLite usability on the server side, which I think can be improved. I'd love to see people work on that. I started a little hack of... I saw somebody else make a hack of this, and I made my own hack of this, of connecting to SQLite over SSH, so there's no server running on the server node.
\nBut when you connect over SSH, you can run a program over there on the other side and then communicate over standard in and standard out with it. So it's essentially connecting out to your SQLite program on your server and then doing queries against that. But again, there's no official API for doing that within SQLite, like SQLite's CLI. So I'd love to see more around that.
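The hack being described is mostly process plumbing: spawn ssh, run the sqlite3 CLI on the remote machine, and talk to it over standard in and standard out. A rough sketch, with the host and database path as placeholders; a real tool would need quoting, error handling, and exactly the kind of official protocol being wished for here.

```rust
// Rough sketch of "SQLite over SSH": run the sqlite3 CLI remotely and talk
// to it over stdin/stdout. Host and path are placeholders; real use would
// need escaping, error handling, and a proper framing protocol.
use std::io::Write;
use std::process::{Command, Stdio};

fn main() -> std::io::Result<()> {
    let mut child = Command::new("ssh")
        .args(["app-server.example.com", "sqlite3", "/data/app.db"])
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;

    // Send a query to the remote sqlite3 process over standard in.
    child
        .stdin
        .take()
        .expect("stdin piped")
        .write_all(b"SELECT COUNT(*) FROM notes;\n.quit\n")?;

    // Read the result back over standard out.
    let output = child.wait_with_output()?;
    print!("{}", String::from_utf8_lossy(&output.stdout));
    Ok(())
}
```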

\n\n

Eric Anderson:
\nYou're saying there's an admin interface gap: I can communicate with my database through the app server, but not if I want to configure it or do some other admin-type things.

\n\n

Ben Johnson:
\nOr if you just want to do some backend queries, I think that'd be great to see. But yeah, I think it's mainly usability, honestly, that's the biggest hurdle. I use a lot of CLI tools, so I don't mind SSHing into another node, but some people dislike that, so I think that can be a hurdle for sure. So if there were more GUI, more usable interfaces for more people, I think that would help a lot.

\n\n

Eric Anderson:
\nWhat is the community like around server-side SQLite? Do you find that there's a bunch of people who are kind of, "I've moved on from MySQL and Postgres. I just do, just as we described, I start with my server-side SQLite and then I start doing backups, and then I do this conflict resolution," or whatever it is, and they've moved on. They don't need Postgres or MySQL anymore?

\n\n

Ben Johnson:
\nI don't think they're mutually exclusive by any means. And honestly, when you look at companies or businesses, you have your application, like the main one, and then you have 20 or 30 of these little ancillary applications. This thing runs and does whatever, has this little UI on the side. It's not part of your main application that has to be up and running all the time. And those can tolerate it if you have to restart the server; you don't mind people not connecting to it for a second or two. So I think there's a lot of different use cases out there.
\nAnd I think SQLite can be great in a lot of different ones, but as far as starting out, I think a lot of people feel more comfortable using SQLite. And honestly, for a long time it was just like the toy database that people made fun of you for if you used SQLite. But honestly, it does so much in there. You can do JSON processing, you can do full-text search, you can do all kinds of stuff that's just built in. So I would say it does 90% of what you typically need Postgres for, and then that extra 10%, a lot of times you can just get from your local application language.
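Those built-ins are easy to demo. A small sketch exercising SQLite's JSON functions and FTS5 full-text search through the rusqlite crate, assuming a SQLite build with FTS5 compiled in (bundled builds typically include it):

```rust
// Sketch of two of SQLite's built-ins: JSON functions and FTS5 full-text search.
// Assumes the `rusqlite` crate and a SQLite build with FTS5 compiled in.
use rusqlite::{Connection, Result};

fn main() -> Result<()> {
    let conn = Connection::open_in_memory()?;

    // JSON processing without loading any extension.
    let city: String = conn.query_row(
        "SELECT json_extract('{\"user\":{\"city\":\"Denver\"}}', '$.user.city')",
        [],
        |row| row.get(0),
    )?;
    assert_eq!(city, "Denver");

    // Full-text search via the FTS5 virtual table module.
    conn.execute_batch(
        "CREATE VIRTUAL TABLE docs USING fts5(body);
         INSERT INTO docs (body) VALUES
           ('SQLite does ninety percent of what you need Postgres for'),
           ('the toy database grew up');",
    )?;
    let hit: String = conn.query_row(
        "SELECT body FROM docs WHERE docs MATCH 'postgres'",
        [],
        |row| row.get(0),
    )?;
    println!("match: {hit}");
    Ok(())
}
```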

\n\n

Eric Anderson:
\nAnd you're pointing out that in a world of microservices for example, you might have a bunch of small servers who do little jobs and they need a little persistent storage, and so you slap some SQLite on them and you're good to go.

\n\n

Ben Johnson:
\nPretty much. As long as people feel like they can use it and if it works for the use case, I think that's great.

\n\n

Eric Anderson:
\nIs there a place where SQLiters gather? When you say it would be great if somebody did this SSH thing, who talks about that and where do they talk about it?

\n\n

Ben Johnson:
\nI would say Twitter used to be a place. It's fluctuated recently, I feel like, with the changeover to X and all that. I think whenever there's a SQLite post on Hacker News, people gather there, and people love SQLite, honestly. And then there is a SQLite Reddit, it's not super active. There's a SQLite forum that the SQLite folks run themselves, but I wouldn't say it's a lot of server-side SQLite people. So it's a mix. I think Twitter was the main place for a long time.

\n\n

Eric Anderson:
\nAnd Ben, what's the future hold for you? More SQLite, or what itch are you scratching? What's the current curiosity that could lead to the next decade of interest for you?

\n\n

Ben Johnson:
\nOh man. I don't know about the next decade, but just making application development simpler is kind of what I like to focus on. And there's a lot of work that goes into just small changes a lot of times. So I think it's just getting through that and making it...
\nAt the end of the day, I would love it if LiteFS was really just a little checkbox you click and suddenly your data is replicated and there's nothing else you have to know about it. And making those configuration defaults, or whatever, just more natural and easy to use goes a long way. So yeah, I think just making it an easier tool to use. Really.

\n\n

Eric Anderson:
\nSuper. Well, Ben, I'm excited that you got curious about this years ago, and grateful for all you've given to the community. Appreciate your time today.

\n\n

Ben Johnson:
\nCool. Yeah, thanks for having me on, Eric. Appreciate it.

\n\n

Eric Anderson:
\nYou can subscribe to the podcast and check out our community Slack and Newsletter at contributor.fyi. If you like the show, please leave a rating and review on Apple Podcasts, Spotify, or wherever you get your podcasts. Until next time, I'm Eric Anderson and this has been Contributor.

","summary":"","date_published":"2024-01-17T02:00:00.000-08:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/f1ab48ba-9845-4a76-8936-c8b64a5640cc.mp3","mime_type":"audio/mpeg","size_in_bytes":32941497,"duration_in_seconds":2054}]},{"id":"e9d6d332-b294-4920-bde0-ef9107a275d7","title":"Rust Never Sleeps: Tonic with Lucio Franco","url":"https://www.contributor.fyi/tonic","content_text":"Tonic is a native gRPC implementation in Rust that allows users to easily build gRPC servers and clients without extensive async experience. Tonic is part of the Tokio stack, which is a library that provides an asynchronous runtime for Rust and more tools to write async applications. Today, Lucio Franco (@lucio_d_franco) of Turso joins the podcast to discuss his unique experience maintaining Tonic and contributing to the asynchronous Rust ecosystem.\n\nContributor is looking for a community manager! If you want to know more, shoot us an email at eric@scalevp.com.\n\nSubscribe to Contributor on Substack for email notifications!\n\nIn this episode we discuss:\n\n\n The challenges of async Rust and ways the community has addressed them\n\n Lucio’s plan on how to get a job in distributed databases\n\n How the Tokio team avoided power dynamics\n\n Problems around working on open-source in the corporate world\n\n Why Lucio encouraged a collaborator to go on without him \n\n\n\nLinks:\n\n\n Tonic\n\n Tokio\n\n Turso\n\n Tower\n\n\n\nPeople:\n\n\n Carl Lerche (@carllerche)\n\n\n\nOther episodes:\n\n\n The Big Fork: libSQL with Glauber Costa\n\n\n\nTRANSCRIPT\n\nLucio Franco:\nThe VP's making the choices about where to put money, they don't think about it. They take it for granted. And this is a huge problem in the corporate world with open source. It's like a fog of war, they don't see far enough down the software supply chain of what's powering everything.\n\nEric Anderson:\nThis is Contributor, a podcast telling the stories behind the best open source projects and the communities that make them. I'm Eric Anderson. Today we're on air with Lucio Franco, who is part of the team over at Turso. We talked to Glauber maybe a few episodes back. And we're going to talk about Rust and some emerging standards, and in particular a project called Tonic, part of the Tokio broader family. Lucio, tell us what Tonic is real quick. We'll see if we can encapsulate that in a elevator pitch and then we'll talk about the whole story.\n\nLucio Franco:\nTonic is a gRPC library at the base. It provides your client server, your basic code gen. The library is pretty simple, it just allows you to talk gRPC with any other language, because gRPC is a multi-language protocol, so it's pretty efficient. Tonic itself, some of its higher level goals have always been around being super ergonomic and efficient. So if you don't have any history around async Rust in the past, it's quite challenging to work with. It's improved a lot over the last five years, but when it first started it was quite hard to work with. So Tonic is essentially like I want to write a networking server, a client server, or talk to a server or client, but I want to do it really easily. I want to get started and not have to worry about the hard thing. 
So that's essentially Tonic's goal as a project: to enable those types of users that are somewhat new to the asynchronous world.\n\nEric Anderson:\nAnd I don't know the right order to do this, Lucio, your background or Rust's background, or maybe you know a way to intertwine them, but in researching our conversation, I didn't realize there was a lot of history around async with Rust and there's not a standard way of doing this, and so the community has solved this in a few ways. Tokio, which Tonic is a part of, is an umbrella brand for a set of tools that are all going to work, interop well, together.\n\nLucio Franco:\nWe can back up a little bit. Rust hit 1.0 in May 2015, so it's a very young language. And in an effort to standardize or 1.0 the language to make it available to people, there was a choice to reduce the size of the standard library. The less surface area that you have to ship, the easier it is to ship, so that's the idea. Let's make package management really easy, compared to something like C++, where getting dependencies in is very difficult. Rust is like, \"Let's make that very easy and then let users innovate on their own and not have to deal with this whole RFC process to get stuff into the standard library that is, one, extremely slow and a brain drain. It's exhausting.\" So Rust at the core has always started with a small standard library.\n\nFor example, a language like Go has a lot of stuff shipped with it, it has an HTTP implementation, it has green threading implementations that you can just take from the standard library and run with and actually build some pretty good stuff out of the box. Rust did not take that approach. Rust pushed that onto users. Tokio itself, there's two things here going on in the early stages of async Rust, or the story of asynchronous networking in Rust. There's the design of futures, which would be the idea of how do we execute anything asynchronously, which was being driven by some compiler folks or library folks in Rust. And then on the side there was, okay, how do we leverage these interfaces and build something off of it that is actually async? And this is where Tokio came into play. Around 2016-ish, I think, was when they started developing Tokio a little bit. And it started off as a third-party library.\n\nIf it works, why does it need to be a part of the standard library? Because putting stuff into the standard library is a lot of work. There's a lot of bureaucracy involved with it, which I completely agree we should have, because stuff that goes into the standard library is stamped and shipped to everyone in the world. So being able to do stuff outside of that enabled Tokio to innovate and push the boundaries of what was possible. And I think we're seeing a lot of the fruition of those effects in the modern day. Tokio is extremely popular now, it's being used at essentially almost every major tech company for something, and it's growing popular in the ecosystem as being a leader in this space.\n\nEric Anderson:\nMaybe I interrupt you here because I think you've given us some early history into Rust and Tokio. And now what's your story? Because at some point you intersected with Tokio.\n\nLucio Franco:\nSo my origin story with Rust actually started in college. 
Before I got into networking and all of this distributed systems stuff, I was like, \"I want to write game engines,\" and I failed so many times trying to get OpenGL linked into my C++ project, banging my head against the desk reading Stack Overflow screenshots of Xcode settings and having no clue what's going on. And then I stumbled across Go and I didn't like the GOPATH thing, and then I stumbled across Rust and I was like, \"Wow, I can just cargo add OpenGL and things just work.\" And I was like, \"This is amazing. This is clearly the best.\" So I fell in love with it then and there, Rust at its core, everything just made sense. I read the book and I was like, \"Why aren't other people doing this? This is so awesome.\"\n\nAnd then obviously did my game stuff for a while, and as I was graduating I was like, \"I need to do real stuff,\" because I didn't want to work in the game industry. And so I was like, \"I love distributed systems and distributed databases, why not get involved?\" Ironically now I work at a database company doing distributed database stuff, so if you think about my goals back then and where I've landed now, the path has worked out quite well for me. So I started learning Rust, and then obviously I saw the future in the async networking stuff coming up, and I tried to play around with it for a while and I absolutely failed. It was not the easiest thing. The error messages coming from the compiler were quite confusing, and just using it in general, not having a clue what async networking was, I just knew it was a cool thing and it's high performance.\n\nSo if I was wanting to write a database, this is the stuff I want to use. So a couple of years went by, I worked in a different language and I started... Honestly, I don't know if you guys remember Gitter, it's a chat platform for GitHub, where each project gets its own chat room. And so me being curious, I just was poking around the Tokio stuff and I ended up going on Gitter and talking with Carl, who's the author of Tokio. And we were talking back and forth a little bit. And that's basically how I got involved in Tokio, just honestly letting Carl talk to me about stuff. And I had no idea what was going on, but I was curious and wanted to learn. So I got myself involved and I think, instead of working, I was doing some open source stuff here and there, and it wasn't part of my full-time job.\n\nThat's how I got involved in the open source thing, but Tonic itself, the origin story is actually interesting, because continuing on this path of seeing these projects, I started going to Carl, I was like, \"Hey, where can I help?\" Obviously, at the time I didn't really know what epoll was, I had some idea, but epoll's the Linux API that essentially Tokio wraps around, and that's the core API that powers everything asynchronous. And so I didn't even know the concepts of this, so I didn't really know what I was doing, but I wanted to learn. So I asked, I was like, \"What are some spaces that I can help out with?\" And it turned out that there were some experimental projects done with Tower, which is a middleware library. And it had been written, but it needed a lot of polish, as with everything. At this stage of Rust, everything was very new.\n\nAnd so I started playing around with that a little bit. And then I saw this project that was called Tower gRPC, and I'd seen that Carl had actually written this. 
He was at the time working at a company called Buoyant, writing a proxy that needed to do some gRPC stuff in Rust, and so he had written code. But as you are with any startup, when you write code, you're not writing code to release it to people. You're writing code into your product and, all right, it works, we're done, let's focus on the next priority, because it's a startup. So this thing was in this half-hacked state, it worked, but it was the definition of unergonomic, not Carl's fault obviously, but at the state of where things were at, it was quite difficult.\n\nSo I decided, if I wanted to work in databases and networking, that was the end goal, why don't I take this project and try to make it better? And roughly this is at the time when async functions were coming around. So we're going from handwriting state machines that were really complicated and extremely error-prone, extremely, extremely error-prone, to writing really nice generators that did a lot of the heavy lifting. So we were making this big jump, not just the Tokio people, actually this is all compiler stuff. So this is like the people in the compiler working on that singular futures interface, improving how people can write stuff from the compiler perspective. So for us it was a target of, \"Hey, we have this new feature shipping in Rust that's going to really change the game. Really, really change the game. Let's ship something along with this.\"\n\nSo we were planning some releases of Tokio, and I was like, \"You know what? Now is the time, now is my window to write a library that, one, can be really ergonomic and easy to use.\" So thinking back to me in college, when I was trying to play around with some RPC libraries, I fell flat, I had no idea how to do this, how could I improve upon that experience? How could I make this easier for users? Not just discovery, but documentation, examples, just how do we reduce the footguns, how do we make it so you don't dig yourself into a hole that you didn't think was possible and then you couldn't get out of it? And so my main focus at the time was, how can we really change the game here? And so I spent a lot of sleepless nights, no, I slept, but a lot of nights after work coding, trying to get this done and shipped, and working on airplanes, and actually I found that airplanes were my best spot to focus, because with my ADHD there was nothing else to do but code, so it was perfect.\n\nAnd I got it shipped basically just in time for async/await to ship as well. And so technically Tonic was the first product, and I say product, I'm not really selling anything, but it was the first library on the market or in the Rust ecosystem that was taking advantage of this new async/await technology, was working, and basically production ready, and that was the goal, to release this all at one time and then iterate upon it, and take lessons learned and stuff. Looking back at it now, in the last four years, the original design actually worked quite well, and I have yet to hear people complain about stuff, which is either a scary or a very positive thing, but there's definitely some hidden things in there that I would love to fix in the future that I always put off to the end, but overall quite happy with the progress the library has made at this point.\n\nEric Anderson:\nMaybe, Lucio, just to dig into a couple of things from that experience, is this your vision or is it Carl's vision?\n\nLucio Franco:\nThe real [inaudible 00:10:32] for Tonic was I wanted to work in distributed databases. 
So I was like, \"What's the best way to get a job working in distributed databases?\" It is to write the protocol that they're using. You go to an employer and you say, \"Hey, you're literally using my code. You depend, your entire product, on me, hire me.\" So that was my vision, that was my motivation. Obviously, I think the goal that Carl was working on in general and the others in the Tokio group were really good, and I think we pushed for some similar fronts, but I think that there's, what I was pushing for, that ulterior thing for a job, but the goal of making things more ergonomic and easier was really important. And actually the third thing I think that was really critical to Tokio's current success that we really focused on was shipping.\n\nA big problem that happens in the open source community, and we see this a lot with Rust. Most of my exposure has been to Rust, so I can't really speak to a lot of other languages and communities, but Rust, we have this decision paralysis. And the same thing with the standard library versus non-standard library discussions. If you want to put something into the compiler, it gets drilled down on you're going to have to answer a million questions, people coming left and trying to make decisions on, oh, what if we did it this way? We want to support this. What about that? And that, one, becomes very exhausting, but it doesn't allow you to ship. That's why things, for example, like the original async/await shipped four years ago, but we still don't have interfaces that are async/await. I think it's shipping now, but it's quite a time later.\n\nThis is just purely due to the politics that happens in open source. And Tokio took a different approach. Luckily we were all amicable to each other, so we all liked each other, there was no drama, but it was like, \"Hey, why don't we not spend our time always discussing and going in circles, let's get something out there that people can use so people can build stuff with.\" Rust is not going to go anywhere if people are not able to actually build stuff, and people from any level. I'm sure someone who's very experienced could take some of the older stuff that was really difficult to work with and build something, but we have to support people who are mid-level and juniors, and we want to grow into that space like Go has taken really a strong advantage here of being really easy to work with.\n\nSo part of our Tokio's goal was to ship an easy to use ergonomic library but ship it. And that was the same goal with Tonic. That's why I was like, \"Let's get it out in time for async/await, so everyone can, the day async/await lands, boom, what are you going to choose as your gRPC library? Tonic.\" So we were pushing on all fronts. I think we all had agreed that that was the right way to go. We didn't want to run in circles any longer. And so that's I think been a huge advantage of the Tokio group of us really not trying to foster drama or any power plays or anything like that, because open source tends to have a lot of power vacuums and people trying to take over.\n\nEric Anderson:\nMaybe help me understand one aspect of this and that's that... I don't know if runtime is the right word, but this fundamental aspect of the program language that's not included in a standard library, async/await, when you write your gRPC, you write to a single approach to async/await, even though there may be others, and so Tonic can only work on Tokio's async/await, is that right?\n\nLucio Franco:\nIt's a little bit more complicated. 
So async/await is just like syntactic sugar in the compiler to convert to some internal types. Think of it as like a transformer. In reality it is exactly a transformer. It transforms into a type that implements Future. Future is some interface for something that can run asynchronously, and pause and resume. Those are all in the standard library. Finally, Future made it into the standard library after a few years, but yes, that all made it to the standard library. Tokio itself just uses the Future interface and a couple global variables to hook into your thing to notify you that you're ready, whether it be a timer or whether it be your epoll thing saying, \"Hey, your file descriptor is now ready to read from because there's bytes waiting in the TCP buffer,\" that stuff is tied to Tokio.\n\nFor example, if I use a TCP stream type from Tokio, it needs to run inside the Tokio runtime, and this is because the TCP stream type needs to know how to register itself with the API that Tokio provides. And there's no way around this. It's very hard to abstract that, or abstract it in an ergonomic way, because those APIs are so tightly ingrained with what it's doing in the scheduler that it doesn't really make much sense. So, yes, there's been this kind of issue in the Rust async ecosystem where you can't mix and match runtimes and swap out runtimes, and it makes it very challenging. It ruins the programmer mental model that you'd like to have, where things are abstracted and I can just change one line and now I'm using a different runtime. In reality, the argument for that too is the use cases are not very high.\n\nYes, you have an embedded system, maybe you want a different runtime, completely agree, go use an embedded system runtime from the start. Tokio's networking was never intended to work in that space. But spending all this time on abstracting things when, for example, we're using epoll in Tokio, but remember at the time that we were talking about all this abstraction and runtime stuff, io_uring was coming around. If you don't know what io_uring is, it's essentially a new Linux API that allows you to dispatch system calls asynchronously right in the kernel. So it's a really efficient way to do it, but it completely changes the model of how you do IO. From having to copy the data into the kernel to just passing a pointer and being able to read through it, it makes it much more complicated. So then how can you abstract something that is still being innovated on?\n\nIt's the same question about the standard library. If we haven't figured out the right abstractions yet for how to do epoll in Tokio, how the heck are we going to be able to put that into the standard library and convince people that this is going to be the implementation that we want for the next 10, 15, 20 years? It's not possible. And, in fact, for Tokio, we didn't really think about this. We didn't care about the abstraction because we wanted to get something that people could use. We weren't thinking about how users could swap it in and out. It wasn't a play to have a monopoly whatsoever. Not at all. It was, this isn't beneficial to achieving our current goals of getting this in the hands of users so they can build stuff.\n\nThat was primarily our number one goal, users, not writing the most perfect system, not writing the best thing ever, but getting something that people can use and build real systems with. Because, in reality, that's how you bring a language forward. You can't just stay academic about everything. 
You need to be a little bit more pushy on getting things in people's hands. It's like doing a startup without the funding and all the stuff like that, obviously.\n\nEric Anderson:\nWell, actually that's a good segue. We will get into funding- and economics-related things in a moment. So this worked out for you, your plan of, I'm going to work on the gRPC implementation and then I'll get hired by a database company.\n\nLucio Franco:\nIn fact, actually I had a friend, he was working at AWS at the time. He was like, \"Hey, by the way, I just searched...\" Because AWS, there's a way you can search every single code base that isn't security cleared or something, so anyone could see any implementation of EC2 or anything like that. And I guess he just searched Tonic and he found that Lambda was using it, and was like, \"Okay, that's interesting.\" This is two months after I first released it. So maybe six months after I was like, \"Hmm, maybe I should start working on this.\" That following March I actually reached out to him and I was like, \"Hey, is Lambda still using it?\" He's like, \"Yeah.\" I was like, \"All right, intro me to the manager.\" And boom, I ended up joining AWS Lambda and helping them build their new Rust service, and eventually transitioned into doing a lot more Rust stuff at AWS in general. But it worked that easily. I was impressed.\n\nEric Anderson:\nTrojan horse almost, you write this thing, they have to use it, and then they have to hire you so that they can maintain it.\n\nLucio Franco:\nAnd what manager's going to tell you no? Your entire networking stack, I can solve your problems.\n\nEric Anderson:\nSo you're working on Rust at AWS, but not necessarily databases, and then you find your way to Turso?\n\nLucio Franco:\nYeah, so I got caught in the layoffs. I was working on Rust tooling and open source, so I think I was one of the first people on the list to get laid off. I wasn't working on any moneymaking things. I was like, \"The layers from the moneymaking were quite far away.\" And, again, it's probably a good segue into the open source funding thing, but when you work on open source, from the VPs making the choices about where to put money, you're so far away. They don't think about it. They take it for granted. And this is a huge problem in the corporate world with open source.\n\nIt's a fog of war, they don't see far enough down the software supply chain of what's powering everything, the bytes operating on the CPU or whatever, they're not really thinking that far. They're just thinking about their customers directly. And so unfortunately I got caught in that crossfire. And good timing anyways, I was ready to leave, and I wanted to work on something. And this probably also segues into another talk about the mental health aspect of being an open source maintainer. And I'm, again, very happy with my choices and being at Turso now. And I'm finally actually working on a database, so that's the huge positive here.\n\nEric Anderson:\nBefore we go to that, how does the funding work for Tokio? So you've got somewhat of an organization of volunteers building libraries that they're all individually passionate about, and it works. How does it all work?\n\nLucio Franco:\nThat's an extremely good question. For a while, a couple of us were being paid to work on Tokio. Carl works at AWS still and he has been working there maintaining Tokio and working with people inside the company to help them with Tokio. Sean has also worked at AWS and now is doing his own thing. 
He's the maintainer of hyper, which is the HTTP library that implements all the nice, crazy fun stuff around HTTP, but I think most of the other people are not paid full time to do this. A lot of it's either in our free time, or work has given us some time to work on it. When I was on Lambda, I spent, and I'm going to put in quotes, 50% of my time working on Lambda, 50% of my time working on Tonic and open source. Definitely very hard to juggle.\n\nBut most people were in this situation where they were doing it either in their free time and stuff like that. And if you look at some of the projects, since 2019, things have stagnated a little bit, and obviously as things get popular, they're very hard to manage. For example, Tonic right now, I actually... Personally, I don't have any funding coming in for Tonic. Turso has given me time to work on it. Obviously it's hard to juggle startup priorities with an open source library that is not as critical. Obviously I'm looking out for security bugs or anything that could be causing issues, and I'm going to respond to that as fast as possible. That's not a problem, but features, and I've wanted to re-implement certain things, I don't like how it's done, I just haven't had the time to do that. But there isn't much funding.\n\nAnd in fact also we have some money that has been given to us through things like Open Collective, but we really haven't found a way to spend that money. It's been hard to find people to contract to, because in reality the bar to contribute and to spend our money on somebody is quite high. To be able to get into that space, you need to show that you're able to do that. And so that barrier is already very difficult to overcome. And so finding people that are willing to do that and actually will stick with it is incredibly challenging. So in reality, besides people that are being paid to do it for work, there aren't many people really being paid to do stuff like that, which is unfortunate. And another thing for me at least is I haven't even set up GitHub Sponsors for myself.\n\nAnd another aspect of open source that's quite challenging is the guilt involved with it. I have the push rights to Tonic, I can publish things. And people will open up bugs that maybe are blocking them from achieving something, and I don't have the time nor energy to always help them out, because sometimes it requires a lot of work. The quality of contributions is all over the place, so sometimes you have to guide users through, sometimes users drop a massive PR on you and are like, \"Hey, can you review this?\" I'm like, \"Absolutely not. That's a lot.\" So one thing I've had to learn is how to say no, I've gotten very good at it now, but things like, \"Hey, this is a lot. I don't have the time for this.\" But I feel guilty, so I actually don't accept money, because if I accepted money, I would feel even worse.\n\nI would feel like I would need to actually spend more time, more of my free time, more of my mental energy on it that I currently don't necessarily want to spend. I've come to the point in my life where I want to set a certain amount of time a week to work on coding, and there's other priorities in life. So back when I was younger, I was able to come home from work and code all night, but now I don't have that same interest or motivation, and so I have to put this boundary between me and the people there a little bit, because I don't want to feel guilty and bad. 
I can say, \"You're not paying me.\" Yes, if you want to get a release out, you do a lot of the heavy lifting and work, I will help get it through.\n\nBut if you're asking me to do something, I'm going to tell you, \"Look, I just purely don't have time.\" And actually this happened literally today. Someone came in and was like, \"Hey, is there any plans to implement this?\" I'm like, \"No, I currently don't have the time nor energy to do this.\" If you are interested in fixing it, come on, be a maintainer, come talk to me. And in fact, I've been looking for maintainers for a very long time and I've had some people come in, work on stuff. I've had one maintainer come in, but it turned out that actually pushing him to work on a different project, similar what I did with Tonic, I had him work on a HTTP library, like Django style library that allows you to... It's not HTTP, but a HTTP server express style library for Rust that we were really lacking, and I saw this really good opportunity for us.\n\nAnd so I was like, \"Hey, I know you're helping me a lot and I love having you help me and you're awesome, and I trust you, you have full rights to everything, but actually I think you're better doing this, championing this and running with it.\" He did that and now the project, which is Axom, is a wildly even more popular than Tonic, and I'm happy with that, but the downside is that I lost a collaborator because he doesn't have the time to spend time on both. So it's like I found somebody, but then the opportunity presented itself and I lost them. And in fact, I pushed them, I created the repo I was like, \"Hey, go talk to Carl. Go implement this. Yes, go, go, go, go, go.\" So it wasn't like I was sad, sad, I'm still happy with the outcome, but since then I've not been able to find anyone to really stick with it and help me. Still been just me maintaining these things, and it's a lot of work.\n\nEric Anderson:\nLucio, you mentioned earlier there was some mental health aspect to open source that you feel like is fairly pervasive, it sounded like. Tell us what that is.\n\nLucio Franco:\nWhen you create an open source project, let's say for me with Tonic, I started writing it because, one, as I said earlier, I was doing a job in distribute databases and stuff, but also I was just genuinely curious. I was enjoying it. I was having fun writing a library, figuring these things out, looking into other libraries and seeing how they did stuff, and trying to massage it into Rust and make it right, writing blog posts and spending time on the README and making it look all pretty, I loved that. The first pass I did that was amazing. But as the years went on, and Tonic grew and matured, and more and more people started using it, so the frequency of issues being opened or people asking me questions on Discord started to increase, the burden started to become quite hard.\n\nI'm one person, I have a fixed amount of energy every week, and this thing is piling up on top of me and there's more people asking me stuff. And obviously as a project gets more complicated and the longer it exists, the harder it is maintaining it. There's this relationship where as you add more code, the burden of maintaining... Remember, the implicit dependencies of everything becomes very complicated. So the problems and the bugs to solve started to get much more complicated in a good and bad way, but it's the nature of software. But when you work at a company maintaining a piece of software, there's motivation. 
You have customers that are paying you and telling you, \"Hey, we love this. This is great.\"\n\nFor example, let's say, a B2B company, maybe you have a customer success team, and they're giving you feedback from the customers. They're saying, \"Hey, our customers really enjoyed this feature, good job.\" In open source, what happens when I make a release? I spend all my energy and brain power cutting a release, writing the code, making sure everything's right, doing the change log, making sure everyone's happy, I click the button to publish, and then radio silence. If it's good, if the release has no bugs, radio silence, there's nothing. Did people download it? Did people try it?\n\nEric Anderson:\nYou're saying that most feedback is negative feedback.\n\nLucio Franco:\nRight, all you hear is this negative feedback. You don't get the positive feedback loops that trigger the dopamine release in your brain that make you excited and motivated like you would have at a company, like your boss saying, \"Good job.\" That's really it: you make these releases, these bug fixes, you spend all this energy fixing these bugs and then it works, and no one tells you good job. It's quite a depressing experience and all you hear is these negative things. And then on top of that, because you are the maintainer, you are the only person able to review and push things through. And now someone opens a PR fixing a bug that a couple other people have, and then they comment on the PR saying, \"Hey, is someone going to review this?\" And I'm like, \"Look, I am exhausted. I don't have the time for this. Maybe next week I can look at this,\" but I feel bad.\n\nInherently, I want to help these people, I have an internal want to be helpful, and so it really pains me to have to leave these people hanging. And it gets worse, we have some projects where issues are piled up like crazy, and the surface area they need to work on is large, so there's a lot of stuff going on, and the overhead is really high. And you pair that with not being paid, say, a wage. The work that a lot of these open source people do is the work of senior engineers. We did this analysis when I was at AWS about the quality of the code these people are writing, what they have to think about.\n\nAnd if you were to compare them to the leveling at AWS, for example, it's like you're a senior engineer, if you're able to do this stuff and you're able to make these choices and talk to people in a certain way and do designs, especially in an async collaborative world without someone being able to help you or another engineer helping you, that is truly the work of a senior engineer, but these people are being paid $0. It's quite crazy, the work they're doing is worth hundreds of thousands of dollars and they're being paid maybe a thousand dollars through sponsors or something. It's quite crazy, this disparity. And that plus the guilt, plus other priorities in life, push you into burnout. They make you fall in this pit where you just don't know what to do. You're paralyzed.\n\nEric Anderson:\nWe talk about community, we use that word all the time, there's a big community around Tonic, and maybe it would help to understand who these people are actually, most of them are users of the library and they're fairly transactional. They show up, they download, they run, and they file bugs when things don't work. And that's what? 90% of the community?\n\nLucio Franco:\nYeah, there's maybe two or three people on Discord. 
We tend to use Discord for all of the Tokio projects, and so there's a Tonic channel that I monitor. And occasionally we have a tag for some other people, like a group for people that are helpful: they're not contributors, they don't have push access, but they are just people that have been answering questions here and there. And this is also how I got involved, by the way. I started going on Gitter back in the day and was helping people. And as a maintainer, I'm like, \"Wow, I love that. I love when people come in and answer questions. If you don't know the answer, ping me, I'm happy, but please go and try to answer questions even if you're not confident. It's a good start.\" And that shows initiative.\n\nSo there's people that help out, but there's maybe three or four. This is just for Tonic. For Tokio, there's a larger group of people. So there's the Tokio umbrella and there's the Tokio project, the actual code library; that surface area is massive, so there's a lot more of a community there, because also the user segment that uses Tokio is everything, whereas Tonic is one slice of that pie. So beyond that, there's not much of maintainers helping people answer questions. It's usually just other people maybe running into a bug and maybe they found a solution. Beyond that, it's transactional, people being like, \"Hey, can I get some help for this?\" That's it.\n\nEric Anderson:\nAnd you can't just walk away because you have the keys to making updates. And as you mentioned, somebody's got a critical bug or a security issue, they want to push an update and you're blocking.\n\nLucio Franco:\nSo actually we have the permissions set up in Tokio that if I were to get hit by a bus tomorrow, a large bus, people have access rights to everything. It's part of an organization, there are escape hatches, but people actively publishing new packages and stuff is just me mostly. Other people have rights, they know how to do it. For example, David, who works on Axum, the guy I mentioned earlier, he hasn't made a Tonic release in two years, so he's not really going to remember how to do all the little things here and there. So it's just me managing it all. I bet somewhat poorly, probably, but I try my best.\n\nEric Anderson:\nWhat's the end game? Eventually you hand this off to somebody? Do you just document how to do releases really well? I don't know. What have you seen in other open source projects? Is there a place?\n\nLucio Franco:\nI think the real end game is waiting for Rust 2.0 to come out, so my library becomes incompatible and someone else has to take the reins of maintaining a library. No, there is no end game. And actually I thought the same thing, how do I get out of this? There's no real way to do it. And the hope is that some maintainer comes by that helps out and will be able to take charge. But the real solution actually is to get the library to a 1.0 state where it's just bug fixes. What this means for Tonic right now is probably that I need to go through with an axe and start chopping things off. 
There's some things that I implemented in 2019 that made a lot of sense with the idea that we were the first library on the market, so I had to do a lot more heavy lifting to make up for the lack of other libraries.\n\nSo there's the gRPC stuff, but then there's a lot of load balancing and HTTP server client configuration stuff. I wrote a layer on top to make it easier, but now it's a lot easier to do it by hand, you could put the pieces together, but before that was not the case. So part of the end game is cutting surface area, this is a huge strategy in how to achieve 1.0, really just cut out all the surface area, maybe put it into an unstable crate that's not 1.0, and then just have a very solid 1.0. And that's really the end goal. The gRPC protocol is not going to change, the HTTP/2 protocol is not going to change. What we have now is pretty solid. It's very production ready, it's being used quite heavily all over the world, so I'm pretty confident in the code. It's more of just reducing surface area, but the problem with that is it's a lot of work to go and cut things out, because I have to write docs and update things. So it's a catch-22.\n\nEric Anderson:\nFrom your experience at Amazon, what percent of big tech engineers have an open source side hustle, an open source project that relies on them?\n\nLucio Franco:\nFrom my perspective at Amazon, it's a very low percentage. Very, very low percentage. But I think that's a problem with a company like Amazon or Microsoft or Google in essence a little bit, but I think the Google personality varies from those two. Amazon has a very strong college pipeline, so you have a lot of people coming straight from college going into a pretty good, cushy tech job, working in some pretty bespoke technology. The build system is unique, the libraries they're using are unique. This is a big shift. Rust made a big shift at Amazon, because a lot of the previous libraries they use, everything's in Java, so they already had handwritten every library you could ever want. You didn't have to go outside the little bubble, everything just worked. But for us, that was not the case.\n\nThat's why people started using Tonic, because they needed a library to do something now. You can't wait for a team to implement this in 1.0, but most people are not writing Rust, most people are writing Java or whatever at Amazon, because it's a very large company. Those people are not really exposed to the outside world, so there's no point for them to do open source stuff. A lot of them don't. The job is already taxing enough, they're not going to really go out of their bubble. I think it's different for people at startups and companies where you do have to pull tools off the shelf, you do have to go on GitHub and see what other options there are. If I wanted to implement something for Turso, for example, and I didn't do my due diligence to go check for a Rust library that might do exactly what I want, that's really dumb. I should be finding the fastest way possible to achieve something. So that process also leads into people contributing back.\n\nYou'll see a lot of startups have GitHub organizations where they're like, \"Hey, we open sourced this little library that we used to do this thing and it really helped us.\" And that can spark a little bit of that open source creativity, innovation, and putting things out there that a large company doesn't really push. Obviously, there's going to be exceptions. There's going to be people that are curious and just want to do stuff. 
Like me, I was doing it all on my own, it wasn't related to work, and it somehow found its way back into work, but there's a lot of people in this large company that are not really exposed to this type of stuff, and they're not exposed to the risks either. That's the other thing. I spent a lot of time at AWS thinking about what we should depend on. If I could choose a set of libraries that everyone should depend on, how do we make those decisions? And most people don't think about that. They just choose. It's good until things break and you have a zero-day or something, but it's always a risk.\n\nEric Anderson:\nLucio, this has been super fascinating. Not only all the details on Tonic and Rust, but your personal experience navigating being an open source maintainer. Tell us, as we wrap up here, what folks can do if they're excited to learn more about Tonic or want to take the load off your shoulders.\n\nLucio Franco:\nWell, if you're interested in Tonic at all, come check out the project, try it out. There's a bunch of GitHub issues, there's Discord, getting involved is pretty easy. I'm available, you can find me on Discord, you'll see my green name show up. Play around, respond to some issues, review a PR. I will never stop anyone from providing their opinion on something. And actually I encourage people to act as if they're a maintainer without having the privilege of being a maintainer. Go ahead and review something as if you were the one trying to merge the PR. That generates a lot of confidence in us, I love seeing the initiative. And, again, I have an email, I have Twitter, X, I have Discord, reach out to me, come ask me questions. I'm always happy to talk and discuss these sorts of things. I was a budding open source maintainer once upon a time not too long ago, so I understand very well what it's like to try to get into this space.\n\nEric Anderson:\nWell, also, we appreciate your service, Lucio. You've given something back to the world.\n\nLucio Franco:\nIt's the least I can do.\n\nEric Anderson:\nYou can subscribe to the podcast and check out our community Slack and newsletter at contributor.fyi. If you like the show, please leave a rating and review on Apple Podcasts, Spotify, or wherever you get your podcasts. Until next time, I'm Eric Anderson, and this has been Contributor.","content_html":"

Tonic is a native gRPC implementation in Rust that allows users to easily build gRPC servers and clients without extensive async experience. Tonic is part of the Tokio stack; Tokio is a library that provides an asynchronous runtime for Rust and more tools to write async applications. Today, Lucio Franco (@lucio_d_franco) of Turso joins the podcast to discuss his unique experience maintaining Tonic and contributing to the asynchronous Rust ecosystem.

\n\n

Contributor is looking for a community manager! If you want to know more, shoot us an email at eric@scalevp.com.

\n\n

Subscribe to Contributor on Substack for email notifications!

\n\n

In this episode we discuss:

\n\n\n\n

Links:

\n\n\n\n

People:

\n\n\n\n

Other episodes:

\n\n\n\n

TRANSCRIPT

\n\n

Lucio Franco:
\nThe VPs making the choices about where to put money, they don't think about it. They take it for granted. And this is a huge problem in the corporate world with open source. It's like a fog of war, they don't see far enough down the software supply chain of what's powering everything.

\n\n

Eric Anderson:
\nThis is Contributor, a podcast telling the stories behind the best open source projects and the communities that make them. I'm Eric Anderson. Today we're on air with Lucio Franco, who is part of the team over at Turso. We talked to Glauber maybe a few episodes back. And we're going to talk about Rust and some emerging standards, and in particular a project called Tonic, part of the broader Tokio family. Lucio, tell us what Tonic is real quick. We'll see if we can encapsulate that in an elevator pitch and then we'll talk about the whole story.

\n\n

Lucio Franco:
\nTonic is, at its base, a gRPC library. It provides your client, your server, your basic code gen. The library is pretty simple, it just allows you to talk gRPC with any other language, because gRPC is a multi-language protocol, and it's pretty efficient. Tonic itself, some of its higher level goals have always been around being super ergonomic and efficient. Async Rust, if you don't have any history with it, is quite challenging to work with. It's improved a lot over the last five years, but when it first started it was quite hard to work with. So Tonic is essentially like, I want to write a networking server, a client server, or talk to a server or client, but I want to do it really easily. I want to get started and not have to worry about the hard thing. So Tonic's goal as a project is essentially to enable those types of users that are somewhat new to the asynchronous world.
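
For readers who want to see what that looks like, here is a minimal sketch of a Tonic server, closely following the stock helloworld example from Tonic's own docs. It assumes a helloworld.proto compiled by tonic-build; the greeting logic is illustrative:

    use tonic::{transport::Server, Request, Response, Status};

    // Types generated from helloworld.proto at build time.
    pub mod hello_world {
        tonic::include_proto!("helloworld");
    }
    use hello_world::greeter_server::{Greeter, GreeterServer};
    use hello_world::{HelloReply, HelloRequest};

    #[derive(Default)]
    struct MyGreeter;

    #[tonic::async_trait]
    impl Greeter for MyGreeter {
        // One generated trait method per RPC in the .proto service.
        async fn say_hello(
            &self,
            request: Request<HelloRequest>,
        ) -> Result<Response<HelloReply>, Status> {
            let reply = HelloReply {
                message: format!("Hello {}!", request.into_inner().name),
            };
            Ok(Response::new(reply))
        }
    }

    #[tokio::main]
    async fn main() -> Result<(), Box<dyn std::error::Error>> {
        let addr = "[::1]:50051".parse()?;
        Server::builder()
            .add_service(GreeterServer::new(MyGreeter::default()))
            .serve(addr)
            .await?;
        Ok(())
    }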

\n\n

Eric Anderson:
\nAnd I don't know the right order to do this, Lucio, your background or Rust's background, or maybe you know a way to intertwine them, but in researching our conversation, I didn't realize there was a lot of history around async with Rust and that there's not a standard way of doing this, and so the community has solved this in a few ways. Tokio, which Tonic is a part of, is an umbrella brand for a set of tools that all interop well together.

\n\n

Lucio Franco:
\nWe can back up a little bit. Rust hit 1.0 in May 2015, so it's a very young language. And in an effort to standardize or 1.0 the language to make it available to people, there was a choice to reduce the size of the standard library. The less surface area that you have to ship, the easier it is to ship, so that's the idea. Let's make package management really easy, compared to something like C++ where getting dependencies in is very difficult. Rust is like, "Let's make that very easy and then let users innovate on their own and not have to deal with this whole RFC process to get stuff into the standard library that is, one, extremely slow, and a brain drain. It's exhausting." So Rust at the core has always started with a small standard library.

\n\n

For example, a language like Go has a lot of stuff shipped with it, it has an HTTP implementation, it has green threading implementations, so you can just take the standard library, run with it, and actually build some pretty good stuff out of the box. Rust did not take that approach; it pushed that onto users. For Tokio itself, there are two things going on here in the early stages of async Rust, or the story of asynchronous networking in Rust. There's the design of futures, which would be the idea of how do we execute anything asynchronously, which was being driven by some compiler folks or library folks in Rust. And then on the side there was, okay, how do we leverage these interfaces and build something off of them that is actually async? And this is where Tokio came into play. Around 2016-ish, I think, was when they started developing Tokio a little bit. And it started off as a third-party library.
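
That "runtime as a third-party library" point is easy to see in code; a minimal sketch using Tokio's current API, where the arithmetic task is just a placeholder:

    // Tokio is an ordinary dependency, e.g. tokio = { version = "1", features = ["full"] },
    // not part of the standard library.
    #[tokio::main]
    async fn main() {
        // Spawn a task onto Tokio's scheduler and await its result.
        let handle = tokio::spawn(async { 1 + 1 });
        assert_eq!(handle.await.unwrap(), 2);
    }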

\n\n

If it works, why does it need to be a part of the standard library? Because putting stuff into the standard library is a lot of work. There's a lot of bureaucracy involved with it, which I completely agree we should have, because stuff that goes into the standard library is stamped and shipped to everyone in the world. So being able to do stuff outside of that enabled Tokio to innovate and push the boundaries of what was possible. And I think we're seeing a lot of the fruits of that in the modern day. Tokio is extremely popular now, it's being used at essentially every major tech company for something, and it has grown into a leader in this space.

\n\n

Eric Anderson:
\nMaybe I'll interrupt you here because I think you've given us some early history into Rust and Tokio. And now what's your story? Because at some point you intersected with Tokio.

\n\n

Lucio Franco:
\nMy origin story with Rust actually started in college. Before I got into networking and all of this distributed systems stuff, I was like, "I want to write game engines," and I failed so many times trying to get OpenGL linked into my C++ project, banging my head against the desk reading Stack Overflow screenshots of Xcode settings and having no clue what's going on. And then I stumbled across Go and I didn't like the GOPATH thing, and then I stumbled across Rust and I was like, "Wow, I can just cargo add OpenGL and things just work." And I was like, "This is amazing. This is clearly the best." So I fell in love with it then and there. Rust at its core, everything just made sense. I read the book and I was like, "Why aren't other people doing this? This is so awesome."

\n\n

And then obviously I did my game stuff for a while, and as I was graduating I was like, "I need to do real stuff," because I didn't want to work in the game industry. And so I was like, "I love distributed systems and distributed databases, why not get involved?" Ironically, now I work at a database company doing distributed database stuff, so if you think about my goals back then and where I've landed now, the path has worked out quite well for me. So I started learning Rust, and then obviously I saw the future in the async networking stuff coming up, and I tried to play around with it for a while and I absolutely failed. It was not the easiest thing. The error messages coming from the compiler were quite confusing, and just using it in general, not having a clue what async networking was, I just knew it was a cool thing and it was high performance.

\n\n

So if I wanted to write a database, this is the stuff I wanted to use. So a couple of years went by, I worked in a different language, and I started... Honestly, I don't know if you guys remember Gitter, it's a chat platform for GitHub, so each project gets its own chat room. And so, me being curious, I just was poking around the Tokio stuff and I ended up going on Gitter and talking with Carl, who's the author of Tokio. And we were talking back and forth a little bit. And that's basically how I got involved in Tokio: just honestly letting Carl talk to me about stuff. And I had no idea what was going on, but I was curious and wanted to learn. So I got myself involved, and I think, instead of working, I was doing some open source stuff here and there, and it wasn't part of my full-time job.

\n\n

That's how I got involved in the open source thing, but Tonic itself, the origin story is actually interesting. Continuing on this path of seeing these projects, I started going to Carl, I was like, "Hey, where can I help?" Obviously, at the time I didn't really know what epoll was, I had some idea, but epoll is the Linux API that Tokio essentially wraps around, and that's the core API that powers everything asynchronous. And so I didn't even know the concepts of this, so I didn't really know what I was doing, but I wanted to learn. So I asked, I was like, "What are some spaces I can help out with?" And it turned out that there were some experimental projects done with Tower, which is a middleware library. And it had been written, but it needed a lot of polish, as with everything. At this stage of Rust, everything was very new.
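
For context, Tower's core abstraction is one small trait, roughly an asynchronous function from a request to a response that middleware layers can wrap around each other. A sketch, simplified from the real tower crate:

    use std::future::Future;
    use std::task::{Context, Poll};

    // Simplified shape of tower::Service.
    pub trait Service<Request> {
        type Response;
        type Error;
        type Future: Future<Output = Result<Self::Response, Self::Error>>;

        // Back-pressure hook: is the service ready to accept a request?
        fn poll_ready(&mut self, cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>>;

        // Process one request, returning the response asynchronously.
        fn call(&mut self, req: Request) -> Self::Future;
    }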

\n\n

And so I started playing around with that a little bit. And then I saw this project that was called Tower gRPC, and I'd seen that Carl had actually written this. He was at the time working at a company called Buoyant, writing a proxy that needed to do some gRPC stuff in Rust, and so he had written code. But as with any startup, when you write code, you're not writing code to release it to people. You're writing code into your product and, all right, it works, we're done, let's focus on the next priority, because it's a startup. So this thing was in this half-hacked state, it worked, but it was the definition of unergonomic. Not Carl's fault obviously, but given the state of where things were, it was quite difficult.

\n\n

So I decided, if I wanted to work in databases and networking, that was the end goal, why don't I take this project and try to make it better? And roughly this is at the time when async functions were coming around. So we're going from handwriting state machines that were really complicated and extremely error-prone, extremely, extremely error-prone, to writing really nice generators that did a lot of the heavy lifting. So we were making this big jump. Not just the Tokio people; this was actually all compiler stuff. This was the people on the compiler working on that singular futures interface, improving how people could write stuff from the compiler's perspective. So for us it was a target of, "Hey, we have this new feature shipping in Rust that's going to really change the game. Really, really change the game. Let's ship something along with this."
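
To make that jump concrete, here is a hedged before-and-after sketch: a hand-written future first, then the async fn the compiler now expands into an equivalent state machine for you. The names and the trivial body are illustrative:

    use std::future::Future;
    use std::pin::Pin;
    use std::task::{Context, Poll};

    // Before async/await: a hand-written state machine implementing Future.
    // Real protocol code had many states and wakeup edge cases in poll().
    struct AddOne {
        input: u32,
    }

    impl Future for AddOne {
        type Output = u32;
        fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<u32> {
            Poll::Ready(self.input + 1)
        }
    }

    // After async/await: the compiler generates the state machine for you.
    async fn add_one(input: u32) -> u32 {
        input + 1
    }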

\n\n

So we were planning some releases of Tokio, and I was like, "You know what? Now is the time, now is my window to write a library that can be really ergonomic and easy to use." So thinking back to me in college, when I wanted to play around with some RPC libraries, I fell flat, I had no idea how to do this. How could I improve upon that experience? How could I make this easier for users? Not just discovery, but documentation, examples. How do we reduce the foot guns, how do we make it so you don't dig yourself into a hole that you didn't think was possible and then can't get out of? And so that was my main focus: how can we really change the game here? And so I spent a lot of sleepless nights, no, I slept, but a lot of nights after work coding, trying to get this done and shipped, and working on airplanes. Actually, I found that airplanes were my best spot to focus, because with my ADHD there was nothing else to do but code, so it was perfect.

\n\n

And I got it shipped basically just in time for async/await to ship as well. And so technically Tonic was the first product, and I say product, I'm not really selling anything, but it was the first library on the market, or in the Rust ecosystem, that was taking advantage of this new async/await technology, was working, and was basically production ready. And the goal was to release this all at one time and then iterate upon it, and take lessons learned and stuff. Looking back at it now, in the last four years, the original design actually worked quite well, and I have yet to hear people complain about stuff, which is either a scary or a very positive thing, but there's definitely some hidden things in there that I would love to fix in the future that I always put off to the end. Overall, though, I'm quite happy with the progress the library has made at this point.
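
That ergonomics bet is easiest to see from the calling side; a minimal sketch of a generated Tonic client, again following the stock helloworld example rather than anything specific from this episode:

    pub mod hello_world {
        tonic::include_proto!("helloworld");
    }
    use hello_world::greeter_client::GreeterClient;
    use hello_world::HelloRequest;

    #[tokio::main]
    async fn main() -> Result<(), Box<dyn std::error::Error>> {
        // The generated client is plain async/await: connect, call, await.
        let mut client = GreeterClient::connect("http://[::1]:50051").await?;
        let response = client
            .say_hello(tonic::Request::new(HelloRequest { name: "Tonic".into() }))
            .await?;
        println!("RESPONSE={:?}", response.into_inner());
        Ok(())
    }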

\n\n

Eric Anderson:
\nMaybe, Lucio, just to dig into a couple of things from that experience, is this your vision or is it Carl's vision?

\n\n

Lucio Franco:
\nThe real [inaudible 00:10:32] for Tonic was I wanted to work in distributed databases. So I was like, "What's the best way to get a job working in distributed databases?" It is to write the protocol that they're using. You go to an employer and you say, "Hey, you're literally using my code. Your entire product depends on me, hire me." So that was my vision, that was my motivation. Obviously, I think the goals that Carl and the others in the Tokio group were working on in general were really good, and I think we pushed on some similar fronts. There was what I was pushing for, that ulterior thing of a job, but the goal of making things more ergonomic and easier was really important. And actually, the third thing I think was really critical to Tokio's current success, that we really focused on, was shipping.

\n\n

A big problem happens in the open source community, and we see this a lot with Rust. Most of my exposure has been to Rust, so I can't really speak to a lot of other languages and communities, but in Rust we have this decision paralysis. And it's the same thing with the standard library versus non-standard library discussions. If you want to put something into the compiler, it gets drilled down on, you're going to have to answer a million questions, people coming from left and right trying to make decisions on, oh, what if we did it this way? We want to support this. What about that? And that, one, becomes very exhausting, but it also doesn't allow you to ship. That's why, for example, the original async/await shipped four years ago, but we still don't have trait methods that are async. I think that's shipping now, but it's quite a while later.

\n\n

This is just purely due to the politics that happens in open source. And Tokio took a different approach. Luckily we were all amicable with each other, so we all liked each other, there was no drama, but it was like, "Hey, why don't we not spend our time always discussing and going in circles, let's get something out there that people can use, so people can build stuff with it." Rust is not going to go anywhere if people are not able to actually build stuff, and that means people at any level. I'm sure someone who's very experienced could take some of the older stuff that was really difficult to work with and build something, but we have to support people who are mid-level and juniors, and we want to grow into that space. Go has taken a really strong advantage here by being really easy to work with.

\n\n

So part of Tokio's goal was to ship an easy-to-use, ergonomic library, but actually ship it. And that was the same goal with Tonic. That's why I was like, "Let's get it out in time for async/await, so the day async/await lands, boom, what are you going to choose as your gRPC library? Tonic." So we were pushing on all fronts. I think we had all agreed that that was the right way to go. We didn't want to run in circles any longer. And that's, I think, been a huge advantage of the Tokio group: us really not trying to foster drama or any power plays or anything like that, because open source tends to have a lot of power vacuums and people trying to take over.

\n\n

Eric Anderson:
\nMaybe help me understand one aspect of this, and that's that... I don't know if runtime is the right word, but there's this fundamental aspect of the programming language that's not included in the standard library, async/await. When you write your gRPC, you write to a single approach to async/await, even though there may be others, and so Tonic can only work on Tokio's runtime, is that right?

\n\n

Lucio Franco:
\nIt's a little bit more complicated. Async/await is just syntactic sugar in the compiler that converts to some internal types. Think of it as a transformer. In reality it is exactly a transformer. It transforms into a type that implements Future. Future is an interface for something that can run asynchronously, and pause and resume. Those are all in the standard library. Future finally made it into the standard library after a few years, but yes, that all made it to the standard library. Tokio itself just uses the Future interface and a couple of global variables to hook into your thing to notify you that you're ready, whether it be a timer or whether it be your epoll thing saying, "Hey, your file descriptor is now ready to read from because there are bytes waiting in the TCP buffer." That stuff is tied to Tokio.
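
The interface he's describing is tiny; this is the Future trait as it exists in the standard library, with a note on what the compiler's transformation produces:

    use std::pin::Pin;
    use std::task::{Context, Poll};

    // std::future::Future: either completes with a value, or registers a
    // wakeup through the Context's Waker and returns Pending, pausing the
    // task until it is resumed.
    pub trait Future {
        type Output;
        fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output>;
    }

    // An expression like `async { 40 + 2 }` is transformed by the compiler
    // into an anonymous type implementing Future<Output = i32>. Note that
    // nothing here names a runtime; that part comes from a library.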

\n\n

For example, if I use a TCP stream type from Tokio, it needs to run inside the Tokio runtime, and this is because the TCP stream type needs to know how to register itself with the API that Tokio provides. And there's no way around this. It's very hard to abstract that, or abstract it in an ergonomic way, because those APIs are so tightly ingrained with what the scheduler is doing that it doesn't really make much sense. So, yes, there's been this kind of issue in the Rust async ecosystem where you can't mix and match runtimes and swap out runtimes, and it makes it very challenging. It ruins the mental model you'd like to have as a programmer, where things are abstracted and I can just change one line and now I'm using a different runtime. In reality, though, the argument is that the use cases for that are not very common.
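
A minimal sketch of that coupling, assuming the tokio::net module and an illustrative address: the listener's socket is registered with Tokio's epoll-backed reactor, which is why these types only work when a Tokio runtime is driving them:

    use tokio::net::TcpListener;

    #[tokio::main]
    async fn main() -> std::io::Result<()> {
        // Registers the socket with Tokio's reactor; calling this without
        // a running Tokio runtime panics rather than falling back.
        let listener = TcpListener::bind("127.0.0.1:8080").await?;
        let (_socket, peer) = listener.accept().await?;
        println!("connection from {peer}");
        Ok(())
    }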

\n\n

Yes, if you have an embedded system, maybe you want a different runtime, completely agree, go use an embedded runtime from the start. Tokio's net was never intended to work in that space. But consider spending all this time on abstracting things when, for example, we're using epoll in Tokio, but remember, at the time that we were talking about all this abstraction and runtime stuff, io_uring was coming around. If you don't know what io_uring is, it's essentially a new Linux API that allows you to dispatch syscalls asynchronously right in the kernel. So it's a really efficient way to do it, but it completely changes the model of how you do IO. Instead of having to copy the data into the kernel, you're just passing a pointer and letting the kernel read into it, and it makes things much more complicated. So then how can you abstract something that is still being innovated on?

\n\n

It's the same question about the standard library. If we haven't figured out the right abstractions yet for how to do epoll in Tokio, how the heck are we going to be able to put that into the standard library and convince people that this is going to be the implementation that we want for the next 10, 15, 20 years? It's not possible. And, in fact, for Tokio, we didn't really think about this. We didn't care about the abstraction because we wanted to get something that people could use. We weren't thinking about how users could swap it in and out. It wasn't a play to have a monopoly whatsoever. Not at all. It was that this wasn't beneficial to achieving our current goals of getting this in the hands of users so they can build stuff.

\n\n

That was our number one goal: users. Not writing the most perfect system, not writing the best thing ever, but getting something that people can use and build real systems with. Because, in reality, that's how you bring a language forward. You can't just stay academic about everything. You need to be a little bit more pushy on getting things in people's hands. It's like doing a startup, without the funding and all the stuff like that, obviously.

\n\n

Eric Anderson:
\nWell, actually that's a good segue. We will get into funding and economics related things in a moment. So this worked out for you, your plan of I'm going to work on the gRPC implementation and then I'll get hired by a database company.

\n\n

Lucio Franco:
\nIn fact, actually I had a friend, he was working at AWS at the time. He was like, "Hey, by the way, I just searched..." Because AWS, there's a way you can search every single code base that isn't security cleared or something, so anyone could see any implementation of EC2 or anything like that. And I guess he just searched Tonic and he found that Lambda was using it, and was like, "Okay, that's interesting." This is two months after I first released it. So maybe six months after I was like, "Hmm, maybe I should start working on this." That following March I actually reached out to him and I was like, "Hey, is Lambda still using it?" He's like, "Yeah." I was like, "All right, intro me to the manager." And boom, I ended up joining AWS Lambda and helping them build their new Rust service, and eventually transitioned into doing a lot more Rust stuff at AWS in general. But it worked that easily. I was impressed.

\n\n

Eric Anderson:
\nTrojan horse almost, you write this thing, they have to use it, and then they have to hire you so they can maintain it.

\n\n

Lucio Franco:
\nAnd what manager's going to tell you no? Your entire networking stack, I can solve your problems.

\n\n

Eric Anderson:
\nSo you're working on Rust at AWS, but not necessarily databases, and then you find your way to Turso?

\n\n

Lucio Franco:
\nYeah, so I got caught in the layoffs. I was working on Rust tooling and open source, so I think I was one of the first people on the list to get laid off. I wasn't working on any moneymaking things. The layers between me and the moneymaking were quite far away. And, again, it's probably a good segue into the open source funding thing, but when you work on open source, you're so far away from the VPs making the choices about where to put money. They don't think about it. They take it for granted. And this is a huge problem in the corporate world with open source.

\n\n

It's a fog of war, they don't see far enough down the software supply chain of what's powering everything, the bytes operating on the CPU or whatever. They're not really thinking that far. They're just thinking about their customers directly. And so unfortunately I got caught in that crossfire. And it was good timing anyway, I was ready to leave, and I wanted to work on something. And this probably also segues into another talk about the mental health aspect of being an open source maintainer. But I'm, again, very happy with my choices and being at Turso now. And I'm finally actually working on a database, so that's the huge positive here.

\n\n

Eric Anderson:
\nBefore we go to that, how does the funding work for Tokio? So you've got somewhat of an organization of volunteers building libraries that they're all individually passionate about, and it works. How does it all work?

\n\n

Lucio Franco:
\nThat's an extremely good question. For a while, a couple of us were being paid to work on Tokio. Carl works at AWS still and he has been working there maintaining Tokio and working with people inside the company to help them with Tokio. Sean has also worked at AWS and now is doing his own thing. He's the maintainer of hyper, which is the HTTP library, implements all the nice, crazy fun stuff around HTTP, but I think most of the other people are not paid full time to do this. A lot of it's either in our free time or work has given us some time to work on it. When I was on Lambda, I spent, and I'm going to put in quotes, 50% of my time working on Lambda, 50% of my time working on Tonic and open source. Definitely very hard to juggle.

\n\n

But most people were in this situation where they were doing it in their free time or something like that. And if you look at some of the projects, since 2019, things have stagnated a little bit, and obviously as things get popular, they're very hard to manage. For example, for Tonic right now, personally, I don't have any funding coming in. Turso has given me time to work on it, but obviously it's hard to juggle startup priorities with an open source library that is not as critical. Obviously I'm looking out for security bugs or anything that could be causing issues, and I'm going to respond to that as fast as possible. That's not a problem. But features, and I've wanted to re-implement certain things, I don't like how it's done, I just haven't had the time to do that. But there isn't much funding.

\n\n

And in fact, we also have some money that has been given to us through things like Open Collective, but we really haven't found a way to spend that money. It's been hard to find people to contract, because in reality the bar to contribute, and for us to spend our money on somebody, is quite high. To be able to get into that space, you need to show that you're able to do the work. And so that barrier is already very difficult to overcome. And finding people that are willing to do that and will actually stick with it is incredibly challenging. So in reality, besides people that are being paid to do it for work, there aren't many people really being paid to do stuff like that, which is unfortunate. And another thing, for me at least, is I haven't even set up GitHub Sponsors for myself.

\n\n

And another aspect of open source that's quite challenging is the guilt involved with it. I have the push rights to Tonic, I can publish things. And people will open up bugs that maybe are blocking them from achieving something, and I don't have the time nor energy to always help them out, because sometimes it requires a lot of work. The quality bar of contributors is all over the place, so sometimes you have to guide users through, sometimes users drop a massive PR on you and are like, "Hey, can you review this?" I'm like, "Absolutely not. That's a lot." So one thing I've had to learn is how to say no. I've gotten very good at it now, saying things like, "Hey, this is a lot. I don't have the time for this." But I feel guilty, so I actually don't accept money, because if I accepted money, I would feel even worse.

\n\n

I would feel like I would need to spend more time, more of my free time, more of my mental energy on it than I necessarily want to spend right now. I've come to the point in my life where I want to set a certain amount of time a week to work on coding, and there are other priorities in life. Back when I was younger, I could come home from work and code all night, but now I don't have that same interest or motivation, so I have to put a bit of a boundary between me and people, because I don't want to feel guilty and bad. I can say, "You're not paying me." Yes, if you want to get a release out, you do a lot of the heavy lifting and work, and I will help get it through.

\n\n

But if you're asking me to do something, I'm going to tell you, "Look, I just purely don't have time." And actually this happened literally today. Someone came in and was like, "Hey, are there any plans to implement this?" I'm like, "No, I currently don't have the time nor energy to do this." If you are interested in fixing it, come on, be a maintainer, come talk to me. And in fact, I've been looking for maintainers for a very long time, and I've had some people come in and work on stuff. I had one maintainer come in, but it turned out I ended up pushing him to work on a different project, similar to what I did with Tonic. I had him work on an HTTP library, an Express-style HTTP server library for Rust that we were really lacking, and I saw this really good opportunity for us.

\n\n

And so I was like, "Hey, I know you're helping me a lot and I love having you help me and you're awesome, and I trust you, you have full rights to everything, but actually I think you're better doing this, championing this and running with it." He did that and now the project, which is Axom, is a wildly even more popular than Tonic, and I'm happy with that, but the downside is that I lost a collaborator because he doesn't have the time to spend time on both. So it's like I found somebody, but then the opportunity presented itself and I lost them. And in fact, I pushed them, I created the repo I was like, "Hey, go talk to Carl. Go implement this. Yes, go, go, go, go, go." So it wasn't like I was sad, sad, I'm still happy with the outcome, but since then I've not been able to find anyone to really stick with it and help me. Still been just me maintaining these things, and it's a lot of work.

\n\n

Eric Anderson:
\nLucio, you mentioned earlier there was some mental health aspect to open source that you feel like is fairly pervasive, it sounded like. Tell us what that is.

\n\n

Lucio Franco:
\nWhen you create an open source project, let's say for me with Tonic, I started writing it because, one, as I said earlier, I wanted a job in distributed databases and stuff, but also I was just genuinely curious. I was enjoying it. I was having fun writing a library, figuring these things out, looking into other libraries and seeing how they did stuff, and trying to massage it into Rust and make it right, writing blog posts and spending time on the README and making it look all pretty. I loved that. The first pass I did of that was amazing. But as the years went on, and Tonic grew and matured, and more and more people started using it, the frequency of issues being opened or people asking me questions on Discord started to increase, and the burden started to become quite hard.

\n\n

I'm one person, I have a fixed amount of energy every week, and this thing is piling up on top of me and there are more people asking me stuff. And obviously, as a project gets more complicated, and the longer it exists, the harder maintaining it is. There's this relationship where as you add more code, the burden of maintaining grows. Remember, the implicit dependencies of everything become very complicated. So the problems and the bugs to solve started to get much more complicated, in a good and bad way, but it's the nature of software. But when you work at a company maintaining a piece of software, there's motivation. You have customers that are paying you and telling you, "Hey, we love this. This is great."

\n\n

For example, let's say at a B2B company, maybe you have a customer success team, and they're giving you feedback from the customers. They're saying, "Hey, our customers really enjoyed this feature, good job." In open source, what happens when I make a release? I spend all my energy and brain power cutting a release, writing the code, making sure everything's right, doing the changelog, making sure everyone's happy, I click the button to publish, and then radio silence. If it's good, if the release has no bugs, radio silence, there's nothing. Did people download it? Did people try it?

\n\n

Eric Anderson:
\nYou're saying that most feedback is negative feedback.

\n\n

Lucio Franco:
\nRight, all you hear is this negative feedback. You don't get the positive feedback loops that trigger the dopamine release in your brain, that make you excited and motivated like you would have at a company with your boss saying, "Good job." Instead, you make these releases, these bug fixes, you spend all this energy fixing these bugs, and then it works, and no one tells you good job. It's quite a depressing experience, and all you hear is these negative things. And then on top of that, because you are the maintainer, you are the only person able to review and push things through. And now someone opens a PR fixing a bug that a couple other people have, and then they comment on the PR saying, "Hey, is someone going to review this?" And I'm like, "Look, I am exhausted. I don't have the time for this. Maybe next week I can look at this," but I feel bad.

\n\n

Inherently, I want to help these people, I have an internal want to be helpful, and so it really pains me to have to leave these people hanging. And it gets worse. We have some projects where the issues have piled up like crazy, and the surface area that they need to work on is large, so there's a lot of stuff going on, and the overhead is really high. And you pair that with not being paid, say, a wage. The work that a lot of these open source people do is the work of senior engineers. We did this analysis when I was at AWS about the quality of the code these people are writing and what they have to think about.

\n\n

And if you were to compare them to the leveling at AWS, for example, you're a senior engineer if you're able to do this stuff, if you're able to make these choices and talk to people in a certain way and do designs, especially in an async, collaborative world, without another engineer being able to help you. That is truly the work of a senior engineer, but these people are being paid $0. It's quite crazy, the work they're doing is worth hundreds of thousands of dollars, and they're being paid maybe a thousand dollars through sponsors or something. It's quite crazy, this disparity. And that, plus the guilt, plus other priorities in life, turns into burnout. It makes you fall into this pit where you just don't know what to do. You're paralyzed.

\n\n

Eric Anderson:
\nWe talk about community, we use that word all the time, and there's a big community around Tonic, but maybe it would help to understand who these people actually are. Most of them are users of the library, and they're fairly transactional. They show up, they download, they run, and they file bugs when things don't work. And that's what? 90% of the community?

\n\n

Lucio Franco:
\nYeah, there's maybe two or three people on Discord. We tend to use Discord for all of the Tokio projects, and so there's a Tonic channel that I monitor. And occasionally we have some other people, we have a tag for them, like a group of helpful people. They're not contributors, they don't have push access, but they're just people that have been answering questions here and there. And this is also how I got involved, by the way. I started going on Gitter back in the day and was helping people. And as a maintainer, I'm like, "Wow, I love that. I love when people come in and answer questions. If you don't know the answer, ping me, I'm happy to help, but please go and try to answer questions even if you're not confident. It's a good start." And that shows initiative.

\n\n

So there are people that help out, but there's maybe three or four. This is just for Tonic. For Tokio, there's a larger group of people. There's the Tokio umbrella, and then there's the Tokio project, the actual code library, and that surface area is massive, so there's a lot more of a community there, because the user segment that uses Tokio is everything, whereas Tonic is one slice of that pie. So beyond that, there's not much of maintainers helping people and answering questions. It's usually just other people maybe running into a bug who maybe found a solution. Beyond that, it's transactional, people being like, "Hey, can I get some help with this?" That's it.

\n\n

Eric Anderson:
\nAnd you can't just walk away because you have the keys to making updates. And as you mentioned, somebody's got a critical bug or a security issue, they want to push an update and you're blocking.

\n\n

Lucio Franco:
\nSo actually we have the permissions set up in Tokio such that if I were to get hit by a bus tomorrow, a large bus, people have rights to everything. It's part of an organization, there are escape hatches, but actively publishing new packages and stuff is just me, mostly. Other people have rights, and they know how to do it. For example, David, who works on Axum, the guy I mentioned earlier, he hasn't made a Tonic release in two years, so he's not really going to remember how to do all the little things here and there. So it's just me managing it all. I bet somewhat poorly, probably, but I try my best.

\n\n

Eric Anderson:
\nWhat's the end game? Eventually you hand this off to somebody? Do you just document how to do releases really well? I don't know. What have you seen in other open source projects? Is there a place?

\n\n

Lucio Franco:
\nI think the real end game is waiting for Rust 2.0 to come out, so my library becomes incompatible and someone else has to take the reins of maintaining a library. No, there is no end game. And actually I thought the same thing, how do I get out of this? There's no real way to do it. And the hope is that some maintainer comes by that helps out and will be able to take charge. But the real solution actually is to get the library to a 1.0 state where it's just bug fixes. What this means for Tonic right now is I probably need to go through with an axe and start chopping things off. There are some things that I implemented in 2019 that made a lot of sense with the idea that we were the first library on the market, so I had to do a lot more heavy lifting to make up for the lack of other libraries.

\n\n

So there's the gRPC stuff, but then there's a lot of load balancing and HTTP server and client configuration stuff. I wrote a layer on top to make it easier, but now it's a lot easier to do it by hand, you can put the pieces together, but before that was not the case. So part of the end game is cutting surface area. This is a huge strategy in how to achieve 1.0: really just cut out all the surface area, maybe put it into an unstable crate that's not 1.0, and then just have a very solid 1.0. And that's really the end goal. The gRPC protocol is not going to change, the HTTP/2 protocol is not going to change. What we have now is pretty solid. It's very production ready, it's being used quite heavily all over the world, so I'm pretty confident in the code. It's more just reducing surface area, but the problem with that is it's a lot of work to go and cut things out, because I have to write docs and update things. So it's a catch-22.
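
Cutting surface area in a published Rust crate usually happens through staged deprecation; a hypothetical sketch, where the item name, versions, and companion crate are all made up for illustration and are not Tonic's actual API:

    // Hypothetical: shrink the public API before 1.0 by deprecating the
    // convenience layer, pointing users at the hand-composed path, and
    // parking the old code in a separate not-yet-1.0 crate.
    #[deprecated(
        since = "0.12.0",
        note = "compose this from `tower` and `hyper` directly; \
                moving to the unstable `tonic-extras` crate"
    )]
    pub struct LoadBalancedChannel {
        endpoints: Vec<String>,
    }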

\n\n

Eric Anderson:
\nFrom your experience at Amazon, what percent of big tech engineers have an open source side hustle, or an open source project that relies on them?

\n\n

Lucio Franco:
\nFrom my perspective at Amazon, it's a very low percentage. Very, very low percentage. But I think that's a problem with companies like Amazon or Microsoft or Google, in essence, a little bit, though I think the Google personality varies from those two. Amazon has a very strong college pipeline, so you have a lot of people coming straight from college going into a pretty good, cushy tech job, working in some pretty bespoke technology. The build system is unique, the libraries they're using are unique. Rust was a big shift at Amazon, because with a lot of the previous libraries they used, everything was in Java, so they had already handwritten every library you could ever want. You didn't have to go outside the little bubble, everything just worked. But for us, that was not the case.

\n\n

That's why people started using Tonic: they needed a library to do something now. You can't wait for an internal team to implement this and take it to 1.0. But most people are not writing Rust, most people are writing Java or whatever at Amazon, because it's a very large company. Those people are not really exposed to the outside world, so there's no point for them to do open source stuff. A lot of them don't. The job is already taxing enough, they're not going to go out of their bubble. I think it's people at startups and companies where you do have to pull tools off the shelf, where you do have to go on GitHub and see what other options there are. If I wanted to implement something for Turso, for example, and I didn't do my due diligence to go check for a Rust library that might do exactly what I want, that's really dumb. I should be finding the fastest way possible to achieve something. So that process also leads into people contributing back.

\n\n

You'll see a lot of startups have GitHub organizations where they're like, "Hey, we open sourced this little library that we used to do this thing and it really helped us." And that can spark a little bit of that open source creativity and innovation, putting things out there, that a large company doesn't really push. Obviously, there are going to be exceptions. There are going to be people that are curious and just want to do stuff. Like me, I was doing it all on my own, it wasn't related to work, and it somehow found its way back into work. But there are a lot of people in these large companies that are not really exposed to this type of stuff, and they're not exposed to the risks either. That's the other thing. I spent a lot of time at AWS thinking about what we should depend on. If I could choose a set of libraries that everyone should depend on, how do we make those decisions? And most people don't think about that. They just choose. It's fine until things break and you have a zero-day or something, but it's always a risk.

\n\n

Eric Anderson:
\nLucio, this has been super fascinating. Not only all the details on Tonic and Rust, but your personal experience navigating being an open source maintainer. Tell us, as we wrap up here, what folks can do if they're excited to learn more about Tonic or want to take the load off your shoulders.

\n\n

Lucio Franco:
\nWell, if you're interested in Tonic at all, come check out the project, try it out. There are a bunch of GitHub issues, there's Discord, getting involved is pretty easy. I'm available, you can find me on Discord, you'll see my green name show up. Play around, respond to some issues, review a PR. I will never stop anyone from providing their opinion on something. And actually, I encourage people to act as if they're a maintainer without having the privilege of being a maintainer. Go ahead and review something as if you were the one trying to merge the PR. That generates a lot of confidence in us, and I love seeing the initiative. And, again, I have an email, I have Twitter, X, I have Discord, reach out to me, come ask me questions. I'm always happy to talk and discuss these sorts of things. I was a budding open source maintainer once upon a time, not too long ago, so I understand very well what it's like to try to get into this space.

\n\n

Eric Anderson:
\nWell, also, we appreciate your service, Lucio. You've given something back to the world.

\n\n

Lucio Franco:
\nIt's the least I can do.

\n\n

Eric Anderson:
\nYou can subscribe to the podcast and check out our community Slack and newsletter at contributor.fyi. If you like the show, please leave a rating and review on Apple Podcasts, Spotify, or wherever you get your podcasts. Until next time, I'm Eric Anderson, and this has been Contributor.

","summary":"","date_published":"2024-01-03T02:00:00.000-08:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/e9d6d332-b294-4920-bde0-ef9107a275d7.mp3","mime_type":"audio/mpeg","size_in_bytes":35012067,"duration_in_seconds":2184}]},{"id":"b3b497f9-2cc9-4d03-ac4f-04c34b12670c","title":"The Social Miracle: rqlite with Philip O’Toole","url":"https://www.contributor.fyi/rqlite","content_text":"rqlite is a lightweight, distributed relational database built on Raft and SQLite. Founder Philip O’Toole (@general_order24) decided to combine these technologies while working at a startup years ago. The startup no longer exists, but rqlite is going strong. Today, Philip is an engineering manager at Google, while he continues to be the driving force behind the open development of rqlite.\n\nContributor is looking for a community manager! If you want to know more, shoot us an email at eric@scalevp.com.\n\nSubscribe to Contributor on Substack for email notifications!\n\nIn this episode we discuss:\n\n\n The biggest misconceptions about how rqlite differs from SQLite\n\n Why writing databases is more interesting than new programmers might think\n\n The tradeoff between a large community versus smaller, more focused leadership\n\n Reasons why open-source development progresses in bursts of energy\n\n How to really pronounce “rqlite”\n\n\n\nLinks:\n\n\n rqlite\n\n InfluxData\n\n dqlite\n\n Litestream\n\n libSQL\n\n Turso\n\n OpenTelemetry\n\n\n\nPeople:\n\n\n Ben Johnson (@benbjohnson)\n\n\n\nOther episodes:\n\n\n libSQL with Glauber Costa\n\n","content_html":"

rqlite is a lightweight, distributed relational database built on Raft and SQLite. Founder Philip O’Toole (@general_order24) decided to combine these technologies while working at a startup years ago. The startup no longer exists, but rqlite is going strong. Today, Philip is an engineering manager at Google, while he continues to be the driving force behind the open development of rqlite.

\n\n

Contributor is looking for a community manager! If you want to know more, shoot us an email at eric@scalevp.com.

\n\n

Subscribe to Contributor on Substack for email notifications!

\n\n

In this episode we discuss:

\n\n\n\n

Links:

\n\n\n\n

People:

\n\n\n\n

Other episodes:

\n\n","summary":"","date_published":"2023-12-20T02:00:00.000-08:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/b3b497f9-2cc9-4d03-ac4f-04c34b12670c.mp3","mime_type":"audio/mpeg","size_in_bytes":41385944,"duration_in_seconds":2582}]},{"id":"a30dc9d9-ed47-4456-b9dd-ab37eee4fec1","title":"Community Driven IaC: OpenTofu with Kuba Martin","url":"https://www.contributor.fyi/opentofu","content_text":"\n\n\nKuba Martin (@cube2222_2) is Software Engineering Team Lead at Spacelift and Interim Tech Lead of OpenTofu, the open-source fork of Terraform. Terraform is a declarative infrastructure-as-code (IaC) tool that recently switched to a source-available license. Spacelift and other companies that heavily relied on Terraform came together to fork it into a community-driven project originally called OpenTF, which has now become OpenTofu and is governed by the Linux Foundation. \n\nContributor is looking for a community manager! If you want to know more, shoot us an email at eric@scalevp.com.\n\nSubscribe to Contributor on Substack for email notifications!\n\nIn this episode we discuss:\n\n\n Two kinds of forks\n\n How OpenTofu handled the opportunity to rethink their licensing and copyright\n\n Finding hundreds of pledges to the OpenTF Manifesto\n\n The benefits of a technical steering committee\n\n Recreating the community registry\n\n\n\nLinks:\n\n\n OpenTofu\n\n Spacelift\n\n Terraform\n\n Gruntwork\n\n Harness\n\n env0\n\n Scalr\n\n","content_html":"

\n

\n\n

Kuba Martin (@cube2222_2) is Software Engineering Team Lead at Spacelift and Interim Tech Lead of OpenTofu, the open-source fork of Terraform. Terraform is a declarative infrastructure-as-code (IaC) tool that recently switched to a source-available license. Spacelift and other companies that heavily relied on Terraform came together to fork it into a community-driven project originally called OpenTF, which has now become OpenTofu and is governed by the Linux Foundation. 

\n\n

Contributor is looking for a community manager! If you want to know more, shoot us an email at eric@scalevp.com.

\n\n

Subscribe to Contributor on Substack for email notifications!

\n\n

In this episode we discuss:

\n\n\n\n

Links:

\n\n","summary":"","date_published":"2023-10-18T02:00:00.000-07:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/a30dc9d9-ed47-4456-b9dd-ab37eee4fec1.mp3","mime_type":"audio/mpeg","size_in_bytes":31936723,"duration_in_seconds":1991}]},{"id":"73b8c06b-7760-42e1-a4d2-898576f15b5e","title":"Postgres for Everything: Tembo with Ry Walker","url":"https://www.contributor.fyi/tembo","content_text":"Ry Walker (@rywalker) is the founder and CEO of Tembo, the Postgres developer platform for building any and every data service. To Ry, the full capabilities of Postgres appear underappreciated and underused for most users. Tembo is an attempt to harness the large ecosystem of Postgres extensions, and ultimately collapse the database sprawl of the modern data stack. \n\nContributor is looking for a community manager! If you want to know more, shoot us an email at eric@scalevp.com.\n\nSubscribe to Contributor on Substack for email notifications!\n\nIn this episode we discuss:\n\n\n Taking the “red pill” of using Postgres for everything\n\n Providing universal support for Postgres extensions\n\n Why Ry dislikes the current state of the modern data stack\n\n How databases across the board have mostly changed into application platforms\n\n What makes Tembo “Startup Mt. Everest”\n\n\n\nLinks:\n\n\n Tembo\n\n OSSRank\n\n Citus Data\n\n Modal\n\n Supabase Wrappers\n\n\n\nPeople mentioned:\n\n\n Erik Bernhardsson (@bernhardsson)\n\n\n\nOther episodes:\n\n\n Clickhouse with Alexey Milovidov and Ivan Blinkov\n\n","content_html":"

Ry Walker (@rywalker) is the founder and CEO of Tembo, the Postgres developer platform for building any and every data service. To Ry, the full capabilities of Postgres appear underappreciated and underused for most users. Tembo is an attempt to harness the large ecosystem of Postgres extensions, and ultimately collapse the database sprawl of the modern data stack.

\n\n

Contributor is looking for a community manager! If you want to know more, shoot us an email at eric@scalevp.com.

\n\n

Subscribe to Contributor on Substack for email notifications!

\n\n

In this episode we discuss:

\n\n\n\n

Links:

\n\n\n\n

People mentioned:

\n\n\n\n

Other episodes:

\n\n","summary":"","date_published":"2023-09-13T02:00:00.000-07:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/73b8c06b-7760-42e1-a4d2-898576f15b5e.mp3","mime_type":"audio/mpeg","size_in_bytes":30844596,"duration_in_seconds":1923}]},{"id":"0efe0c32-7359-47b5-98d9-55050bbb399e","title":"Automation for Technical People: n8n with Jan Oberhauser ","url":"https://www.contributor.fyi/n8n","content_text":"Jan Oberhauser (@JanOberhauser) is the founder and CEO of n8n, the free and source-available workflow automation tool for technical users. n8n's flexible architecture allows users to avoid the limitations of other automation tools, while also opening doors for complex automation scenarios. The project has garnered over 30,000 GitHub stars and a thriving community of 55,000+ members.\n\nSubscribe to Contributor on Substack for email notifications, and join our Slack community!\n\nIn this episode we discuss:\n\n\n How Jan’s background in film effects laid the groundwork for n8n\n\n Why n8n uses a forum over Discord or Slack for a community platform\n\n Use cases from scheduling fitness classes to upgrading financial mainframes\n\n How n8n might stack up against the well-thought out Python script\n\n Why n8n uses a fair-code license rather than open-source\n\n\n\nLinks:\n\n\n n8n\n\n n8n Community\n\n\n\nOther episodes:\n\n\n Temporal with Maxim Fateev\n\n From Orchestration to Building Applications: Conductor with Jeu George\n\n Rethinking the Workflow Problem: Windmill with Ruben Fiszel\n\n","content_html":"

Jan Oberhauser (@JanOberhauser) is the founder and CEO of n8n, the free and source-available workflow automation tool for technical users. n8n's flexible architecture allows users to avoid the limitations of other automation tools, while also opening doors for complex automation scenarios. The project has garnered over 30,000 GitHub stars and a thriving community of 55,000+ members.

\n\n

Subscribe to Contributor on Substack for email notifications, and join our Slack community!

\n\n

In this episode we discuss:

\n\n\n\n

Links:

\n\n\n\n

Other episodes:

\n\n","summary":"","date_published":"2023-08-09T02:00:00.000-07:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/0efe0c32-7359-47b5-98d9-55050bbb399e.mp3","mime_type":"audio/mpeg","size_in_bytes":38533373,"duration_in_seconds":2404}]},{"id":"d598d296-2b60-461c-bf8e-963c122a1684","title":"The Big Fork: libSQL with Glauber Costa","url":"https://www.contributor.fyi/libsql","content_text":"Glauber Costa (@glcst) is the founder of Turso and the co-creator of libSQL, an open source, open contribution fork of the database engine library, SQLite. Most people believe that SQLite is open-source software, but it actually exists in the public domain and doesn’t accept external contributions. With their big fork, Glauber and his team have set out to evolve SQLite into a modern database with support for distributed data, an asynchronous interface, compatibility with WASM and Linux, and more.\n\nSubscribe to Contributor on Substack for email notifications, and join our Slack community!\n\nIn this episode we discuss:\n\n\n Community reactions to forking SQLite\n\n How Glauber was spoiled by starting his career developing for Linux\n\n The controversial decision to launch libSQL without writing a single line of code\n\n The plan for incorporating upstream changes from SQLite\n\n Examples of how application developers need to move code “to the edge”\n\n\n\nLinks:\n\n\n libSQL\n\n SQLite\n\n Turso\n\n LiteFS\n\n Litestream\n\n rqlite\n\n VLCN\n\n\n\nPeople mentioned:\n\n\n Avi Kivity (@AviKivity)\n\n Dor Laor (@DorLaor)\n\n Ben Johnson (@benbjohnson)\n\n Phillip O’Toole (@general_order24)\n\n Matt Tantaman (@tantaman)\n\n\n\nOther episodes:\n\n\n Scylla with Dor Laor \n\n Apache Cassandra with Patrick McFadin\n\n","content_html":"

Glauber Costa (@glcst) is the founder of Turso and the co-creator of libSQL, an open source, open contribution fork of the database engine library, SQLite. Most people believe that SQLite is open-source software, but it actually exists in the public domain and doesn’t accept external contributions. With their big fork, Glauber and his team have set out to evolve SQLite into a modern database with support for distributed data, an asynchronous interface, compatibility with WASM and Linux, and more.

\n\n

Subscribe to Contributor on Substack for email notifications, and join our Slack community!

\n\n

In this episode we discuss:

\n\n\n\n

Links:

\n\n\n\n

People mentioned:

\n\n\n\n

Other episodes:

\n\n","summary":"","date_published":"2023-07-26T02:00:00.000-07:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/d598d296-2b60-461c-bf8e-963c122a1684.mp3","mime_type":"audio/mpeg","size_in_bytes":31120867,"duration_in_seconds":1940}]},{"id":"0943b880-bfe3-41f9-9645-a9bf8ae683f0","title":"Rethinking the Workflow Problem: Windmill with Ruben Fiszel","url":"https://www.contributor.fyi/windmill","content_text":"Ruben Fiszel (@rubenfiszel) is the creator of Windmill, the open-source developer platform that lets users easily turn scripts into workflows and internal apps with auto-generated UIs. Windmill doesn’t force engineers to change their coding style or adopt a convoluted API, and its low-code design makes it accessible to non-technical users. Tune in to find out how Windmill offers speed, performance and flexibility, while avoiding the limitations of rigid tools.\n\nSubscribe to Contributor on Substack for email notifications, and join our Slack community!\n\nIn this episode we discuss:\n\n\n Why many engineers try to reinvent the wheel when it comes to workflow engines\n\n When Ruben first saw the need for a platform like Windmill while working at Palantir\n\n “Today is the nicest period to build open-source…”\n\n Ruben’s incredible presence with support and bug fixes\n\n Windmill’s generous open-source offerings and the future of the business\n\n\n\nLinks:\n\n\n Windmill\n\n Retool\n\n Tokio\n\n Apache Airflow\n\n Apache Spark\n\n\n\nOther episodes:\n\n\n Prefect with Jeremiah Lowin\n\n Dagster with Nick Schrock\n\n Temporal with Maxim Fateev\n\n Temporal (Part 2) with Maxim Fateev and Dominik Tornow\n\n Apache Cassandra with Patrick McFadin\n\n","content_html":"

Ruben Fiszel (@rubenfiszel) is the creator of Windmill, the open-source developer platform that lets users easily turn scripts into workflows and internal apps with auto-generated UIs. Windmill doesn’t force engineers to change their coding style or adopt a convoluted API, and its low-code design makes it accessible to non-technical users. Tune in to find out how Windmill offers speed, performance and flexibility, while avoiding the limitations of rigid tools.

\n\n

Subscribe to Contributor on Substack for email notifications, and join our Slack community!

\n\n

In this episode we discuss:

\n\n\n\n

Links:

\n\n\n\n

Other episodes:

\n\n","summary":"","date_published":"2023-07-05T02:00:00.000-07:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/0943b880-bfe3-41f9-9645-a9bf8ae683f0.mp3","mime_type":"audio/mpeg","size_in_bytes":25818219,"duration_in_seconds":1609}]},{"id":"3e63823d-96b5-4c56-9e44-6d3f8f3dde42","title":"Vector Search for Humans: Marqo with Jesse Clark","url":"https://www.contributor.fyi/marqo","content_text":"Jesse Clark (@jn2clark) is a co-founder of Marqo, the end-to-end, multimodal vector search engine. Vector search has exploded along with the rise of generative AI models, so Marqo’s arrival has had excellent timing. The project has quickly grown to almost 3000 GitHub stars, despite being less than a year old. Jesse and his team weren’t exactly expecting this level of immediate success, but they are well-positioned to continue developing Marqo as a fixture in the worlds of information retrieval and machine learning. \n\nSubscribe to Contributor on Substack for email notifications, and join our Slack community!\n\nIn this episode we discuss:\n\n\n Jesse’s journey from physics research, to Stitch Fix, Amazon, and finally starting Marqo\n\n Industry vs academia in the cutting edge of machine learning\n\n Why “almost any organization in the world would benefit from Marqo”\n\n Talking about machine learning language - tensors, vectors, embeddings\n\n How Jesse deals with the stress of knowing how fast the AI space is innovating\n\n\n\nLinks:\n\n\n Marqo\n\n\n\nPeople mentioned:\n\n\n Katrina Lake (@kmlake)\n\n Eric Colson (@ericcolson)\n\n","content_html":"

Jesse Clark (@jn2clark) is a co-founder of Marqo, the end-to-end, multimodal vector search engine. Vector search has exploded along with the rise of generative AI models, so Marqo’s arrival has had excellent timing. The project has quickly grown to almost 3000 GitHub stars, despite being less than a year old. Jesse and his team weren’t exactly expecting this level of immediate success, but they are well-positioned to continue developing Marqo as a fixture in the worlds of information retrieval and machine learning.
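
As a taste of the developer experience, here is a rough sketch based on Marqo’s Python client as documented around this time; exact arguments (for example, how embedded fields are declared) have shifted between releases, so treat this as illustrative rather than definitive.

    import marqo

    # Connect to a locally running Marqo instance.
    mq = marqo.Client(url="http://localhost:8882")
    mq.create_index("movies")
    mq.index("movies").add_documents([
        {"Title": "The Matrix", "Description": "A hacker discovers reality is simulated."},
    ])
    # One call embeds the query text and runs a vector similarity search.
    results = mq.index("movies").search(q="sci-fi film about simulated worlds")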

Subscribe to Contributor on Substack for email notifications, and join our Slack community!

In this episode we discuss:

Jesse’s journey from physics research, to Stitch Fix, Amazon, and finally starting Marqo
Industry vs academia in the cutting edge of machine learning
Why “almost any organization in the world would benefit from Marqo”
Talking about machine learning language - tensors, vectors, embeddings
How Jesse deals with the stress of knowing how fast the AI space is innovating

Links:

Marqo

People mentioned:

Katrina Lake (@kmlake)
Eric Colson (@ericcolson)
\n\n","summary":"","date_published":"2023-06-28T02:00:00.000-07:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/3e63823d-96b5-4c56-9e44-6d3f8f3dde42.mp3","mime_type":"audio/mpeg","size_in_bytes":32992906,"duration_in_seconds":2057}]},{"id":"d9b7bde0-7c3f-43de-913d-1c7a682c7a48","title":"From Orchestration to Building Applications: Conductor with Jeu George","url":"https://www.contributor.fyi/conductor","content_text":"Jeu George (@jeugeorge) is the co-creator of Conductor, the open-source application building platform. Conductor began as a workflow orchestrator and was originally developed at Netflix. Jeu also co-founded Orkes, a company which offers a cloud product based on Conductor. Tune in to find out how Conductor has evolved into an open-source, battle-tested distributed application platform.\n\nSubscribe to Contributor on Substack for email notifications, and join our Slack community!\n\nIn this episode we discuss:\n\n\n The core tenets of building Conductor - reliability, language and cloud agnosticism\n\n How Conductor enables teams to share and manage their custom modules\n\n The role of Conductor in Netflix’s switch from licensed to original content\n\n Jeu’s journey from Netflix, to Uber, and finally to Orkes\n\n How Orkes is focusing on integrations and AI orchestration moving forward\n\n\n\nLinks:\n\n\n Conductor\n\n Orkes\n\n\n\nPeople mentioned:\n\n\n Viren Baraiya (@virenbaraiya)\n\n Boney Sekh (@boneyorkes)\n\n Dilip Lukose (@diliplukose)\n\n","content_html":"

Jeu George (@jeugeorge) is the co-creator of Conductor, the open-source application building platform. Conductor began as a workflow orchestrator and was originally developed at Netflix. Jeu also co-founded Orkes, a company which offers a cloud product based on Conductor. Tune in to find out how Conductor has evolved into an open-source, battle-tested distributed application platform.

Subscribe to Contributor on Substack for email notifications, and join our Slack community!

In this episode we discuss:

The core tenets of building Conductor - reliability, language and cloud agnosticism
How Conductor enables teams to share and manage their custom modules
The role of Conductor in Netflix’s switch from licensed to original content
Jeu’s journey from Netflix, to Uber, and finally to Orkes
How Orkes is focusing on integrations and AI orchestration moving forward

Links:

Conductor
Orkes

People mentioned:

Viren Baraiya (@virenbaraiya)
Boney Sekh (@boneyorkes)
Dilip Lukose (@diliplukose)
\n\n","summary":"","date_published":"2023-06-14T02:00:00.000-07:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/d9b7bde0-7c3f-43de-913d-1c7a682c7a48.mp3","mime_type":"audio/mpeg","size_in_bytes":32522284,"duration_in_seconds":2028}]},{"id":"b1342db1-9b6a-439c-97ae-99528c5ded3b","title":"Opening Up Authentication: SuperTokens with Advait Ruia","url":"https://www.contributor.fyi/supertokens","content_text":"\n\n\nAdvait Ruia (@Advait_Ruia) is the co-founder of SuperTokens, the open-source user authentication and authorization framework. SuperTokens integrates natively into both your front-end client and your backend endpoint. This approach gives developers more control over the user experience and allows for custom workflows. Tune in to find out why SuperTokens aims to be the best of both the build and the buy argument for authentication solutions.\n\nSubscribe to Contributor on Substack for email notifications, and join our Slack community!\n\nIn this episode we discuss:\n\n\n How SuperTokens evolved from a blog post on session management into a full-fledged infrastructure company\n\n Why there is increasing demand for authentication providers\n\n Do founders need to be in the Bay Area?\n\n Advait’s advice for building community and providing support\n\n Areas where SuperTokens could use outside contributions\n\n\n\nLinks:\n\n\n SuperTokens\n\n SuperTokens Product Roadmap\n\n\n\nPeople mentioned:\n\n\n Rishabh Poddar (@rishpoddar)\n\n\n\nOther episodes:\n\n\n Hasura with Tanmai Gopal\n\n","content_html":"


Advait Ruia (@Advait_Ruia) is the co-founder of SuperTokens, the open-source user authentication and authorization framework. SuperTokens integrates natively into both your front-end client and your backend endpoint. This approach gives developers more control over the user experience and allows for custom workflows. Tune in to find out why SuperTokens aims to deliver the best of both sides of the build-versus-buy debate for authentication solutions.

Subscribe to Contributor on Substack for email notifications, and join our Slack community!

In this episode we discuss:

How SuperTokens evolved from a blog post on session management into a full-fledged infrastructure company
Why there is increasing demand for authentication providers
Do founders need to be in the Bay Area?
Advait’s advice for building community and providing support
Areas where SuperTokens could use outside contributions

Links:

SuperTokens
SuperTokens Product Roadmap

People mentioned:

Rishabh Poddar (@rishpoddar)

Other episodes:

Hasura with Tanmai Gopal
\n\n","summary":"","date_published":"2023-05-31T02:00:00.000-07:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/b1342db1-9b6a-439c-97ae-99528c5ded3b.mp3","mime_type":"audio/mpeg","size_in_bytes":28470588,"duration_in_seconds":1775}]},{"id":"9f8dc654-1f8a-4127-b47f-11df2d25e8a6","title":"Open-Source Runtime Security: Falco with Loris Degioanni","url":"https://www.contributor.fyi/falco","content_text":"Loris Degioanni (@lorisdegio) joins Eric Anderson (@ericmander) to chat about Falco, the open-source runtime security tool for modern cloud infrastructures. Loris is the founder and CTO of Sysdig, and co-creator of Wireshark, the legendary open-source packet analysis tool. Today, Loris talks about all these projects and more - tune in to learn about some deep history and Loris’ predictions for the future.\n\nSubscribe to Contributor on Substack for email notifications, and join our Slack community!\n\nIn this episode we discuss:\n\n\n How Loris began working with Gerald Combs as a student in Italy\n\n Why Loris’ teams name their products after animals\n\n The new non-profit Wireshark Foundation\n\n Parallel development of cloud technology and containers during Loris’ career\n\n The little things that make open-source projects go viral\n\n\n\nLinks:\n\n\n Falco\n\n Sysdig\n\n Wireshark\n\n\n\nPeople mentioned:\n\n\n Solomon Hykes (@solomonhykes)\n\n\n\n\n\n","content_html":"

Loris Degioanni (@lorisdegio) joins Eric Anderson (@ericmander) to chat about Falco, the open-source runtime security tool for modern cloud infrastructures. Loris is the founder and CTO of Sysdig, and co-creator of Wireshark, the legendary open-source packet analysis tool. Today, Loris talks about all these projects and more - tune in to learn about some deep history and Loris’ predictions for the future.

Subscribe to Contributor on Substack for email notifications, and join our Slack community!

In this episode we discuss:

How Loris began working with Gerald Combs as a student in Italy
Why Loris’ teams name their products after animals
The new non-profit Wireshark Foundation
Parallel development of cloud technology and containers during Loris’ career
The little things that make open-source projects go viral

Links:

Falco
Sysdig
Wireshark

People mentioned:

Solomon Hykes (@solomonhykes)

","summary":"","date_published":"2023-04-05T02:00:00.000-07:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/9f8dc654-1f8a-4127-b47f-11df2d25e8a6.mp3","mime_type":"audio/mpeg","size_in_bytes":32330023,"duration_in_seconds":2016}]},{"id":"5777eb46-df91-420f-9a17-e96a0703fde4","title":"Decoupling Authorization: Cerbos with Emre Baran","url":"https://www.contributor.fyi/cerbos","content_text":"Emre Baran (@emre) is the CEO and co-founder of Cerbos, the open-source authorization layer for implementing roles and permissions. Cerbos allows developers to decouple authorization logic from core code into its own centrally distributed component. Easier said than done, perhaps - but Cerbos is secure, intentionally simple to implement, and developer-focused.\n\nSubscribe to Contributor on Substack for email notifications, and join our Slack community!\n\nIn this episode we discuss:\n\n\n The difference between authentication and authorization\n\n Why Cerbos is language-agnostic\n\n Authorization patterns in a single application versus a larger network\n\n The reason most devs start out trying to do authorization themselves, and sometimes give up\n\n How the upcoming Cerbos Cloud will empower less technical users to deploy and manage policies and logs\n\n\n\nLinks:\n\n\n Cerbos\n\n Cerbos Cloud Beta\n\n Zanzibar: Google’s Consistent, Global Authorization System\n\n\n\nPeople mentioned:\n\n\n Charith Ellawala (Github: @charithe)\n\n\n\nOther episodes:\n\n\n Open Policy Agent with Torin Sandall\n\n","content_html":"

Emre Baran (@emre) is the CEO and co-founder of Cerbos, the open-source authorization layer for implementing roles and permissions. Cerbos allows developers to decouple authorization logic from core code into its own centrally distributed component. Easier said than done, perhaps - but Cerbos is secure, intentionally simple to implement, and developer-focused.

Subscribe to Contributor on Substack for email notifications, and join our Slack community!

In this episode we discuss:

The difference between authentication and authorization
Why Cerbos is language-agnostic
Authorization patterns in a single application versus a larger network
The reason most devs start out trying to do authorization themselves, and sometimes give up
How the upcoming Cerbos Cloud will empower less technical users to deploy and manage policies and logs

Links:

Cerbos
Cerbos Cloud Beta
Zanzibar: Google’s Consistent, Global Authorization System

People mentioned:

Charith Ellawala (Github: @charithe)

Other episodes:

Open Policy Agent with Torin Sandall
\n\n","summary":"","date_published":"2023-03-22T02:00:00.000-07:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/5777eb46-df91-420f-9a17-e96a0703fde4.mp3","mime_type":"audio/mpeg","size_in_bytes":27945631,"duration_in_seconds":1742}]},{"id":"4c537bc9-e202-423b-9b2b-8606398ba8b1","title":"Cosmonic and WebAssembly with Liam Randall and Bailey Hayes","url":"https://www.contributor.fyi/cosmonic","content_text":"Eric Anderson (@ericmander) has a conversation with Liam Randall (@Hectaman) and Bailey Hayes (@baihay) of Cosmonic, the platform-as-a-service environment for building cloud-native applications using WebAssembly. Bailey is also on the steering committee for the Bytecode Alliance, which stewards WebAssembly. In 2021, Cosmonic donated their WebAssembly runtime, wasmCloud, to the CNCF as an open-source project. Today, Liam and Bailey trace the history of WebAssembly, and their personal paths alongside it.\n\nSubscribe to Contributor on Substack for email notifications, and join our Slack community!\n\nIn this episode we discuss:\n\n\n How WebAssembly came together over the last decade to become the fourth standardized language of the web\n The moments when Bailey and Liam both realized they might be changing the future of computing\n Modding Microsoft Flight Simulator with Wasm modules\n Liam’s thoughts on how WebAssembly will affect business models going forward\n\n\nLinks:\n\n\n Cosmonic\n WebAssembly\n Bytecode Alliance\n CNCF wasmCloud\n Wasmtime\n WAMR\n Better together: A Kubernetes and Wasm case study\n Spin\n\n\nPeople mentioned:\n\n\n Kevin Hoffman (@KevinHoffman)\n Kelsey Hightower (@kelseyhightower)\n Guy Bedford (@guybedford)\n Peter Huene (@peterhuene)\n Chris Aniszczyk (@cra)\n\n\nOther episodes:\n\n\n Envoy Proxy with Matt Klein\n Suborbital with Connor Hicks\n","content_html":"

Eric Anderson (@ericmander) has a conversation with Liam Randall (@Hectaman) and Bailey Hayes (@baihay) of Cosmonic, the platform-as-a-service environment for building cloud-native applications using WebAssembly. Bailey is also on the steering committee for the Bytecode Alliance, which stewards WebAssembly. In 2021, Cosmonic donated their WebAssembly runtime, wasmCloud, to the CNCF as an open-source project. Today, Liam and Bailey trace the history of WebAssembly, and their personal paths alongside it.

Subscribe to Contributor on Substack for email notifications, and join our Slack community!

In this episode we discuss:

How WebAssembly came together over the last decade to become the fourth standardized language of the web
The moments when Bailey and Liam both realized they might be changing the future of computing
Modding Microsoft Flight Simulator with Wasm modules
Liam’s thoughts on how WebAssembly will affect business models going forward

Links:

Cosmonic
WebAssembly
Bytecode Alliance
CNCF wasmCloud
Wasmtime
WAMR
Better together: A Kubernetes and Wasm case study
Spin

People mentioned:

Kevin Hoffman (@KevinHoffman)
Kelsey Hightower (@kelseyhightower)
Guy Bedford (@guybedford)
Peter Huene (@peterhuene)
Chris Aniszczyk (@cra)

Other episodes:

Envoy Proxy with Matt Klein
Suborbital with Connor Hicks
\n\n","summary":"","date_published":"2023-03-08T02:00:00.000-08:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/4c537bc9-e202-423b-9b2b-8606398ba8b1.mp3","mime_type":"audio/mpeg","size_in_bytes":35858434,"duration_in_seconds":2237}]},{"id":"7dfbe65f-0780-4d48-8743-643fdfd9d8f3","title":"Haystack and Intelligent Search with Milos Rusic","url":"https://www.contributor.fyi/haystack","content_text":"Eric Anderson (@ericmander) is joined by Milos Rusic (@rusic_milos) to discuss Haystack, the open-source NLP framework for leveraging Transformer models and building intelligent search systems. Milos and his colleagues at deepset were early contributors to Hugging Face’s Transformer models, and began building pipelines for searching large document stores. Today, Haystack is wildly popular, with an active Discord community and over 6,000 GitHub stars.\n\nSubscribe to Contributor on Substack for email notifications, and join our Slack community!\n\nIn this episode we discuss:\n\n\n A deep dive into how Haystack works and its many use cases\n How a customer demo with one-minute long queries helped inspire Haystack\n Marketing open-source projects vs word of mouth\n NLP applications working with structured data and translating between types of data\n Imagining a world where every person has their own personal ChatGPT\n\n\nLinks:\n\n\n Haystack\n deepset\n Hugging Face\n Notion\n\n\nOther episodes:\n\n\n Milvus with Frank Liu\n","content_html":"

Eric Anderson (@ericmander) is joined by Milos Rusic (@rusic_milos) to discuss Haystack, the open-source NLP framework for leveraging Transformer models and building intelligent search systems. Milos and his colleagues at deepset were early contributors to Hugging Face’s Transformer models, and began building pipelines for searching large document stores. Today, Haystack is wildly popular, with an active Discord community and over 6,000 GitHub stars.
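
For a sense of what a Haystack pipeline looks like, here is a minimal extractive question-answering sketch against the Haystack 1.x API of this era; the document text is made up for illustration.

    from haystack.document_stores import InMemoryDocumentStore
    from haystack.nodes import BM25Retriever, FARMReader
    from haystack.pipelines import ExtractiveQAPipeline

    # Index a document, then ask a question in natural language.
    store = InMemoryDocumentStore(use_bm25=True)
    store.write_documents([{"content": "Haystack was created by deepset in Berlin."}])

    pipeline = ExtractiveQAPipeline(
        FARMReader(model_name_or_path="deepset/roberta-base-squad2"),
        BM25Retriever(document_store=store),
    )
    answer = pipeline.run(query="Who created Haystack?")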

Subscribe to Contributor on Substack for email notifications, and join our Slack community!

In this episode we discuss:

A deep dive into how Haystack works and its many use cases
How a customer demo with one-minute long queries helped inspire Haystack
Marketing open-source projects vs word of mouth
NLP applications working with structured data and translating between types of data
Imagining a world where every person has their own personal ChatGPT

Links:

Haystack
deepset
Hugging Face
Notion

Other episodes:

Milvus with Frank Liu
\n\n","summary":"","date_published":"2023-02-22T02:00:00.000-08:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/7dfbe65f-0780-4d48-8743-643fdfd9d8f3.mp3","mime_type":"audio/mpeg","size_in_bytes":29382575,"duration_in_seconds":1832}]},{"id":"4e5713eb-8813-46b4-a909-1ac543de2ea9","title":"Cube and the Semantic Layer with Artyom Keydunov","url":"https://www.contributor.fyi/cube","content_text":"Eric Anderson (@ericmander) talks with Artyom Keydunov (@keydunov) about Cube, the semantic layer for building data applications. Cube helps engineers bridge data warehouses and data experiences, and provides access control, security, caching, and more helpful features. The project began in open-source and has evolved quite a lot over the last few years with a ton of community support.\n\nSubscribe to Contributor on Substack for email notifications, and join our Slack community!\n\nIn this episode we discuss:\n\n\n What is a semantic layer?\n Coming up with the idea to open-source during a game of ping pong\n Setting a ten-company-deployment goal\n Using Cube to track COVID stats in lockdown\n How one contributor built a GraphQL API\n\n\nLinks:\n\n\n Cube\n Superset\n Metabase\n Observable\n Streamlit\n\n\nPeople mentioned:\n\n\n Pavel Tiunov (@paveltiunov87)\n","content_html":"

Eric Anderson (@ericmander) talks with Artyom Keydunov (@keydunov) about Cube, the semantic layer for building data applications. Cube helps engineers bridge data warehouses and data experiences, and provides access control, security, caching, and more helpful features. The project began in open-source and has evolved quite a lot over the last few years with a ton of community support.

Subscribe to Contributor on Substack for email notifications, and join our Slack community!

In this episode we discuss:

What is a semantic layer?
Coming up with the idea to open-source during a game of ping pong
Setting a ten-company-deployment goal
Using Cube to track COVID stats in lockdown
How one contributor built a GraphQL API

Links:

Cube
Superset
Metabase
Observable
Streamlit

People mentioned:

Pavel Tiunov (@paveltiunov87)
\n\n","summary":"","date_published":"2023-02-08T02:00:00.000-08:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/4e5713eb-8813-46b4-a909-1ac543de2ea9.mp3","mime_type":"audio/mpeg","size_in_bytes":22782581,"duration_in_seconds":1419}]},{"id":"88ab5838-04ad-4b84-98a8-e39a3d969479","title":"Remembering Jeff Meyerson with Erika Hokanson","url":"https://www.contributor.fyi/jeffmeyerson","content_text":"Eric Anderson (@ericmander) and Erika Hokanson (@erikawh0) remember the life of Jeff Meyerson, creator of the influential podcast Software Engineering Daily. He passed during the summer of 2022. Still, his work lives on - thousands of episodes, talks, music, a book, and a community of dedicated listeners and engineers whose lives were touched by Jeff’s dreams.\n\nSoftware Engineering Daily is still running, and you can listen to new episodes right here or wherever you get your podcasts.\n\nSubscribe to Contributor on Substack for email notifications, and join our Slack community!\n\nLinks:\n\n\n Software Engineering Daily\n Software Engineering Radio\n The Prion (Soundcloud) (Spotify)\n You Are Not A Commodity\n Move Fast: How Facebook Builds Software\n\n\nPeople mentioned:\n\n\n Pranay Mohan (@pranaymohan)\n","content_html":"

Eric Anderson (@ericmander) and Erika Hokanson (@erikawh0) remember the life of Jeff Meyerson, creator of the influential podcast Software Engineering Daily. He passed during the summer of 2022. Still, his work lives on - thousands of episodes, talks, music, a book, and a community of dedicated listeners and engineers whose lives were touched by Jeff’s dreams.

Software Engineering Daily is still running, and you can listen to new episodes right here or wherever you get your podcasts.

Subscribe to Contributor on Substack for email notifications, and join our Slack community!

Links:

Software Engineering Daily
Software Engineering Radio
The Prion (Soundcloud) (Spotify)
You Are Not A Commodity
Move Fast: How Facebook Builds Software

People mentioned:

Pranay Mohan (@pranaymohan)
\n\n","summary":"","date_published":"2023-01-25T02:00:00.000-08:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/88ab5838-04ad-4b84-98a8-e39a3d969479.mp3","mime_type":"audio/mpeg","size_in_bytes":18018682,"duration_in_seconds":1122}]},{"id":"1c424733-1c4d-44cb-99a9-a65cc7353851","title":"Testcontainers and Confidence with Sergei Egorov and Eli Aleyner","url":"https://www.contributor.fyi/testcontainers","content_text":"We’re kicking off the new year with a conversation between Eric Anderson (@ericmander), Sergei Egorov (@bsideup) and Eli Aleyner (@ealeyner). Sergei and Eli founded AtomicJar to maintain Testcontainers, the family of open-source libraries that allow developers to write and run integration tests locally, and treat them as unit tests. Testcontainers is wildly popular, with over six thousand GitHub stars (and climbing!). Tune in to find out how Sergei and Eli are helping people test their software quicker, easier, and more efficiently.\n\nSubscribe to Contributor on Substack for email notifications, and join our Slack community!\n\nIn this episode we discuss:\n\n\n How Testcontainers solves the problem of confidence\n The value of Github’s networking effect\n Inspiration from Amazon’s S3 “test bunny”\n Consequences of Docker’s over- and under-adoption\n Replicating success in other languages besides Java\n\n\nLinks:\n\n\n Testcontainers\n AtomicJar\n Spring\n Quarkus\n Micronaut\n How We Maintain Security Testing within the Software Development Life Cycle\n\n\nPeople mentioned:\n\n\n Richard North (@whichrich)\n Kevin Wittek (@Kiview)\n Martin Fowler (@martinfowler)\n","content_html":"

We’re kicking off the new year with a conversation between Eric Anderson (@ericmander), Sergei Egorov (@bsideup) and Eli Aleyner (@ealeyner). Sergei and Eli founded AtomicJar to maintain Testcontainers, the family of open-source libraries that allow developers to write and run integration tests locally, and treat them as unit tests. Testcontainers is wildly popular, with over six thousand GitHub stars (and climbing!). Tune in to find out how Sergei and Eli are helping people test their software more quickly, easily, and efficiently.
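
To see why the approach feels like unit testing, here is a minimal sketch using the Python port of Testcontainers (one of the language ports discussed below); it spins up a throwaway PostgreSQL container for the duration of the block.

    import sqlalchemy
    from testcontainers.postgres import PostgresContainer

    # The container is pulled, started, and torn down around this block,
    # so the test talks to a real PostgreSQL instance, not a mock.
    with PostgresContainer("postgres:14") as postgres:
        engine = sqlalchemy.create_engine(postgres.get_connection_url())
        with engine.connect() as connection:
            version = connection.execute(sqlalchemy.text("SELECT version()")).scalar()
            assert version.startswith("PostgreSQL")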

Subscribe to Contributor on Substack for email notifications, and join our Slack community!

In this episode we discuss:

How Testcontainers solves the problem of confidence
The value of Github’s networking effect
Inspiration from Amazon’s S3 “test bunny”
Consequences of Docker’s over- and under-adoption
Replicating success in other languages besides Java

Links:

Testcontainers
AtomicJar
Spring
Quarkus
Micronaut
How We Maintain Security Testing within the Software Development Life Cycle

People mentioned:

Richard North (@whichrich)
Kevin Wittek (@Kiview)
Martin Fowler (@martinfowler)
\n\n","summary":"","date_published":"2023-01-11T02:00:00.000-08:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/1c424733-1c4d-44cb-99a9-a65cc7353851.mp3","mime_type":"audio/mpeg","size_in_bytes":39006503,"duration_in_seconds":2433}]},{"id":"e33b14e8-80a2-4f74-8707-1439b9361945","title":"Mito and Smarter Spreadsheets with Nate Rush and Aaron Diamond-Reivich","url":"https://www.contributor.fyi/mito","content_text":"Eric Anderson (@ericmander) is joined by Nate Rush (@naterush1997) and Aaron Diamond-Reivich (@_aaronDR) to talk about Mito, the open-source spreadsheet that generates Python code for data analysts. Mito is a Python library and acts as an extension to a Jupyter Notebook. Tune in to find out how the Mito team is bridging the gap in data science between spreadsheets and programming.\n\nSubscribe to Contributor on Substack for email notifications, and join our Slack community!\n\nIn this episode we discuss:\n\n\n How Nate, Aaron and Aaron’s fraternal twin brother Jake have been friends since middle school\n Programming tools for spreadsheet users vs spreadsheet tools for people who are trying to become programmers\n Advantages to integrating into other open-source projects\n Reflecting on the hype around Python data science\n Python needs for Mito’s enterprise customers\n\n\nLinks:\n\n\n Mito\n Project Jupyter\n pandas\n Superhuman\n Streamlit\n\n\nPeople mentioned:\n\n\n Jacob Diamond-Reivich (@Jake_Stack808)\n","content_html":"

Eric Anderson (@ericmander) is joined by Nate Rush (@naterush1997) and Aaron Diamond-Reivich (@_aaronDR) to talk about Mito, the open-source spreadsheet that generates Python code for data analysts. Mito is a Python library and acts as an extension to a Jupyter Notebook. Tune in to find out how the Mito team is bridging the gap in data science between spreadsheets and programming.
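
The core loop is easy to sketch: open a DataFrame in the Mito sheet, edit it like a spreadsheet, and the equivalent pandas code is generated for you. A minimal example using Mito’s documented entry point in a Jupyter notebook (the sample data is made up):

    # Run inside a Jupyter notebook cell.
    import pandas as pd
    import mitosheet

    df = pd.DataFrame({"region": ["east", "west", "east"], "sales": [120, 90, 45]})
    # Opens the spreadsheet UI; every edit made in the sheet emits the
    # equivalent pandas code in the cell below it.
    mitosheet.sheet(df)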

Subscribe to Contributor on Substack for email notifications, and join our Slack community!

In this episode we discuss:

How Nate, Aaron and Aaron’s fraternal twin brother Jake have been friends since middle school
Programming tools for spreadsheet users vs spreadsheet tools for people who are trying to become programmers
Advantages to integrating into other open-source projects
Reflecting on the hype around Python data science
Python needs for Mito’s enterprise customers

Links:

Mito
Project Jupyter
pandas
Superhuman
Streamlit

People mentioned:

Jacob Diamond-Reivich (@Jake_Stack808)
\n\n","summary":"","date_published":"2022-11-09T02:00:00.000-08:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/e33b14e8-80a2-4f74-8707-1439b9361945.mp3","mime_type":"audio/mpeg","size_in_bytes":29202016,"duration_in_seconds":1821}]},{"id":"d2ab6453-87d5-4b4c-95df-877e7ca072a0","title":"Featureform and the Future of MLOps with Simba Khadder","url":"https://www.contributor.fyi/featureform","content_text":"Eric Anderson (@ericmander) and Simba Khadder (@simba_khadder) explore Featureform, the “virtual” feature store platform that aims to standardize data pipelines for machine learning. Contributor is no stranger to feature stores, but Simba has a broader definition than most. Join us to learn how Featureform enables data scientists and machine learning practitioners to solve a common, but rarely addressed organizational problem.\n\nSubscribe to Contributor on Substack for email notifications, and join our Slack community!\n\nIn this episode we discuss:\n\n\n How there is no standard or north star for MLOps\n Why enterprise is where Featureform’s value shines\n MLPlatform problems vs MLOps problems\n Why copy/paste and Git don’t cut it\n Deploying MLOps solutions that make data scientists and everyone else happy\n\n\nLinks:\n\n\n Featureform\n Terraform\n Apache Spark\n Feathr\n\n\nOther episodes:\n\n\n Tensorflow with Rajat Monga\n","content_html":"

Eric Anderson (@ericmander) and Simba Khadder (@simba_khadder) explore Featureform, the “virtual” feature store platform that aims to standardize data pipelines for machine learning. Contributor is no stranger to feature stores, but Simba has a broader definition than most. Join us to learn how Featureform enables data scientists and machine learning practitioners to solve a common, but rarely addressed organizational problem.

Subscribe to Contributor on Substack for email notifications, and join our Slack community!

In this episode we discuss:

How there is no standard or north star for MLOps
Why enterprise is where Featureform’s value shines
MLPlatform problems vs MLOps problems
Why copy/paste and Git don’t cut it
Deploying MLOps solutions that make data scientists and everyone else happy

Links:

Featureform
Terraform
Apache Spark
Feathr

Other episodes:

Tensorflow with Rajat Monga
\n\n","summary":"","date_published":"2022-10-19T02:00:00.000-07:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/d2ab6453-87d5-4b4c-95df-877e7ca072a0.mp3","mime_type":"audio/mpeg","size_in_bytes":31022646,"duration_in_seconds":1934}]},{"id":"062d2916-70a4-4605-9696-6a4e89749948","title":"Directus with Ben Haynes","url":"https://www.contributor.fyi/directus","content_text":"Eric Anderson (@ericmander) hosts Ben Haynes (@benhaynes), CEO and co-founder of Directus. Directus is an open-source data platform that layers on SQL databases to provide an instant API, and includes a no-code data studio interface. Listen in to find out how Directus is aiming to democratize the modern data stack for everyone.\n\nSubscribe to Contributor on Substack for email notifications, and join our Slack community!\n\nIn this episode we discuss:\n\n\n The inspiration to create an “admin interface on steroids”\n Reflecting on Directus’ unusual linear growth trend\n How Directus powers digital experiences, applications, and internal dev tools\n Ben’s thoughts on maintaining a sustainable, premium open-source experience\n Automated data processing with Directus Flows\n\n\nLinks:\n\n\n Directus\n Supabase\n\n\nOther episodes:\n\n\n Chef with Adam Jacob\n","content_html":"

Eric Anderson (@ericmander) hosts Ben Haynes (@benhaynes), CEO and co-founder of Directus. Directus is an open-source data platform that layers on SQL databases to provide an instant API, and includes a no-code data studio interface. Listen in to find out how Directus is aiming to democratize the modern data stack for everyone.
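
To make “instant API” concrete: once Directus sits on top of a SQL database, each table is exposed as a REST collection. A hedged sketch using Python’s requests library (the host, token, and collection names are placeholders):

    import requests

    BASE = "https://directus.example.com"
    headers = {"Authorization": "Bearer YOUR_TOKEN"}

    # Directus exposes each table at /items/<collection> automatically.
    published = requests.get(
        f"{BASE}/items/articles",
        headers=headers,
        params={"filter[status][_eq]": "published", "limit": 10},
    ).json()["data"]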

Subscribe to Contributor on Substack for email notifications, and join our Slack community!

In this episode we discuss:

The inspiration to create an “admin interface on steroids”
Reflecting on Directus’ unusual linear growth trend
How Directus powers digital experiences, applications, and internal dev tools
Ben’s thoughts on maintaining a sustainable, premium open-source experience
Automated data processing with Directus Flows

Links:

Directus
Supabase

Other episodes:

Chef with Adam Jacob
\n\n","summary":"","date_published":"2022-09-14T02:00:00.000-07:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/062d2916-70a4-4605-9696-6a4e89749948.mp3","mime_type":"audio/mpeg","size_in_bytes":30813249,"duration_in_seconds":1921}]},{"id":"5fa9c2d9-bfb7-4aac-ae12-2eed4feb81ee","title":"Prowler with Toni de la Fuente","url":"https://www.contributor.fyi/prowler","content_text":"Eric Anderson (@ericmander) chats with Toni de la Fuente (@ToniBlyx) about how he created Prowler, an open source security tool for AWS. Toni talks about taking Prowler from a nights-and-weekends project to his current full-time job, managing a team of four. They discuss transitioning from primarily coding to primarily managing tickets and users, as well as being “client zero” and bringing the project to big companies.\n\nSubscribe to Contributor on Substack for email notifications, and join our Slack community!\n\nIn this episode we discuss:\n\n\n The roadmap from open source Prowler to Prowler Pro\n Prowler’s diverse set of users\n What Toni learned from quitting an earlier open source project\n The differences between Prowler and other security services for AWS\n\n\nLinks:\n\n\n Prowler on Github\n Prowler Pro\n Verica\n Black Hat\n\n\nPeople mentioned:\n\n\n Aaron Rinehart\n Casey Rosenthal\n","content_html":"

Eric Anderson (@ericmander) chats with Toni de la Fuente (@ToniBlyx) about how he created Prowler, an open source security tool for AWS. Toni talks about taking Prowler from a nights-and-weekends project to his current full-time job, managing a team of four. They discuss transitioning from primarily coding to primarily managing tickets and users, as well as being “client zero” and bringing the project to big companies.

Subscribe to Contributor on Substack for email notifications, and join our Slack community!

In this episode we discuss:

The roadmap from open source Prowler to Prowler Pro
Prowler’s diverse set of users
What Toni learned from quitting an earlier open source project
The differences between Prowler and other security services for AWS

Links:

Prowler on Github
Prowler Pro
Verica
Black Hat

People mentioned:

Aaron Rinehart
Casey Rosenthal
\n\n","summary":"","date_published":"2022-08-31T02:00:00.000-07:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/5fa9c2d9-bfb7-4aac-ae12-2eed4feb81ee.mp3","mime_type":"audio/mpeg","size_in_bytes":30425383,"duration_in_seconds":1897}]},{"id":"714fa873-4b03-43c2-a6d9-ab06f85e59fe","title":"tea with Max Howell","url":"https://www.contributor.fyi/tea","content_text":"Eric Anderson (@ericmander) meets legendary open-source developer Max Howell (@mxcl) to talk about tea, a decentralized protocol for remunerating the open-source ecosystem. Max is the creator of Homebrew, and he chats about his exit from the project. The conversation turns to his newest project, tea, which is an evolution of Brew, and takes inspiration from blockchain technology. They also discuss Max’s famous interview at Google and his time working for Apple.\n\nSubscribe to Contributor on Substack for email notifications, and join our Slack community!\n\nIn this episode we discuss:\n\n\n Max’s experience creating Homebrew, one of the largest open-source projects ever\n The utility of Web3 beyond decentralized finance\n Writing a white paper for tea, “just like everyone else”\n Why Max wants a global team, with people in every time zone\n How tea ensures a sustainable future for open-source\n\n\nLinks:\n\n\n Homebrew\n tea.xyz\n tea white paper\n Bitcoin white paper\n Max’s Google interview tweet\n Log4j vulnerability\n “Nebraska” XKCD comic\n Nix OS\n\n\nPeople mentioned:\n\n\n Timothy Lewis\n","content_html":"

Eric Anderson (@ericmander) meets legendary open-source developer Max Howell (@mxcl) to talk about tea, a decentralized protocol for remunerating the open-source ecosystem. Max is the creator of Homebrew, and he chats about his exit from the project. The conversation turns to his newest project, tea, which is an evolution of Brew, and takes inspiration from blockchain technology. They also discuss Max’s famous interview at Google and his time working for Apple.

Subscribe to Contributor on Substack for email notifications, and join our Slack community!

In this episode we discuss:

Max’s experience creating Homebrew, one of the largest open-source projects ever
The utility of Web3 beyond decentralized finance
Writing a white paper for tea, “just like everyone else”
Why Max wants a global team, with people in every time zone
How tea ensures a sustainable future for open-source

Links:

Homebrew
tea.xyz
tea white paper
Bitcoin white paper
Max’s Google interview tweet
Log4j vulnerability
“Nebraska” XKCD comic
Nix OS

People mentioned:

Timothy Lewis
\n\n","summary":"","date_published":"2022-08-17T02:00:00.000-07:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/714fa873-4b03-43c2-a6d9-ab06f85e59fe.mp3","mime_type":"audio/mpeg","size_in_bytes":36542633,"duration_in_seconds":2279}]},{"id":"730b4451-9154-4416-8cca-70455d0302da","title":"Suborbital with Connor Hicks","url":"https://www.contributor.fyi/suborbital","content_text":"Eric Anderson (@ericmander) and Connor Hicks (@cohix) launch into detail on Suborbital, an open-source project that allows developers to create WebAssembly projects embedded in other applications. Connor conceived of Suborbital while frustrated with the cold start problem that can impact Function-as-a-Service platforms. Today, Suborbital collaborates with companies like Microsoft on a community called Wasm Builders, dedicated to sharing and developing innovations in WebAssembly applications.\n\nSubscribe to Contributor on Substack for email notifications, and join our Slack community!\n\nIn this episode we discuss:\n\n\n The three tentpoles of WebAssembly that make it a useful foundation for Suborbital\n Surprising niche use cases for WebAssembly like IoT and data modeling\n Open-source tools in the Suborbital ecosystem\n Putting focus on building a larger Wasm Builders community\n Connor’s thoughts on how WebAssembly can improve edge computing\n\n\nLinks:\n\n\n Suborbital\n WebAssembly\n Suborbital Compute\n Atmo\n Reactr\n Subo \n Sat\n Firecracker\n","content_html":"

Eric Anderson (@ericmander) and Connor Hicks (@cohix) launch into detail on Suborbital, an open-source project that allows developers to create WebAssembly projects embedded in other applications. Connor conceived of Suborbital while frustrated with the cold start problem that can impact Function-as-a-Service platforms. Today, Suborbital collaborates with companies like Microsoft on a community called Wasm Builders, dedicated to sharing and developing innovations in WebAssembly applications.

Subscribe to Contributor on Substack for email notifications, and join our Slack community!

In this episode we discuss:

The three tentpoles of WebAssembly that make it a useful foundation for Suborbital
Surprising niche use cases for WebAssembly like IoT and data modeling
Open-source tools in the Suborbital ecosystem
Putting focus on building a larger Wasm Builders community
Connor’s thoughts on how WebAssembly can improve edge computing

Links:

Suborbital
WebAssembly
Suborbital Compute
Atmo
Reactr
Subo
Sat
Firecracker
\n\n","summary":"","date_published":"2022-08-03T02:00:00.000-07:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/730b4451-9154-4416-8cca-70455d0302da.mp3","mime_type":"audio/mpeg","size_in_bytes":37379806,"duration_in_seconds":2332}]},{"id":"78eeb968-daf7-4d1f-a915-028a71ef1801","title":"Milvus with Frank Liu","url":"https://www.contributor.fyi/milvus","content_text":"Eric Anderson (@ericmander) and Frank Liu (@frankzliu) talk about Milvus, the open-source vector database built for scalable similarity search. Vector databases are built to search, index and store embeddings, a requirement for powerful AI applications. Frank is Director of Operations at Zilliz, the company that stewards the project. Tune in to find out how Milvus is the database for the AI era.\n\nSubscribe to Contributor on Substack for email notifications, and join our Slack community!\n\nIn this episode we discuss:\n\n\n A crash course on embeddings and vector databases\n Using Milvus for logo search, crypto predictions, drug discovery, and more\n Other open-source projects at Zilliz that complement Milvus\n “Embedding Everything”\n How Milvus incorporates tunable consistency to its search process\n\n\nLinks:\n\n\n Milvus\n Zilliz\n Towhee\n Attu\n Feder\n\n\nOther episodes:\n\n\n Clickhouse with Alexey Milovidov and Ivan Blinkov\n\n\nCorrection:\n\n\n Milvus is based on a “shared storage” architecture, not “shared nothing.”\n","content_html":"

Eric Anderson (@ericmander) and Frank Liu (@frankzliu) talk about Milvus, the open-source vector database built for scalable similarity search. Vector databases are built to search, index and store embeddings, a requirement for powerful AI applications. Frank is Director of Operations at Zilliz, the company that stewards the project. Tune in to find out how Milvus is the database for the AI era.
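
For a flavor of the workflow, here is a minimal similarity-search sketch against the pymilvus 2.x client, assuming a Milvus server on its default port; the schema and vectors are toy values.

    from pymilvus import (CollectionSchema, FieldSchema, DataType,
                          Collection, connections)

    connections.connect(host="localhost", port="19530")

    # Define a collection with an auto-generated id and a 128-d vector field.
    schema = CollectionSchema([
        FieldSchema("id", DataType.INT64, is_primary=True, auto_id=True),
        FieldSchema("embedding", DataType.FLOAT_VECTOR, dim=128),
    ])
    docs = Collection("docs", schema)
    docs.create_index("embedding", {"index_type": "IVF_FLAT",
                                    "metric_type": "L2",
                                    "params": {"nlist": 128}})
    docs.load()

    # Find the five stored vectors nearest to a query vector.
    hits = docs.search(data=[[0.1] * 128], anns_field="embedding",
                       param={"metric_type": "L2", "params": {"nprobe": 10}},
                       limit=5)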

Subscribe to Contributor on Substack for email notifications, and join our Slack community!

In this episode we discuss:

A crash course on embeddings and vector databases
Using Milvus for logo search, crypto predictions, drug discovery, and more
Other open-source projects at Zilliz that complement Milvus
“Embedding Everything”
How Milvus incorporates tunable consistency into its search process

Links:

Milvus
Zilliz
Towhee
Attu
Feder

Other episodes:

Clickhouse with Alexey Milovidov and Ivan Blinkov

Correction:

Milvus is based on a “shared storage” architecture, not “shared nothing.”
\n\n","summary":"","date_published":"2022-07-20T02:00:00.000-07:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/78eeb968-daf7-4d1f-a915-028a71ef1801.mp3","mime_type":"audio/mpeg","size_in_bytes":26713905,"duration_in_seconds":1665}]},{"id":"8b29f142-9c92-4a8c-a890-781193a30e03","title":"Apache Beam with Kenn Knowles and Pablo Estrada","url":"https://www.contributor.fyi/beam","content_text":"Eric Anderson (@ericmander) reunites with old colleagues Kenn Knowles (@KennKnowles) and Pablo Estrada (@polecitoem) for a conversation on Apache Beam, the open-source programming model for data processing. The trio once worked together at Google, and Beam was a turning point in the history of open-source there. Today, both Kenn and Pablo are members of the Beam PMC, and join the show with the inside scoop on Beam’s past, present and future.\n\nIn this episode we discuss:\n\n\n Transitioning Beam to the Apache Way\n How “inner source” works at Google\n Thoughts on the relationship between batch processing and streaming\n Some ways that community “power users” have contributed to Beam\n Information on Beam Summit 2022, the first onsite summit since COVID began\n \n The first few people to register can use code BEAM_POD_INV for a discount on tickets!\n \n \n\n\nLinks:\n\n\n Apache Beam\n Apache Spark\n Apache Flink\n Apache Nemo\n Apache Samza\n Apache Crunch\n MapReduce paper \n MillWheel paper\n FlumeJava paper\n Dataflow paper\n Beam Summit 2022 Website\n\n\nOther episodes:\n\n\nTensorFlow with Rajat Monga\n","content_html":"

Eric Anderson (@ericmander) reunites with old colleagues Kenn Knowles (@KennKnowles) and Pablo Estrada (@polecitoem) for a conversation on Apache Beam, the open-source programming model for data processing. The trio once worked together at Google, and Beam was a turning point in the history of open-source there. Today, both Kenn and Pablo are members of the Beam PMC, and join the show with the inside scoop on Beam’s past, present and future.
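
For readers new to the project, the model is easy to see in a few lines: one pipeline definition that runs unchanged on a laptop or on distributed runners such as Dataflow, Flink, or Spark. A minimal word count in Beam’s Python SDK:

    import apache_beam as beam

    # DirectRunner executes this locally; swapping the runner option
    # sends the same pipeline to a distributed backend.
    with beam.Pipeline() as pipeline:
        (pipeline
         | "Read" >> beam.Create(["the cat sat", "the hat stood"])
         | "Split" >> beam.FlatMap(str.split)
         | "Count" >> beam.combiners.Count.PerElement()
         | "Print" >> beam.Map(print))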

In this episode we discuss:

Transitioning Beam to the Apache Way
How “inner source” works at Google
Thoughts on the relationship between batch processing and streaming
Some ways that community “power users” have contributed to Beam
Information on Beam Summit 2022, the first onsite summit since COVID began
 The first few people to register can use code BEAM_POD_INV for a discount on tickets!

Links:

Apache Beam
Apache Spark
Apache Flink
Apache Nemo
Apache Samza
Apache Crunch
MapReduce paper
MillWheel paper
FlumeJava paper
Dataflow paper
Beam Summit 2022 Website

Other episodes:

TensorFlow with Rajat Monga
\n\n","summary":"","date_published":"2022-07-06T02:00:00.000-07:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/8b29f142-9c92-4a8c-a890-781193a30e03.mp3","mime_type":"audio/mpeg","size_in_bytes":36841056,"duration_in_seconds":2298}]},{"id":"0dcf389a-e973-4e79-aeab-a49af3e5695a","title":"Temporal (Part 2) with Maxim Fateev and Dominik Tornow","url":"https://www.contributor.fyi/temporal-2","content_text":"Eric Anderson (@ericmander) returns to Temporal with co-founder Maxim Fateev (@mfateev) and principal engineer Dominik Tornow (@DominikTornow). When Maxim joined us in September of 2020, the company called their project a “workflow orchestrator.” Today, Temporal has grown in popularity and usability, but the terminology around that abstraction has changed. Tune in to track the evolution of what Maxim calls a genuinely “new category of software.”\n\nIn this episode we discuss:\n\n\n New features and developments in the last 2 years\n The proper way to pronounce “Temporal”\n How Temporal guarantees that workflow execution actually runs to execution\n Describing Temporal as a new pair of glasses\n Replay, Temporal’s first developer conference on August 25-26, in Seattle\n\n\nLinks:\n\n\n Temporal\n Cadence\n Apache Cassandra\n Replay\n\n\nPeople mentioned:\n\n\n Samar Abbas (@samarabbas77)\n\n\nOther episodes:\n\n\n Temporal with Maxim Fateev\n Apache Cassandra with Patrick McFadin\n","content_html":"

Eric Anderson (@ericmander) returns to Temporal with co-founder Maxim Fateev (@mfateev) and principal engineer Dominik Tornow (@DominikTornow). When Maxim joined us in September of 2020, the company called their project a “workflow orchestrator.” Today, Temporal has grown in popularity and usability, but the terminology around that abstraction has changed. Tune in to track the evolution of what Maxim calls a genuinely “new category of software.”
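
To ground the “runs to completion” guarantee discussed below, here is a rough sketch using Temporal’s Python SDK; the SDK was still young around this recording, so treat the exact decorator and call names as approximate.

    from datetime import timedelta
    from temporalio import activity, workflow

    @activity.defn
    async def charge_card(order_id: str) -> str:
        return f"charged {order_id}"

    @workflow.defn
    class OrderWorkflow:
        @workflow.run
        async def run(self, order_id: str) -> str:
            # If the worker crashes here, Temporal replays the workflow
            # from durable history and resumes instead of starting over.
            return await workflow.execute_activity(
                charge_card, order_id,
                start_to_close_timeout=timedelta(seconds=30),
            )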

In this episode we discuss:

New features and developments in the last 2 years
The proper way to pronounce “Temporal”
How Temporal guarantees that workflow execution actually runs to completion
Describing Temporal as a new pair of glasses
Replay, Temporal’s first developer conference on August 25-26, in Seattle

Links:

Temporal
Cadence
Apache Cassandra
Replay

People mentioned:

Samar Abbas (@samarabbas77)

Other episodes:

Temporal with Maxim Fateev
Apache Cassandra with Patrick McFadin
\n\n","summary":"","date_published":"2022-06-22T02:00:00.000-07:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/0dcf389a-e973-4e79-aeab-a49af3e5695a.mp3","mime_type":"audio/mpeg","size_in_bytes":39174940,"duration_in_seconds":2444}]},{"id":"c6f2ba00-e572-4c41-843e-090954e7c189","title":"Scarf with Avi Press","url":"https://www.contributor.fyi/scarf","content_text":"Eric Anderson (@ericmander) interviews Avi Press (@avi_press) about Scarf, the distribution platform for open-source software that facilitates analytics and commercialization. Scarf offers a set of tools that allows founders and maintainers to understand adoption of their products, including Scarf Gateway, which provides a central access point to containers and packages. From there, open-source developers can connect with the people that rely on their work.\n\nIn this episode we discuss:\n\n\n Why you can’t rely on Github as a source of comprehensive data about open-source software\n Tracing a user’s journey interacting with a project across multiple platforms\n How better observability allows maintainers to make better software\n Inspiring indie maintainers to commercialize their projects\n The privilege of being able to work in open-source, and how Scarf can enable a more inclusive developer community\n\n\nLinks:\n\n\n Scarf\n Tidelift\n Gitcoin\n OpenTeams\n Aviyel\n","content_html":"

Eric Anderson (@ericmander) interviews Avi Press (@avi_press) about Scarf, the distribution platform for open-source software that facilitates analytics and commercialization. Scarf offers a set of tools that allows founders and maintainers to understand adoption of their products, including Scarf Gateway, which provides a central access point to containers and packages. From there, open-source developers can connect with the people that rely on their work.

In this episode we discuss:

Why you can’t rely on Github as a source of comprehensive data about open-source software
Tracing a user’s journey interacting with a project across multiple platforms
How better observability allows maintainers to make better software
Inspiring indie maintainers to commercialize their projects
The privilege of being able to work in open-source, and how Scarf can enable a more inclusive developer community

Links:

Scarf
Tidelift
Gitcoin
OpenTeams
Aviyel
\n\n","summary":"","date_published":"2022-06-08T02:00:00.000-07:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/c6f2ba00-e572-4c41-843e-090954e7c189.mp3","mime_type":"audio/mpeg","size_in_bytes":27281912,"duration_in_seconds":1701}]},{"id":"9be95d7e-f32b-4ba7-a53d-32453e13926c","title":"Rasgo with Patrick Dougherty","url":"https://www.contributor.fyi/rasgo","content_text":"Eric Anderson (@ericmander) and Patrick Dougherty (@cpdough) talk about Rasgo, the data transformation platform for MLOps that makes generating SQL easy. The team at Rasgo recently open-sourced a package called RasgoQL, that allows users to execute SQL queries against a data warehouse using Python syntax. Tune in to find out how Rasgo aims to bridge an important gap in the Modern Data Stack.\n\nIn this episode we discuss:\n\n\n The advantages of offering both a low-code/no-code UI and a Python interface\n \"How can a data scientist, without needing full-time resources from data engineering, be somewhat self-sufficient in data prep and able to deliver those insights without a massive human capital investment needed?\"\n Where Rasgo fits into the world of feature stores\n Why one Rasgo user took a trip to a wind farm in Texas\n Eric’s predictions for the future of data prep and transformation\n\n\nLinks:\n\n\n Rasgo\n RasgoQL\n DuckDB\n Delta Lake\n\n\nPeople mentioned:\n\n\n Jared Parker (@jaredtparker_)\n","content_html":"

Eric Anderson (@ericmander) and Patrick Dougherty (@cpdough) talk about Rasgo, the data transformation platform for MLOps that makes generating SQL easy. The team at Rasgo recently open-sourced a package called RasgoQL, which allows users to execute SQL queries against a data warehouse using Python syntax. Tune in to find out how Rasgo aims to bridge an important gap in the Modern Data Stack.

In this episode we discuss:

The advantages of offering both a low-code/no-code UI and a Python interface
"How can a data scientist, without needing full-time resources from data engineering, be somewhat self-sufficient in data prep and able to deliver those insights without a massive human capital investment needed?"
Where Rasgo fits into the world of feature stores
Why one Rasgo user took a trip to a wind farm in Texas
Eric’s predictions for the future of data prep and transformation

Links:

Rasgo
RasgoQL
DuckDB
Delta Lake

People mentioned:

Jared Parker (@jaredtparker_)
\n\n","summary":"","date_published":"2022-05-25T02:00:00.000-07:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/9be95d7e-f32b-4ba7-a53d-32453e13926c.mp3","mime_type":"audio/mpeg","size_in_bytes":27736651,"duration_in_seconds":1729}]},{"id":"1a885b52-6545-43f6-8764-6b570f8e0b1e","title":"Feast with Willem Pienaar","url":"https://www.contributor.fyi/feast","content_text":"Eric Anderson (@ericmander) and Willem Pienaar (@willpienaar) talk about Feast, the open-source feature store for machine learning. Feature stores act as a bridge between models and data, and allow data scientists to ship features into production without the need for engineers. Willem co-created Feast at Gojek, and later teamed up with the folks at Tecton to back the project.\n\nIn this episode we discuss:\n\n\n The value of feature stores in MLOps\n What happens when you open-source too early\n Why most open-source code has nothing to hide\n Bringing an open-source project to an existing company\n Good and bad use cases for a feature store\n\n\nLinks:\n\n\n Feast\n Tecton\n Turing\n Merlin\n Kubeflow\n apply() Conference\n\n\nPeople mentioned:\n\n\n Mike Del Balso\n Kevin Stumpf (@kevinmstumpf)\n Ajey Gore (@AjeyGore)\n Demetrios Brinkmann (@Dpbrinkm)\n Wes McKinney (@wesmckinn)\n\n\nOther episodes:\n\n\n Flyte with Ketan Umare\n Great Expectations with Abe Gong and Kyle Eaton\n","content_html":"

Eric Anderson (@ericmander) and Willem Pienaar (@willpienaar) talk about Feast, the open-source feature store for machine learning. Feature stores act as a bridge between models and data, and allow data scientists to ship features into production without the need for engineers. Willem co-created Feast at Gojek, and later teamed up with the folks at Tecton to back the project.
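
The bridge Willem describes is visible in a few lines of Feast: a data scientist asks for named features, and the store handles where they live. A minimal sketch, assuming a Feast repo in the current directory with a driver_stats feature view already applied (names are illustrative):

    from feast import FeatureStore

    store = FeatureStore(repo_path=".")
    # Fetch the latest feature values for one entity at serving time.
    features = store.get_online_features(
        features=["driver_stats:conv_rate", "driver_stats:avg_daily_trips"],
        entity_rows=[{"driver_id": 1001}],
    ).to_dict()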

In this episode we discuss:

The value of feature stores in MLOps
What happens when you open-source too early
Why most open-source code has nothing to hide
Bringing an open-source project to an existing company
Good and bad use cases for a feature store

Links:

Feast
Tecton
Turing
Merlin
Kubeflow
apply() Conference

People mentioned:

Mike Del Balso
Kevin Stumpf (@kevinmstumpf)
Ajey Gore (@AjeyGore)
Demetrios Brinkmann (@Dpbrinkm)
Wes McKinney (@wesmckinn)

Other episodes:

Flyte with Ketan Umare
Great Expectations with Abe Gong and Kyle Eaton
\n\n","summary":"","date_published":"2022-05-11T02:00:00.000-07:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/1a885b52-6545-43f6-8764-6b570f8e0b1e.mp3","mime_type":"audio/mpeg","size_in_bytes":31442869,"duration_in_seconds":1948}]},{"id":"0d03676e-32a8-488d-bfa3-1b6828e2743a","title":"Flyte with Ketan Umare","url":"https://www.contributor.fyi/flyte","content_text":"Eric Anderson (@ericmander) and Ketan Umare (@ketanumare) discuss Flyte, the open-source workflow automation platform for large-scale machine learning and data use cases. Ketan is a former engineer at Lyft, where he created Flyte to help models in Pricing, Locations, ETA, and more. Today, the project allows machine learning developers everywhere to bring their ideas from conception to production.\n\nIn this episode we discuss:\n\n\n How Flyte combines compute with parts of a workflow engine in a way that is best for the user\n The importance of reliable fares and ETA predictions at a ride-sharing app\n A progenitor to Flyte called “Better Airflow”\n Ketan’s innovative approach to bringing typing to machine learning workloads\n Why Flyte landed at the Linux Foundation\n\n\nLinks:\n\n\n Flyte\n Union.ai\n Apache Airflow\n Kubeflow\n Luigi\n MLTwist\n\n\nOther episodes:\n\n\n Great Expectations with Abe Gong and Kyle Eaton\n Envoy Proxy with Matt Klein\n","content_html":"

Eric Anderson (@ericmander) and Ketan Umare (@ketanumare) discuss Flyte, the open-source workflow automation platform for large-scale machine learning and data use cases. Ketan is a former engineer at Lyft, where he created Flyte to help models in Pricing, Locations, ETA, and more. Today, the project allows machine learning developers everywhere to bring their ideas from conception to production.
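
Ketan’s emphasis on typing (see the list below) shows up directly in flytekit, where tasks and workflows are ordinary annotated Python. A minimal sketch with made-up logic:

    from flytekit import task, workflow

    @task
    def normalize(x: float, mean: float, std: float) -> float:
        return (x - mean) / std

    @workflow
    def scoring_pipeline(x: float = 10.0) -> float:
        # Flyte validates these type annotations at registration time and
        # uses them to pass strongly typed data between containerized tasks.
        return normalize(x=x, mean=5.0, std=2.0)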

In this episode we discuss:

How Flyte combines compute with parts of a workflow engine in a way that is best for the user
The importance of reliable fares and ETA predictions at a ride-sharing app
A progenitor to Flyte called “Better Airflow”
Ketan’s innovative approach to bringing typing to machine learning workloads
Why Flyte landed at the Linux Foundation

Links:

Flyte
Union.ai
Apache Airflow
Kubeflow
Luigi
MLTwist

Other episodes:

Great Expectations with Abe Gong and Kyle Eaton
Envoy Proxy with Matt Klein
\n\n","summary":"","date_published":"2022-04-27T02:00:00.000-07:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/0d03676e-32a8-488d-bfa3-1b6828e2743a.mp3","mime_type":"audio/mpeg","size_in_bytes":35087299,"duration_in_seconds":2188}]},{"id":"0c52ecff-f6a6-4ef3-8e48-17667845d35f","title":"Activeloop with Davit Buniatyan","url":"https://www.contributor.fyi/activeloop","content_text":"Eric Anderson (@ericmander) meets with Davit Buniatyan (@DBuniatyan) of Activeloop, the database for AI. Davit was inspired to found Activeloop while working on large datasets in a neuroscience research lab at Princeton. Powering the technology at Activeloop is Hub, the open-source dataset format for AI applications. Join us to learn how Hub promises to enhance and expand various verticals in deep learning.\n\nIn this episode we discuss:\n\n\n Reconfiguring traditional ML tooling for the cloud\n Connectomics - working with thin slices of a mouse brain with neuroscientist Sebastian Seung\n Choosing between university, a start-up, and open-source\n Davit’s original product, that ran computation on crypto mining GPUs on a distributed scale\n Focusing on different data modalities for computer vision\n\n\nLinks:\n\n\n Activeloop\n Activeloop Hub\n Apache Parquet\n Apache Spark\n TensorFlow\n Snowflake\n Databricks\n Timescale\n\n\nPeople mentioned:\n\n\n Sebastian Seung (@SebastianSeung)\n\n\nOther episodes:\n\n\n TensorFlow with Rajat Monga\n","content_html":"

Eric Anderson (@ericmander) meets with Davit Buniatyan (@DBuniatyan) of Activeloop, the database for AI. Davit was inspired to found Activeloop while working on large datasets in a neuroscience research lab at Princeton. Powering the technology at Activeloop is Hub, the open-source dataset format for AI applications. Join us to learn how Hub promises to enhance and expand various verticals in deep learning.
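
As a quick taste of the Hub format, here is a sketch based on the project’s README of this era, streaming one of Activeloop’s public datasets; the tensor names follow that example and may differ for other datasets.

    import hub

    # Stream a hosted dataset without downloading it first;
    # tensors are fetched lazily as they are accessed.
    ds = hub.load("hub://activeloop/mnist-train")
    first_image = ds.images[0].numpy()
    first_label = ds.labels[0].numpy()
    print(first_image.shape, first_label)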

In this episode we discuss:

Reconfiguring traditional ML tooling for the cloud
Connectomics - working with thin slices of a mouse brain with neuroscientist Sebastian Seung
Choosing between university, a start-up, and open-source
Davit’s original product, which ran computation on crypto mining GPUs on a distributed scale
Focusing on different data modalities for computer vision

Links:

Activeloop
Activeloop Hub
Apache Parquet
Apache Spark
TensorFlow
Snowflake
Databricks
Timescale

People mentioned:

Sebastian Seung (@SebastianSeung)

Other episodes:

TensorFlow with Rajat Monga
\n\n","summary":"","date_published":"2022-04-13T02:00:00.000-07:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/0c52ecff-f6a6-4ef3-8e48-17667845d35f.mp3","mime_type":"audio/mpeg","size_in_bytes":29348720,"duration_in_seconds":1830}]},{"id":"3e7677bf-14ab-4060-90b3-8f3eb5e7b93c","title":"Unikraft with Alexander Jung and Simon Kuenzer","url":"https://www.contributor.fyi/unikraft","content_text":"Eric Anderson (@ericmander), Alexander Jung (@nderjung) and Simon Kuenzer (Github: @skuenzer) get technical on Unikraft, the open-source unikernel development kit. Unikernels are specialized, high performing OS images that have the potential to revolutionize virtualization. Unikraft makes unikernels easy to use by prioritizing modularity, security, and POSIX-compatibility.\n\nIn this episode we discuss:\n\n\n How Unikraft seeks wider adoption of unikernels in real-world applications\n Unikraft’s background in research and academia\n Bottom-up as well as top-down specialization\n Building a community with a large proportion of students\n\n\nLinks:\n\n\n Unikraft\n Unikraft: Fast, Specialized Unikernels the Easy Way\n Xen Project\n MirageOS\n HermitCore\n Firecracker\n","content_html":"

Eric Anderson (@ericmander), Alexander Jung (@nderjung) and Simon Kuenzer (Github: @skuenzer) get technical on Unikraft, the open-source unikernel development kit. Unikernels are specialized, high-performing OS images that have the potential to revolutionize virtualization. Unikraft makes unikernels easy to use by prioritizing modularity, security, and POSIX-compatibility.

In this episode we discuss:

How Unikraft seeks wider adoption of unikernels in real-world applications
Unikraft’s background in research and academia
Bottom-up as well as top-down specialization
Building a community with a large proportion of students

Links:

Unikraft
Unikraft: Fast, Specialized Unikernels the Easy Way
Xen Project
MirageOS
HermitCore
Firecracker
\n\n","summary":"","date_published":"2022-03-30T02:00:00.000-07:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/3e7677bf-14ab-4060-90b3-8f3eb5e7b93c.mp3","mime_type":"audio/mpeg","size_in_bytes":26195218,"duration_in_seconds":1633}]},{"id":"be889359-683f-4969-8389-49ae2a46a5bb","title":"EdgeDB with Yury Selivanov","url":"https://www.contributor.fyi/edgedb","content_text":"Eric Anderson (@ericmander) has a conversation with Yury Selivanov (@1st1), the co-founder of EdgeDB. EdgeDB is the world’s first “graph-relational database.” It’s a term coined specifically for this new type of database, designed to ease the pain of dealing with the usual relational and NoSQL models. And no, EdgeDB is NOT a graph database!\n\nIn this episode we discuss:\n\n\n A glitch at EdgeDB’s Matrix-inspired launch event\n Origin of the term and design philosophy, “graph-relational”\n What to know about becoming a Python core developer\n How EdgeDB’s next-gen query language compares to GraphQL and SQL\n\n\nLinks:\n\n\n EdgeDB\n magicstack\n uvloop\n\n\nPeople mentioned:\n\n\n Elvis Pranskevichus (@elprans)\n Colin McDonnell (@colinhacks)\n Victor Petrovykh (Github: @vpetrovykh)\n Dan Abramov (@dan_abramov)\n Brett Cannon (@brettsky)\n Daniel Levine (@daniel_levine)\n\n\nOther episodes:\n\n\n Hasura with Tanmai Gopal\n Dgraph with Manish Jain\n","content_html":"

Eric Anderson (@ericmander) has a conversation with Yury Selivanov (@1st1), the co-founder of EdgeDB. EdgeDB is the world’s first “graph-relational database.” It’s a term coined specifically for this new type of database, designed to ease the pain of dealing with the usual relational and NoSQL models. And no, EdgeDB is NOT a graph database!
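
The “graph-relational” idea is clearest in a query: links between object types are traversed in place, and results come back as nested objects with no JOINs and no N+1 round trips. A minimal sketch with the edgedb Python client, against an assumed Movie type:

    import edgedb

    client = edgedb.create_client()  # connection details come from project setup

    movies = client.query("""
        select Movie {
            title,
            actors: { name }
        }
        filter .title ilike '%matrix%'
    """)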

In this episode we discuss:

A glitch at EdgeDB’s Matrix-inspired launch event
Origin of the term and design philosophy, “graph-relational”
What to know about becoming a Python core developer
How EdgeDB’s next-gen query language compares to GraphQL and SQL

Links:

EdgeDB
magicstack
uvloop

People mentioned:

Elvis Pranskevichus (@elprans)
Colin McDonnell (@colinhacks)
Victor Petrovykh (Github: @vpetrovykh)
Dan Abramov (@dan_abramov)
Brett Cannon (@brettsky)
Daniel Levine (@daniel_levine)

Other episodes:

Hasura with Tanmai Gopal
Dgraph with Manish Jain
\n\n","summary":"","date_published":"2022-03-16T02:00:00.000-07:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/be889359-683f-4969-8389-49ae2a46a5bb.mp3","mime_type":"audio/mpeg","size_in_bytes":43698095,"duration_in_seconds":2727}]},{"id":"46d9de15-1bb3-4835-bdc2-7520df7d11c2","title":"Deephaven with Pete Goddard","url":"https://www.contributor.fyi/deephaven","content_text":"Eric Anderson (@ericmander) sits down with Pete Goddard (@pete_paco) to talk about Deephaven, the open-core query engine built for real-time streams and batch data. Pete is the CEO of Deephaven Data Labs, and comes to the data world from a background in capital markets trading. Deephaven originally addressed a need for real-time data infrastructure in the finance world, but the team realized how useful their technology could be in a wider variety of verticals. Join us for Pete’s unique perspective on reaching out into alternate industries and use cases through community development.\n\nIn this episode we discuss:\n\n\n How Pete transitioned from Wall Street to open-source software\n Selling investors on open-source\n Two questions people always ask Pete\n The luxury of Deephaven’s incremental update model\n Barrage, Deephaven’s API for streaming tables that extends Apache Arrow Flight\n\n\nLinks:\n\n\n Deephaven\n Barrage\n Apache Kafka\n Apache Arrow Flight\n Eclipse Jetty\n\n\nOther episodes:\n\nTensorFlow with Rajat Monga","content_html":"

Eric Anderson (@ericmander) sits down with Pete Goddard (@pete_paco) to talk about Deephaven, the open-core query engine built for real-time streams and batch data. Pete is the CEO of Deephaven Data Labs, and comes to the data world from a background in capital markets trading. Deephaven originally addressed a need for real-time data infrastructure in the finance world, but the team realized how useful their technology could be in a wider variety of verticals. Join us for Pete’s unique perspective on reaching out into alternate industries and use cases through community development.
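
The incremental update model mentioned below is easiest to picture with a ticking table: derived tables update as rows arrive rather than being recomputed. A loose sketch from memory of Deephaven’s Python API, which requires a running Deephaven server and may not match current method names exactly:

    from deephaven import time_table

    # A table that gains one row per second.
    ticking = time_table("PT1S")
    # The derived table updates incrementally as new rows tick in.
    doubled = ticking.update(["X = ii * 2"])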

In this episode we discuss:

How Pete transitioned from Wall Street to open-source software
Selling investors on open-source
Two questions people always ask Pete
The luxury of Deephaven’s incremental update model
Barrage, Deephaven’s API for streaming tables that extends Apache Arrow Flight

Links:

Deephaven
Barrage
Apache Kafka
Apache Arrow Flight
Eclipse Jetty

Other episodes:

TensorFlow with Rajat Monga

","summary":"","date_published":"2022-03-02T02:00:00.000-08:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/46d9de15-1bb3-4835-bdc2-7520df7d11c2.mp3","mime_type":"audio/mpeg","size_in_bytes":39709928,"duration_in_seconds":2477}]},{"id":"7c2b6753-a1cd-4ed7-84d3-215e83493d80","title":"Meltano with Douwe Maan","url":"https://www.contributor.fyi/meltano","content_text":"Eric Anderson (@ericmander) and Douwe Maan (@DouweM) chat about Meltano, the open-source DataOps operating system. Meltano provides the connective tissue that allows teams to treat their data stack as a single software development project. Tune in to learn how Meltano is trying to bring software development best practices into the data world.\n\nIn this episode we discuss:\n\n\n Meltano’s origins as a side project at GitLab\n How Meltano glues together open-source technologies like Singer, dbt and Airflow\n Douwe’s experience wearing many different hats in the early days of Meltano\n Meltano’s shift from an ELT solution to an operating system\n The Love-Tap Fest community event, starting right after this episode’s release!\n\n\nLinks:\n\n\n Meltano\n Love-Tap Fest - February 17-24th, 2022\n GitLab\n Singer\n dbt\n Apache Airflow\n Apache Superset\n Terraform\n\n\nPeople mentioned:\n\n\n Taylor Murphy (@tayloramurphy)\n AJ Steers (@aaronsteers)\n\n\nOther episodes:\n\n\n Great Expectations with Abe Gong and Kyle Eaton\n Dagster with Nick Schrock\n Prefect with Jeremiah Lowin\n\n","content_html":"

Eric Anderson (@ericmander) and Douwe Maan (@DouweM) chat about Meltano, the open-source DataOps operating system. Meltano provides the connective tissue that allows teams to treat their data stack as a single software development project. Tune in to learn how Meltano is trying to bring software development best practices into the data world.

In this episode we discuss:

Meltano’s origins as a side project at GitLab
How Meltano glues together open-source technologies like Singer, dbt and Airflow
Douwe’s experience wearing many different hats in the early days of Meltano
Meltano’s shift from an ELT solution to an operating system
The Love-Tap Fest community event, starting right after this episode’s release!

Links:

Meltano
Love-Tap Fest - February 17-24th, 2022
GitLab
Singer
dbt
Apache Airflow
Apache Superset
Terraform

People mentioned:

Taylor Murphy (@tayloramurphy)
AJ Steers (@aaronsteers)

Other episodes:

Great Expectations with Abe Gong and Kyle Eaton
Dagster with Nick Schrock
Prefect with Jeremiah Lowin
\n\n","summary":"","date_published":"2022-02-16T02:00:00.000-08:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/7c2b6753-a1cd-4ed7-84d3-215e83493d80.mp3","mime_type":"audio/mpeg","size_in_bytes":37767254,"duration_in_seconds":2356}]},{"id":"b4799a49-6914-417a-b4ab-7d5acd47def2","title":"Penpot & Taiga with Pablo Ruiz-Múzquiz","url":"https://www.contributor.fyi/penpot-taiga","content_text":"Eric Anderson (@ericmander) and Pablo Ruiz-Múzquiz (@diacritica) examine the intersection of open-source, agile development, and UI/UX design at the heart of two applications, Penpot and Taiga. Penpot is a design and prototyping platform intended for cross-domain teams, while Taiga is a popular agile project management software. These products comprise the heart of Pablo’s innovative company, Kaleidos Open Source, which was founded in Spain more than a decade ago. Listen to today’s episode for one of the industry’s most unique perspectives on open-source code and design.\n\nIn this episode we discuss:\n\n\n An internal crisis and a major pivot for Kaleidos\n How Penpot was born from Kaleidos’ signature personal innovation week\n Designing a design tool that can be used to design itself\n Bringing design, code and people closer together\n Why Pablo asserts that designers care about open-source\n\n\nLinks:\n\n\n Penpot\n Taiga\n Kaleidos Open Source\n\n\nOther episodes:\n\n\n Blender with Dalai Felinto\n","content_html":"
\n\n","summary":"","date_published":"2022-02-02T02:00:00.000-08:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/b4799a49-6914-417a-b4ab-7d5acd47def2.mp3","mime_type":"audio/mpeg","size_in_bytes":37557856,"duration_in_seconds":2343}]},{"id":"2169468c-f049-4f15-b054-286468545a0a","title":"DockerSlim with Kyle Quest","url":"https://www.contributor.fyi/dockerslim","content_text":"Eric Anderson (@ericmander) and Kyle Quest (@kcqon) discuss DockerSlim, the open-source optimization and security tool for Docker container images. Kyle initially created DockerSlim as a humble hackathon project, and now supports it with his company, Slim.AI. Tune in to learn how DockerSlim is redefining DevOps with application intelligence and a backwards compatible vision of the future.\n\nIn this episode we discuss:\n\n\n Bridging the gap between application and infrastructure\n Emerging from the cloud native stone age\n Application intelligence rather than artificial intelligence in Slim.AI\n DockerSlim integrated into CI/CD pipelines, embedded systems, and robots\n How Slim.AI aims to become ‘Google for containers’\n\n\nLinks:\n\n\n DockerSlim\n Slim.AI\n Terraform\n Serverless\n Sigstore\n","content_html":"
\n\n","summary":"","date_published":"2021-12-22T02:00:00.000-08:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/2169468c-f049-4f15-b054-286468545a0a.mp3","mime_type":"audio/mpeg","size_in_bytes":37054633,"duration_in_seconds":2311}]},{"id":"aa7157aa-315e-4c80-8a86-24737f8e4e33","title":"Unleash with Egil and Ivar Østhus ","url":"https://www.contributor.fyi/unleash","content_text":"Eric Anderson (@ericmander) is joined by Egil (@EgilCo) and Ivar Østhus (@ivarconr), brothers and co-creators of the open-source feature management platform, Unleash. It’s a real family business, with Egil acting as CEO and Ivar the CTO of the company. Over beers and burgers, the two decided to bring their strengths together for a feature toggle tool that transforms DevOps and continuous deployment pipelines.\n\nIn this episode we discuss:\n\n\n Ivar as a pioneer in the trunk-based development space\n Word of mouth and old colleagues bringing Unleash to new companies\n Assessing a contributor’s personality and mindset\n Resolving a deadlock scenario on a feature launch day without impacting customers\n Why feature flagging is fundamental to true DevOps\n\n\nLinks:\n\n\n Unleash\n","content_html":"
\n\n","summary":"","date_published":"2021-12-08T02:00:00.000-08:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/aa7157aa-315e-4c80-8a86-24737f8e4e33.mp3","mime_type":"audio/mpeg","size_in_bytes":28369024,"duration_in_seconds":1768}]},{"id":"ef04df00-ffaa-4637-917f-0315fc6e930a","title":"Kubescape with Shauli Rozen","url":"https://www.contributor.fyi/kubescape","content_text":"Eric Anderson (@ericmander) invites Shauli Rozen (@shaulir) to share about his work on Kubescape, the first open-source Kubernetes security testing tool that is compliant with NSA & CISA hardening guidelines. Despite the project’s recency, Kubescape has seen explosive growth on Github and recognition from the Kubernetes community. Tune in to learn how the team at ARMO built a successful open-source security tool for DevOps.\n\nIn this episode we discuss:\n\n\n Why Kubescape uses guidance from the NSA & CISA\n Correcting the misconception that developers don’t care about security\n Providing value in the first five minutes of using the tool\n ARMO’s detailed approach to community feedback\n Shauli’s thoughts on security roles of the future\n\n\nLinks:\n\n\n ARMO\n Kubescape\n Terraform\n","content_html":"
\n\n","summary":"","date_published":"2021-11-24T02:00:00.000-08:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/ef04df00-ffaa-4637-917f-0315fc6e930a.mp3","mime_type":"audio/mpeg","size_in_bytes":25810695,"duration_in_seconds":1609}]},{"id":"115a2c83-8b73-4c86-86ac-b106b7ffe831","title":"Blender with Dalai Felinto","url":"https://www.contributor.fyi/blender","content_text":"Eric Anderson (@ericmander) connects with Dalai Felinto (@dfelinto), development coordinator at Blender. Blender is a free and open-source 3D graphics toolset with a unique story spanning nearly 30 years. The project is used professionally for animation, video games, scientific visualization, and much more. Join us for a very special episode of Contributor as we take a deep dive into one of the most dedicated, robust communities in open-source history.\n\nIn this episode we discuss:\n\n\n When the dotcom crash landed Blender in the hands of community members\n Taking open-source beyond the toolset with open movie projects\n Dalai’s transition from burgeoning architect to Blender developer\n How you can use Blender’s new Geometry Nodes for AI training\n Solving organizational challenges with full-time staff and contributors\n\n\nLinks:\n\n\n Blender\n Blender Studio\n Big Buck Bunny\n Elephants Dream\n Sprite Fright\n\n\nPeople mentioned:\n\nTon Roosendaal (@tonroosendaal)","content_html":"
","summary":"","date_published":"2021-11-10T02:00:00.000-08:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/115a2c83-8b73-4c86-86ac-b106b7ffe831.mp3","mime_type":"audio/mpeg","size_in_bytes":35124531,"duration_in_seconds":2175}]},{"id":"944ab3e7-c66e-4509-a7e9-c72d15d707ad","title":"Great Expectations with Abe Gong and Kyle Eaton","url":"https://www.contributor.fyi/greatexpectations","content_text":"Eric Anderson (@ericmander) interviews Abe Gong (@AbeGong) and Kyle Eaton (@SuperCoKyle) about Great Expectations, the open-source framework that aims to create a shared standard for data quality. Abe is a core contributor to the project, and the CEO and co-founder of Superconductive, the team backing Great Expectations. Kyle is Growth Lead at Superconductive, and Community Manager of Great Expectations. The team at Superconductive have just launched the new Expectation Gallery to connect contributors and carve out vertical spaces in this ecosystem. Tune in to find out why Great Expectations is the leading open-source project for eliminating pipeline debt.\n\nIn this episode we discuss:\n\n\n How the Expectation Gallery enables new modes of community engagement\n Superconductive’s pivot from healthcare data consulting to open-source data validation\n Collaborative conversations with other data companies\n Abe’s advice to future open-source founders on segmenting value\n The vision of Great Expectations as a protocol-level open standard\n\n\nLinks:\n\n\n Great Expectations\n Superconductive\n Down with Pipeline debt\n Cascade Data Labs\n Flyte\n Dagster\n Databricks\n pandas\n\n\nPeople mentioned:\n\n\n James Campbell (@jpcampbell42)\n\n\nOther episodes:\n\n\n Dagster with Nick Schrock\n","content_html":"
\n\n","summary":"","date_published":"2021-10-27T02:00:00.000-07:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/944ab3e7-c66e-4509-a7e9-c72d15d707ad.mp3","mime_type":"audio/mpeg","size_in_bytes":31076563,"duration_in_seconds":1938}]},{"id":"acb0cb1d-f78d-4fb5-8e0a-df3f2180e41b","title":"Bolster with Abhishek Dubey","url":"https://www.contributor.fyi/bolster","content_text":"Eric Anderson (@ericmander) sits down with Abhishek Dubey (@abhishekdubey), co-creator of Bolster, the fraud prevention platform powered by deep-learning. Bolster is used by clients like LinkedIn, Uber and Dropbox for its cutting-edge detection and takedown technology. Abhishek and his co-founder built Bolster around the real-time URL-scanning tool CheckPhish, which analyzes phishing sites for free. On today’s episode, learn how Abhishek and the team at Bolster have found success by focusing on building their business out of passion, and giving back to the community.\n\nIn this episode we discuss:\n\n\n A second mortgage and a startup garage\n Discovering 100 Fortune 500 companies were using CheckPhish\n How Bolster snagged LinkedIn without a proof of concept\n Bolster’s secret sauce, that sets them apart from other security companies\n Comparing the community focus of Bolster to a traditional open-source model\n\n\n\nLinks:\n\n\n Bolster\n CheckPhish\n Hacker Dojo\n Twilio\n\n\n\nPeople mentioned:\n\n\n Shashi Prakash (@skiddzo)\n\n","content_html":"
\n\n","summary":"","date_published":"2021-07-28T02:00:00.000-07:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/acb0cb1d-f78d-4fb5-8e0a-df3f2180e41b.mp3","mime_type":"audio/mpeg","size_in_bytes":33056854,"duration_in_seconds":2061}]},{"id":"c63d8d6f-d72e-4739-ae8f-beef65aaa9e1","title":"Sanity with Magnus Hillestad and Even Westvang","url":"https://www.contributor.fyi/sanity","content_text":"Eric Anderson (@ericmander) sits down with Magnus Hillestad (@MHillestad) and Even Westvang (@even), co-founders of the unified content platform Sanity. The team at Sanity helps businesses organize their structured content as data, allowing distribution from a single source of truth. Tune in to today’s episode to learn how Sanity aims to change the way people think about content.\n\nIn this episode we discuss:\n\n\n The open-source editing environment and CMS, Sanity Studio\n From content as data, to coffee table books\n How Sanity differs from a traditional CMS\n Why the Sanity team turned down a contract with the United Nations\n Building a team that can scale to a vision of ubiquity\n\n\nLinks:\n\n\n Sanity\n Sanity Studio\n Figma\n Brex\n Netlify\n\n\nPeople mentioned:\n\n\n Simen Svale Skogsrud (@svale)\n Øyvind Rostad (@rostad)\n\n","content_html":"
\n\n","summary":"","date_published":"2021-07-14T02:00:00.000-07:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/c63d8d6f-d72e-4739-ae8f-beef65aaa9e1.mp3","mime_type":"audio/mpeg","size_in_bytes":39205451,"duration_in_seconds":2446}]},{"id":"067b7345-220f-4198-b123-b1d90f029908","title":"Teleport with Ev Kontsevoy","url":"https://www.contributor.fyi/teleport","content_text":"Eric Anderson (@ericmander) and Ev Kontsevoy (@kontsevoy) talk about Teleport, the open-source tool for instant access to cloud resources. These include SSH servers, Kubernetes clusters, databases and more. Teleport was inspired by the growing complexity of cloud environments, and aims to make engineers feel like all their cloud applications are in the same room together.\n\nIn this episode we discuss:\n\n\n How Teleport grew from a side project to Gravity, the open-source toolkit for packaging and running applications autonomously\n Unifying and consolidating modern access methods and industry best practices\n Bringing identity to a protocol-level\n An early community use case for Teleport in the cattle industry\n Engaging with outside contributions while balancing security constraints\n\n\n\nLinks:\n\n\n Teleport\n Gravity\n Mailgun\n\n","content_html":"
\n\n","summary":"","date_published":"2021-06-30T02:00:00.000-07:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/067b7345-220f-4198-b123-b1d90f029908.mp3","mime_type":"audio/mpeg","size_in_bytes":32330441,"duration_in_seconds":2016}]},{"id":"2dfe72f0-36f3-4cc8-adb4-1b805101030d","title":"Rook with Travis Nielsen","url":"https://www.contributor.fyi/rook","content_text":"Eric Anderson (@ericmander) and Travis Nielsen (@STravisNielsen) talk about Rook, the open-source storage orchestrator for Kubernetes. Travis is a Senior Principal Software Engineer at Red Hat, and maintainer of Rook. Join us to dive deep into the story of Rook, from Microsoft, to Quantum, to Red Hat.\n\nIn this episode we discuss:\n\n\n Ceph + Kubernetes = Rook\n The difficulty and importance of a stable storage solution for stateless applications\n How Rook leverages Kubernetes CRDs\n Why the Rook team decided to work with the CNCF\n Red Hat’s philosophy and approach to open-source\n\n\n\nLinks:\n\n\n Rook\n Red Hat\n Upbound\n Quantum\n CNCF\n\n\n\nPeople mentioned:\n\n\n Bassam Tabbara (@bassamtabbara)\n Jared Watts (@jbw976)\n\n","content_html":"
\n\n","summary":"","date_published":"2021-06-16T02:00:00.000-07:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/2dfe72f0-36f3-4cc8-adb4-1b805101030d.mp3","mime_type":"audio/mpeg","size_in_bytes":31259629,"duration_in_seconds":1949}]},{"id":"fe1001a7-90b8-4dec-a55b-b8c7b4b00098","title":"Apache Cassandra with Patrick McFadin","url":"https://www.contributor.fyi/cassandra","content_text":"Eric Anderson (@ericmander) and Patrick McFadin (@PatrickMcFadin) delve into the history of Apache Cassandra, the open-source NoSQL database born and bred around cloud over a decade ago. Patrick is the VP of Developer Relations at DataStax, and a member of the Cassandra Project Management Committee. On today’s episode, Patrick shares his philosophy on developer advocacy and experience in open-source.\n\nIn this episode we discuss:\n\n\n Behind the NoSQL explosion that made Cassandra the darling of the valley\n Comparing different eras of commercializing open-source, then and now\n How Patrick became a pioneer in evangelizing and community-building\n The two kinds of people to recruit for developer relations\n Why Patrick says open-source is going to “start eating clouds”\n\n\nLinks:\n\n\n Apache Cassandra\n DataStax\n DataStax Astra\n\n\nPeople mentioned:\n\n\n Avinash Lakshman (@HedvigEng)\n Prashant Malik (@pmalik)\n Adrian Cockcroft (@adrianco)\n Kelsey Hightower (@kelseyhightower)\n\n\nOther episodes:\n\n\n Chef with Adam Jacob\n\n","content_html":"
\n\n","summary":"","date_published":"2021-06-02T02:00:00.000-07:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/fe1001a7-90b8-4dec-a55b-b8c7b4b00098.mp3","mime_type":"audio/mpeg","size_in_bytes":33068975,"duration_in_seconds":2062}]},{"id":"b26c15f0-5b86-4ae5-97d9-88e275c9e6f0","title":"Dagster with Nick Schrock","url":"https://www.contributor.fyi/dagster","content_text":"Eric Anderson (@ericmander) interviews Nick Schrock (@schrockn) about Dagster, the open-source data orchestrator for machine learning, analytics, and ETL. Nick is the founder and CEO of Elementl, and is well-known for creating the Project Infrastructure group at Facebook, which spawned GraphQL and React. On today’s episode of Contributor, Nick explains how he set out to fix an inefficiency he identified amongst the complexity of the data infrastructure domain.\n\nIn this episode we discuss:\n\n\n Dagster’s place in the industry shift towards thinking of data as a software engineering discipline\n Why Nick believes it’s time for the term “data cleaning” to be retired\n The empowerment of Dagster’s instantaneous spin-up process and local development experience\n How a partner integrated Dagster into workflow for ops workers on the warehouse floor\n One user’s testimony that, “what dbt did for our SQL, Dagster did for our Python”\n\n\nLinks:\n\n\n Dagster\n Elementl\n GraphQL\n React\n dbt\n Snowflake\n Apache Airflow\n\n\nPeople mentioned:\n\n\n Lee Byron (@leeb)\n Dan Schafer (@dlschafer)\n Abe Gong (@AbeGong)\n\n","content_html":"
\n\n","summary":"","date_published":"2021-05-19T02:00:00.000-07:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/b26c15f0-5b86-4ae5-97d9-88e275c9e6f0.mp3","mime_type":"audio/mpeg","size_in_bytes":33645340,"duration_in_seconds":2098}]},{"id":"9e60cd57-ad83-415c-b684-5b7d6866d9f3","title":"Hasura with Tanmai Gopal","url":"https://www.contributor.fyi/hasura","content_text":"Eric Anderson (@ericmander) and Tanmai Gopal (@tanmaigo) dive into the open-source Hasura GraphQL Engine and the wider Hasura community. Hasura provides real-time GraphQL APIs for databases, so developers can focus on building applications without worrying about infrastructure. Tune in to hear the full story about how Tanmai and his team are helping engineers unlock the dream of self-serve data access.\n\nIn this episode we discuss:\n\n\n How the early Hasura team created their own version of GraphQL in parallel\n Developing community with ease of onboarding and radical transparency\n Transitioning community events into the COVID world, and looking to a future beyond travel\n Hasura’s secret sauce: the authorization framework\n\n\nLinks:\n\n\n Hasura\n Hasura Con’21\n DigitalOcean\n\n\nPeople mentioned:\n\n\n Rajoshi Ghosh (@rajoshighosh)\n\n","content_html":"
\n\n","summary":"","date_published":"2021-05-05T02:00:00.000-07:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/9e60cd57-ad83-415c-b684-5b7d6866d9f3.mp3","mime_type":"audio/mpeg","size_in_bytes":34818134,"duration_in_seconds":2172}]},{"id":"292551ed-63be-43f6-a892-1dc679229790","title":"MindsDB with Jorge Torres and Adam Carrigan","url":"https://www.contributor.fyi/mindsdb","content_text":"Eric Anderson (@ericmander) is joined by the co-founders of MindsDB, Jorge Torres (@JorgeTorresAI) and Adam Carrigan (@AdamMCarrigan). MindsDB is an open-source AI layer that integrates with existing databases, from MySQL to Clickhouse. Tune in to learn how these two former college roommates are working to bring machine learning into the mainstream.\n\nIn this episode we discuss:\n\n\n Why it makes sense to run machine learning models in the database\n Partnering with Kafka, Looker, and more\n MindsDB’s initial adoption by students at Berkeley\n Different applications for MindsDB and machine learning in ecommerce, finance, and more\n The moment Jorge knew he had to get into business with Adam\n\n\nLinks:\n\n\n MindsDB\n RedisConf 2021\n Looker\n Apache Kafka\n ClickHouse\n\n\nOther episodes\n\n\n ClickHouse with Alexey Milovidov and Ivan Blinkov\n","content_html":"
\n\n","summary":"","date_published":"2021-04-21T02:00:00.000-07:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/292551ed-63be-43f6-a892-1dc679229790.mp3","mime_type":"audio/mpeg","size_in_bytes":28543313,"duration_in_seconds":1779}]},{"id":"7fc3209e-479d-46d5-8dd9-cbe33e6381ce","title":"Anaconda with Peter Wang","url":"https://www.contributor.fyi/anaconda","content_text":"Eric Anderson (@ericmander) welcomes Peter Wang (@pwang) for a conversation about the Python ecosystem and the open-source communities that have built it. Peter is the creator of Anaconda, the near-essential Python distribution for scientific computing that makes managing packages a lot more manageable. In today’s episode, Peter offers a unique and powerful perspective on how to make the economics of open-source work for everyone.\n\nIn this episode we discuss:\n\n\n The paradox of the PVM and Python’s packaging difficulties\n How Guido van Rossum implied permission for Anaconda and the open-source Python movement\n Python as the lingua franca of a new professional class\n Looking to Roblox for inspiration for a scientific computing creator community\n Giving back to open-source communities through the NumFOCUS Foundation\n\n\nLinks:\n\n\n Anaconda\n NumFOCUS\n NumPy\n SciPy\n Enthought \n Jupyter\n TensorFlow\n MicroPython\n scikit-learn\n pandas\n Quansight\n Red Hat\n Roblox\n\n\nPeople mentioned:\n\n\n Travis Oliphant (@teoliphant)\n Fernando Pérez (@fperez_org)\n Brian Granger (@ellisonbg)\n Min Ragan-Kelley (@minrk)\n Guido van Rossum (@gvanrossum)\n James Currier (@JamesCurrier)\n\n\nOther episodes:\n\n\n NumPy & SciPy with Travis Oliphant\n TensorFlow with Rajat Monga\n\n\n","content_html":"
\n\n","summary":"","date_published":"2021-04-07T02:00:00.000-07:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/7fc3209e-479d-46d5-8dd9-cbe33e6381ce.mp3","mime_type":"audio/mpeg","size_in_bytes":33341066,"duration_in_seconds":2079}]},{"id":"503e0a07-1982-4f13-a5fa-82f00a73c2c6","title":"Redpanda with Alexander Gallego","url":"https://www.contributor.fyi/redpanda","content_text":"Eric Anderson (@ericmander) is joined by Alexander Gallego (@emaxerrno) for an examination of Redpanda, the source available event streaming platform designed as a drop-in replacement for Kafka. Redpanda’s storage engine is attractive to developers for its performance and simplicity, removing the complexity of running Kafka to scale and deploying with a single binary. Listen to today’s episode to learn more about how Alexander and the team at Vectorized are looking to advance the conversation around streaming into the future.\n\nIn this episode we discuss:\n\n\n What Alexander means when he says that hardware is the platform for data streaming\n The 3 things that turn a data stream into a data product\n Comparing Redpanda to Kafka and Pulsar\n A difference in product philosophy between selling to data teams vs app developers\n How Alexander approached the challenge of monetizing data infrastructure\n\n\n\nLinks:\n\n\n Redpanda\n Vectorized\n Apache Kafka\n Apache Pulsar\n Apache Spark\n Apache Beam\n Apache Storm\n Apache Flink\n Elastic\n CockroachDB \n\n\n\nOther episodes:\n\n\n TensorFlow with Rajat Monga\n Scylla with Dor Laor\n\n","content_html":"
\n\n","summary":"","date_published":"2021-03-24T02:00:00.000-07:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/503e0a07-1982-4f13-a5fa-82f00a73c2c6.mp3","mime_type":"audio/mpeg","size_in_bytes":32405673,"duration_in_seconds":2021}]},{"id":"f0a1795a-18fb-4e22-87c2-801a41dda07a","title":"Storybook with Zoltan Olah","url":"https://www.contributor.fyi/storybook","content_text":"Eric Anderson (@ericmander) and Zoltan Olah (@zqzoltan) discuss Storybook, the open-source UI component development tool. Storybook supports all the most popular frontend frameworks and libraries such as React, Vue and Angular, but allows users to test and develop components in isolation. In today’s episode, learn more about the early days of the component-driven development methodology and how Storybook was saved by a passionate community of engineers.\n\nIn this episode we discuss:\n\n\n Storybook as an integral part of UI design workflow\n How Zoltan and his team inherited Storybook and saved it from being “left out to dry”\n Solving a pain point for front-end engineers with Chromatic’s UI regression testing, built on top of Storybook\n Why Zoltan compares components to APIs, and Storybook to a service mesh\n What’s happening today in the world of open-source design systems\n\n\nLinks:\n\n\n Storybook\n Chromatic\n Meteor\n GraphQL\n React\n Tailwind\n Selenium\n Cypress\n Material-UI\n Figma\n Learn Storybook\n\n\n\nPeople mentioned:\n\n\n Dominic Nguyen (@domyen)\n Tom Coleman (@tmeasday)\n Arunoda Susiripala (@arunoda)\n Norbert de Langen (@NorbertdeLangen)\n Michael Shilman (@mshilman)\n\n\n","content_html":"
\n\n","summary":"","date_published":"2021-03-10T02:00:00.000-08:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/f0a1795a-18fb-4e22-87c2-801a41dda07a.mp3","mime_type":"audio/mpeg","size_in_bytes":26561768,"duration_in_seconds":1656}]},{"id":"08828544-3a16-441d-84b3-ad0a9d5ca3b5","title":"SkyWalking with Sheng Wu","url":"https://www.contributor.fyi/skywalking","content_text":"Eric Anderson (@ericmander) and Sheng Wu (@wusheng1108) discuss Apache SkyWalking, an open-source APM tool focusing on cloud-native and distributed systems. SkyWalking was originally developed in 2012 as a training tool for developers new to distributed systems architecture, but it became Sheng’s pet project for several years until he brought it to the Apache Incubator program. Listen to today’s episode for the inside scoop of how this “hidden gem” fits into the Apache network of open-source software projects.\n\nIn this episode we discuss:\n\n\n Why open-source APMs are not very common\n SkyWalking’s focus on attracting more contributors rather than users\n How a conflict of interest at Huawei led to a “bake-off” between Apache and CNCF\n The impact of Elastic changing their license on the open-source community\n The name “Skywalking,” its sources of inspiration, and an easter egg\n\n\nLinks:\n\n\n Apache SkyWalking\n Kubernetes\n The Apache Incubator\n CNCF\n Tetrate\n Apache ShardingSphere\n Apache APISIX\n Envoy Proxy\n Apache Airflow\n Apache Beam\n Dynatrace\n New Relic\n Elastic\n Helm\n Zipkin\n\n\nOther episodes:\n\n\n Envoy Proxy with Matt Klein\n\n","content_html":"
\n\n","summary":"","date_published":"2021-02-24T02:00:00.000-08:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/08828544-3a16-441d-84b3-ad0a9d5ca3b5.mp3","mime_type":"audio/mpeg","size_in_bytes":30825788,"duration_in_seconds":1922}]},{"id":"b9ea163f-164b-4dcc-98a8-4e4dc7acbbc9","title":"Snowpack with Fred K. Schott","url":"https://www.contributor.fyi/snowpack","content_text":"Eric Anderson (@ericmander) and Fred K. Schott (@FredKSchott) dive into the world of Snowpack, an open-source, frontend build tool for web developers. Snowpack is special because it uses Javascript’s ES module system to instantly write file changes to the browser. Fred created Snowpack and the Skypack CDN to fulfill his vision of the future of the web, which he first recognized while trying to advance the Javascript ecosystem with an earlier project called Pika. On today’s episode, find out how Fred rejected the pain of modern web development, and came up with a better solution.\n\nIn this episode we discuss:\n\n\n Reconfiguring old ideas for today’s web development landscape\n How Snowpack and Skypack lighten the load when it comes to Node modules and storage space\n Questioning what it means to build a modern application that works for developers and users alike\n Skypack and the future of shared dependencies across different sites\n Why Snowpack is using an open governance framework\n\n\nLinks:\n\n\n Snowpack\n Skypack\n OCTO Speaker Series - Fred K. Schott\n Svelte\n React\n Ripple\n Microsite\n Deno\n Next.js\n esbuild\n webpack\n\n\nPeople mentioned:\n\n\n Rich Harris (@Rich_Harris)\n Nate Moore (@n_moore)\n\n","content_html":"
\n\n","summary":"","date_published":"2021-02-10T02:00:00.000-08:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/b9ea163f-164b-4dcc-98a8-4e4dc7acbbc9.mp3","mime_type":"audio/mpeg","size_in_bytes":30384841,"duration_in_seconds":1894}]},{"id":"324ec39b-9536-41f9-b6a8-ac58d824829d","title":"NumPy & SciPy with Travis Oliphant","url":"https://www.contributor.fyi/numpy-scipy","content_text":"Eric Anderson (@ericmander) and Travis Oliphant (@teoliphant) take a far-reaching tour through the history of the Python data community. Travis has had a hand in the creation of many open-source projects, most notably the influential libraries, NumPy and SciPy, which helped cement Python as the standard for scientific computing. Join us for the story of a fledgling community from a time “before open-source was cool,” and their lessons for today’s open-source landscape.\n\nIn this episode we discuss:\n\n\n How biomedical engineering, MRIs, and an unhappy tenure committee led to NumPy and SciPy\n Overcoming early challenges of distribution with Python\n What Travis would have done differently when he wrote NumPy\n Successfully solving the “two-option split” by adding a third option\n Community-driven open-source interacting with company-backed open-source\n\n\n\nLinks:\n\n\n NumPy\n SciPy\n Anaconda\n Quansight\n Conda\n Matplotlib\n Enthought\n TensorFlow\n PyTorch\n MXNet\n PyPi\n Jupyter\n pandas\n\n\n\nPeople mentioned:\n\n\n Guido van Rossum (@gvanrossum)\n Robert Kern (Github: @rkern)\n Pearu Peterson (Github: @pearu)\n Wes McKinney (@wesmckinn)\n Charles Harris (Github: @charris)\n Francesc Alted (@francescalted)\n Fernando Perez (@fperez_org)\n Brian Granger (@ellisonbg)\n\n\n\nOther episodes:\n\n\n TensorFlow with Rajat Monga\n\n","content_html":"
\n\n","summary":"","date_published":"2021-01-27T02:00:00.000-08:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/324ec39b-9536-41f9-b6a8-ac58d824829d.mp3","mime_type":"audio/mpeg","size_in_bytes":47601833,"duration_in_seconds":2971}]},{"id":"3584036d-67e0-4217-a4f0-84dfeab1fdce","title":"Scylla with Dor Laor","url":"https://www.contributor.fyi/scylla","content_text":"Eric Anderson (@ericmander) and Dor Laor (@DorLaor) go under the hood of Scylla, the open-source NoSQL database designed for low latency and high throughput in big data applications. Dor and his team have reimplemented Apache Cassandra in C++ from scratch, with additional compatibility for DynamoDB. In today’s episode, Dor shares details on the exciting work coming out of ScyllaDB, including Seastar, their open-source C++ framework. Also, check out Scylla Summit 2021 to learn what’s next for Scylla.\n\nIn this episode we discuss:\n\n\n Enabling Scylla to “gain control” by implementing Apache Cassandra in C++\n How Dor and his co-founder were ahead of the curve with their vision for virtualization\n Scylla’s unique shard-per-core architecture\n Working with distributed teams, even before the COVID-19 pandemic\n The growing significance of separating the interface from the engine in open-source\n Learn about Project Circe, which is being featured at Scylla Summit 2021 right now\n\n\nLinks:\n\n\n Scylla\n Seastar\n Scylla Summit 2021\n Apache Cassandra\n DynamoDB\n MongoDB\n Redhat\n QEMU\n Redis\n Vectorized\n Apache Hadoop\n Apache HBase\n Apache Beam\n Apache Flink\n Apache Spark\n\n\n\nPeople mentioned:\n\n\n Avi Kivity (@AviKivity)\n\n","content_html":"
\n\n","summary":"","date_published":"2021-01-13T02:00:00.000-08:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/3584036d-67e0-4217-a4f0-84dfeab1fdce.mp3","mime_type":"audio/mpeg","size_in_bytes":30584625,"duration_in_seconds":1907}]},{"id":"24626f5b-1660-413e-a72f-edfa296c38e3","title":"Gitpod with Sven Efftinge, Christian Weichel and Gero Posmyk-Leinemann","url":"https://www.contributor.fyi/gitpod","content_text":"Eric Anderson (@ericmander) chats with Sven Efftinge (@svenefftinge), Christian Weichel (@csweichel) and Gero Posmyk-Leinemann (Github: @geropl) about their work on Gitpod, an open-source Kubernetes application that allows engineers to spin up a server-side dev-environment from a Git repository, all within their browser. The three team members are part of TypeFox, a consulting firm that specialized in developer tools for different companies before branching out into open-source projects. Upon Gero’s hiring at TypeFox, he was tasked with creating a minimum viable product for the idea that would eventually become Gitpod. Tune in to hear how shifting from consulting to working on their own open-source projects was a breath of fresh air for the developers at TypeFox.\n\nIn this episode we discuss:\n\n\n How Gitpod solves the problem of switching between multiple dev environments, and improves deep code review\n The trap that many open-source founders fall into\n Why TypeFox wanted to switch from a consulting firm to a product shop\n Details on how Gitpod handles licensing\n Learn how you can instantly try out a Gitpod environment for any existing Github repository\n\n\nLinks:\n\n\n Gitpod\n TypeFox\n Theia\n Kubernetes\n\n\nPeople mentioned:\n\n\n Anton Kosyakov (@akosyakov)\n Sid Sijbrandij (@sytses)\n\n","content_html":"
\n\n","summary":"","date_published":"2020-12-30T02:00:00.000-08:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/24626f5b-1660-413e-a72f-edfa296c38e3.mp3","mime_type":"audio/mpeg","size_in_bytes":25058787,"duration_in_seconds":1562}]},{"id":"a6d7764f-373f-4d38-bf48-6be308abbdcf","title":"oso with Graham Neray","url":"https://www.contributor.fyi/oso","content_text":"Eric Anderson (@ericmander) interviews Graham Neray (@grahamneray) about oso, the open-source policy engine for authorization. oso was originally born from a desire to make infrastructure and security easier for developers, which is why Graham and his company describe themselves as being in the “friction-removal business.” Listen to today’s episode to learn how the team at oso are working to put security in the hands of developers. \n\nIn this episode we discuss:\n\n\n Developers building RBAC (role-based access control) systems over and over again\n Why open-source is the best way to handle authorization logic\n The history behind oso’s core policy language, Polar\n How someone beat Graham to the punch submitting oso to a Python newsletter\n Comparing oso and OPA (Open Policy Agent)\n\n\nLinks:\n\n\n oso\n Stripe\n Trulioo\n MongoDB\n Auth0\n Show HN\n OPA\n Polar Adventure\n\n\nPeople mentioned:\n\n\n Sam Scott (@samososos)\n Alex Plotnick (Github: @plotnick)\n Stephen Olsen (@olsenator4)\n\n\nOther episodes:\n\n\n Presto on Contributor\n OPA on Contributor\n","content_html":"
\n\n","summary":"","date_published":"2020-12-16T02:00:00.000-08:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/a6d7764f-373f-4d38-bf48-6be308abbdcf.mp3","mime_type":"audio/mpeg","size_in_bytes":27666852,"duration_in_seconds":1725}]},{"id":"14161242-4699-4bde-982a-326a10da2961","title":"TensorFlow with Rajat Monga","url":"https://www.contributor.fyi/tensorflow","content_text":"Eric Anderson (@ericmander) is joined by Rajat Monga (@rajatmonga), a co-creator of TensorFlow. Originally developed by the Google Brain team, TensorFlow is now one of the most popular open-source libraries for machine learning. The team at TensorFlow seek to “democratize” the world of AI as we know it, and by all accounts, they are succeeding. Listen to today’s episode to get inside one of the largest and most exciting open-source projects of the decade.\n\nIn this episode we discuss:\n\n\n How TensorFlow compares to other open-source projects at Google\n Taking bets on launch day numbers\n Balancing the demands of different kinds of TensorFlow users\n Lessons from Keras and PyTorch\n\n\nLinks:\n\n\n TensorFlow\n Keras \n PyTorch\n Kafka\n Kubernetes\n MapReduce: Simplified Data Processing on Large Clusters\n Bigtable: A Distributed Storage System for Structured Data\n\n\nPeople mentioned:\n\n\n Jeff Dean (@JeffDean)\n Andrew Ng (@AndrewYNg)\n François Chollet (@fchollet)\n","content_html":"
\n\n","summary":"","date_published":"2020-12-02T02:00:00.000-08:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/14161242-4699-4bde-982a-326a10da2961.mp3","mime_type":"audio/mpeg","size_in_bytes":28553761,"duration_in_seconds":1780}]},{"id":"8677b8e6-8514-40ca-9d32-bb1102b4ac4d","title":"Materialize with Frank McSherry","url":"https://www.contributor.fyi/materialize","content_text":"Eric Anderson (@ericmander) and Frank McSherry (@frankmcsherry) dive into Materialize, a source-available streaming database that lets engineers build real-time applications. Frank is a data processing expert whose work at Microsoft Research on the Timely and Differential Dataflow models culminated in the Materialize project. Tune in to today’s episode to learn how the team at Materialize are making the technology from cutting-edge data research accessible to a wider swath of users.\n\nIn this episode we discuss:\n\n\n Sharing early ideas with an “academic open source” approach\n How Materialize made a commitment to correctness\n Frank’s developmental philosophy of iterative thinking\n Novel applications for the Materialize community\n Changing the way we approach problems with real-time data processing\n\n\n\nLinks:\n\n\n Materialize\n Naiad: A Timely Dataflow System\n DryadLINQ\n Apache Arrow\n\n\n\nPeople mentioned:\n\n\n Arjun Narayan (@narayanarjun)\n Derek Murray (@mrry)\n\n","content_html":"
\n\n","summary":"","date_published":"2020-11-18T02:00:00.000-08:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/8677b8e6-8514-40ca-9d32-bb1102b4ac4d.mp3","mime_type":"audio/mpeg","size_in_bytes":34956896,"duration_in_seconds":2180}]},{"id":"3a82cea8-9308-41e3-a2a5-ec79a2311eb2","title":"Cilium with Thomas Graf","url":"https://www.contributor.fyi/cilium","content_text":"Eric Anderson (@ericmander) speaks with Thomas Graf (@tgraf__) about Cilium, the open-source networking, observability, and security software for cloud-native applications based on eBPF. Thomas is the co-founder and CTO of Isovalent, which maintains both eBPF and Cilium. Listen to today’s episode for a discussion of how Thomas’ work has leveled up the Linux kernel and the possibilities of network infrastructure in a cloud-native world.\n\nIn this episode we discuss:\n\n\n The impact of simultaneous development on Cilium and eBPF\n Google’s incorporation of Cilium\n Shortening the gap between writing kernel code and its deployment\n What JavaScript and eBPF have in common\n Cilium’s sister project, Hubble\n\n\nLinks:\n\n\n Cilium\n eBPF\n Isovalent\n Red Hat\n OpenShift\n Kubernetes\n Docker\n New GKE Dataplane V2 increases security and visibility for containers\n SPIFFE\n Istio\n\n\nPeople mentioned:\n\n\n Brendan Gregg (@brendangregg)\n\n\nOther episodes:\n\n\n Istio on Contributor\n\n","content_html":"
\n\n","summary":"","date_published":"2020-11-04T02:00:00.000-08:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/3a82cea8-9308-41e3-a2a5-ec79a2311eb2.mp3","mime_type":"audio/mpeg","size_in_bytes":28368606,"duration_in_seconds":1768}]},{"id":"79950fff-d132-4bf7-b16f-636dba319fc6","title":"Prefect with Jeremiah Lowin","url":"https://www.contributor.fyi/prefect","content_text":"Eric Anderson (@ericmander) and Jeremiah Lowin (@jlowin) discuss Prefect, a workflow management system and data orchestration tool under development as an open-source project. Jeremiah initially created Prefect to solve a technical challenge specific to his own work, but soon realized that it was appealing to a very wide range of different clients. Listen to today’s episode to learn why Jeremiah believes most attempts to build a unified framework for solving data orchestration fail.\n\nIn this episode we discuss:\n\n\n Solving the “negative engineering problem”\n Learning from the complaints of data engineers at Apache Airflow\n The difficulty of having a product that serves two masters\n How COVID changed the direction of Prefect\n\n\nLinks:\n\n\n Prefect\n Apache Airflow\n Why Not Airflow?\n\n\nPeople mentioned:\n\n\n Jim O'Shaughnessy (@jposhaughnessy)\n Patrick O’Shaughnessy (@patrick_oshag)\n\n","content_html":"
\n\n","summary":"","date_published":"2020-10-21T02:00:00.000-07:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/79950fff-d132-4bf7-b16f-636dba319fc6.mp3","mime_type":"audio/mpeg","size_in_bytes":47093595,"duration_in_seconds":2939}]},{"id":"1beb3d80-5aa9-43c3-af1c-3cac3ac6f42b","title":"Open Policy Agent with Torin Sandall","url":"https://www.contributor.fyi/opa","content_text":"Eric Anderson (@ericmander) catches up with Torin Sandall (@sometorin), co-creator of Open Policy Agent (OPA), the open-source, general-purpose policy engine. By focusing on demonstrating OPA’s value through case studies, targeted interviews, and word-of-mouth, Torin and the folks at Styra were able to grow OPA into the emerging standard for unified policy enforcement across the cloud-native stack.\n\nIn this episode we discuss:\n\n\n When Netflix stumbled across OPA and delivered its “Cinderella moment”\n Why OPA was designed to be developer-centric\n The value of demonstrating OPA’s use cases to the industry\n How one user created an RPG engine with OPA\n\n\nLinks:\n\n\n Open Policy Agent\n Styra\n OpenStack\n LinkerD\n Hacker News\n Kubernetes\n KubeCon\n OPA Gatekeeper\n conftest\n Corrupting the Open Policy Agent to Run My Games\n Envoy\n Styra Academy\n\n\nPeople mentioned:\n\n\n Tim Hinrichs (@tlhinrchs)\n William Morgan (@wm)\n Kevin Hoffman (@kevinhoffman)\n\n\nOther episodes:\n\n\n LinkerD on Contributor\n Envoy on Contributor\n\n","content_html":"
\n\n","summary":"","date_published":"2020-10-07T02:00:00.000-07:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/1beb3d80-5aa9-43c3-af1c-3cac3ac6f42b.mp3","mime_type":"audio/mpeg","size_in_bytes":33172210,"duration_in_seconds":2069}]},{"id":"cc8680e5-67d9-42a9-b96b-b5e8fc723cf5","title":"Temporal with Maxim Fateev","url":"https://www.contributor.fyi/temporal","content_text":"Eric Anderson (@ericmander) and Maxim Fateev (@mfateev) trace the development of Temporal, an open-source workflow orchestration engine. At Uber, Maxim co-created the project’s predecessor, Cadence, but Temporal’s roots stretch farther back to include lessons learned at Amazon and Microsoft. In this episode, learn how 18 years of experience in asynchronous messaging and workflows culminated in the foundation of Temporal.\n\nIn this episode we discuss:\n\n\n Why Maxim quit Uber to start his own company\n Differences between Temporal and Cadence\n How Uber is filling the position that Google once had incubating open-source projects\n Maxim’s advice for aspiring open-source founders\n\n\n\nRelated Links:\n\n\n Temporal\n Cadence\n Kafka\n HashiCorp\n BanzaiCloud\n Hacker News\n Andreessen Horowitz\n TChannel\n Hadoop\n\n\n\nPeople mentioned:\n\n\n Samar Abbas (@samarabbas77)\n\n","content_html":"

Eric Anderson (@ericmander) and Maxim Fateev (@mfateev) trace the development of Temporal, an open-source workflow orchestration engine. At Uber, Maxim co-created the project’s predecessor, Cadence, but Temporal’s roots stretch farther back to include lessons learned at Amazon and Microsoft. In this episode, learn how 18 years of experience in asynchronous messaging and workflows culminated in the foundation of Temporal.
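
To make the “workflow as code” idea concrete, here is a minimal sketch using the Temporal Python SDK (which arrived after this episode was recorded); the workflow name, task queue, and server address are assumptions:

    import asyncio

    from temporalio import workflow
    from temporalio.client import Client

    @workflow.defn
    class GreetWorkflow:
        @workflow.run
        async def run(self, name: str) -> str:
            # Workflow code is replayed deterministically by the engine
            return f"Hello, {name}!"

    async def main():
        # Assumes a Temporal server on localhost and a worker polling "demo-queue"
        client = await Client.connect("localhost:7233")
        result = await client.execute_workflow(
            GreetWorkflow.run, "Temporal", id="greet-1", task_queue="demo-queue"
        )
        print(result)

    asyncio.run(main())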


In this episode we discuss:

 Why Maxim quit Uber to start his own company
 Differences between Temporal and Cadence
 How Uber is filling the position that Google once had incubating open-source projects
 Maxim’s advice for aspiring open-source founders

Related Links:

 Temporal
 Cadence
 Kafka
 HashiCorp
 BanzaiCloud
 Hacker News
 Andreessen Horowitz
 TChannel
 Hadoop

People mentioned:

 Samar Abbas (@samarabbas77)

\n\n","summary":"","date_published":"2020-09-23T02:00:00.000-07:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/cc8680e5-67d9-42a9-b96b-b5e8fc723cf5.mp3","mime_type":"audio/mpeg","size_in_bytes":29474944,"duration_in_seconds":1838}]},{"id":"a6d955d1-1a02-4e0d-9c49-31fc3a3c35e6","title":"Dgraph with Manish Jain","url":"https://www.contributor.fyi/dgraph","content_text":"Eric Anderson (@ericmander) and Manish Jain (@manishrjain) discuss the impact of Dgraph, an open-source database with a graph backend that Manish describes as “a search engine acting as a database.” Manish took a gamble when he chose GraphQL as his project’s query language shortly after its release by Facebook in 2015. Now, GraphQL has grown immensely in popularity and the bet has paid off, as Dgraph leads the cutting edge of databases in this new space. Make sure to check out the Dgraph team’s conference, “GraphQL In Space,” which will be held virtually on September 10th at graphqlcon.space.\n\nIn this episode we discuss:\n\n\n How Manish was ahead of the curve at Google\n The chance circumstances in the Australian job market that led to Dgraph\n Building trust between open-source developers and their community\n Why the Dgraph team decided to hold their upcoming conference “In Space”\n The future of databases and GraphQL\n\n\nRelated Links:\n\n\n Dgraph\n GraphQL In Space\n GraphQL\n Badger\n MongoDB\n BigTable\n Cassandra\n Spanner\n Elasticsearch\n\n\nPeople mentioned:\n\n\n Scott Kelly (@StationCDRKelly)\n\n\n","content_html":"

Eric Anderson (@ericmander) and Manish Jain (@manishrjain) discuss the impact of Dgraph, an open-source database with a graph backend that Manish describes as “a search engine acting as a database.” Manish took a gamble when he chose GraphQL as his project’s query language shortly after its release by Facebook in 2015. Now, GraphQL has grown immensely in popularity and the bet has paid off, as Dgraph leads the cutting edge of databases in this new space. Make sure to check out the Dgraph team’s conference, “GraphQL In Space,” which will be held virtually on September 10th at graphqlcon.space.
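
For a sense of what querying a graph backend looks like, here is a minimal sketch using the pydgraph client against a local Dgraph instance; the predicate names and data are hypothetical:

    import json

    import pydgraph

    # Assumes a Dgraph Alpha node on the default gRPC port
    stub = pydgraph.DgraphClientStub("localhost:9080")
    client = pydgraph.DgraphClient(stub)

    # A graph traversal: every node with a `name`, plus its genre edges
    query = """
    {
      movies(func: has(name)) {
        name
        genre {
          name
        }
      }
    }
    """
    resp = client.txn(read_only=True).query(query)
    print(json.loads(resp.json))
    stub.close()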


In this episode we discuss:

 How Manish was ahead of the curve at Google
 The chance circumstances in the Australian job market that led to Dgraph
 Building trust between open-source developers and their community
 Why the Dgraph team decided to hold their upcoming conference “In Space”
 The future of databases and GraphQL

Related Links:

 Dgraph
 GraphQL In Space
 GraphQL
 Badger
 MongoDB
 BigTable
 Cassandra
 Spanner
 Elasticsearch

People mentioned:

 Scott Kelly (@StationCDRKelly)

\n\n","summary":"","date_published":"2020-09-09T02:00:00.000-07:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/a6d955d1-1a02-4e0d-9c49-31fc3a3c35e6.mp3","mime_type":"audio/mpeg","size_in_bytes":28923237,"duration_in_seconds":1803}]},{"id":"ef7f14e0-68dc-4865-876e-faa64ac11c3c","title":"Presto with Martin Traverso, Dain Sundstrom and David Phillips","url":"https://www.contributor.fyi/presto","content_text":"Eric Anderson (@ericmander) talks to Martin Traverso (@mtraverso), Dain Sundstrom (@daindumb) and David Phillips (@electrum32) about their collaboration on Presto, an open-source distributed SQL query engine for big data. The three engineers worked together at three different companies before deciding to solve an efficiency problem for data analytics at Facebook in 2012. Listen to today’s episode to learn about the careful planning and technical philosophy behind the development and design of Presto.\n\nIn this episode we discuss:\n\n\n Starting an open-source project at Facebook in the early 2010s\n The importance of making Presto “dirt simple to install”\n What is “documentation driven development”\n Bootstrapping the growth of an open-source community\n How a single query caused a brownout across Facebook infrastructure\n\n\n\nRelated Links:\n\n\n Presto\n Starburst\n Ning\n Netezza\n ProofPoint\n Hadoop\n Postgres\n Hive\n OpenCompute\n @Scale\n Arm Treasure Data\n Qubole\n\n\n\nPeople mentioned:\n\n\n Jay Parikh (@jayparikh)\n\n","content_html":"

Eric Anderson (@ericmander) talks to Martin Traverso (@mtraverso), Dain Sundstrom (@daindumb) and David Phillips (@electrum32) about their collaboration on Presto, an open-source distributed SQL query engine for big data. The three engineers worked together at three different companies before deciding to solve an efficiency problem for data analytics at Facebook in 2012. Listen to today’s episode to learn about the careful planning and technical philosophy behind the development and design of Presto.
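
For a sense of how clients interact with a running Presto cluster, here is a minimal sketch using the presto-python-client DB-API; the host, catalog, schema, and table are hypothetical:

    import prestodb

    # Assumes a Presto coordinator on localhost:8080 with a Hive catalog
    conn = prestodb.dbapi.connect(
        host="localhost",
        port=8080,
        user="demo",
        catalog="hive",
        schema="default",
    )
    cur = conn.cursor()
    cur.execute("SELECT orderstatus, count(*) FROM orders GROUP BY orderstatus")
    for row in cur.fetchall():
        print(row)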


In this episode we discuss:

 Starting an open-source project at Facebook in the early 2010s
 The importance of making Presto “dirt simple to install”
 What is “documentation driven development”
 Bootstrapping the growth of an open-source community
 How a single query caused a brownout across Facebook infrastructure

Related Links:

 Presto
 Starburst
 Ning
 Netezza
 ProofPoint
 Hadoop
 Postgres
 Hive
 OpenCompute
 @Scale
 Arm Treasure Data
 Qubole

People mentioned:

 Jay Parikh (@jayparikh)

\n\n","summary":"","date_published":"2020-08-26T02:00:00.000-07:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/ef7f14e0-68dc-4865-876e-faa64ac11c3c.mp3","mime_type":"audio/mpeg","size_in_bytes":37017017,"duration_in_seconds":2309}]},{"id":"5ae13532-11c8-46c8-82eb-6d3e11905e11","title":"Xanadu with Nathan Killoran","url":"https://www.contributor.fyi/xanadu","content_text":"Nathan Killoran (@co9olguy) guides Eric Anderson (@ericmander) through the cutting-edge world of quantum machine learning at Xanadu, a quantum computing company that is innovating with its use of photonics. Nathan is Xanadu’s Head of Software, Algorithms, & Quantum Machine Learning, and has detailed insight on their main open-source software projects, StrawberryFields and PennyLane. On today’s episode, Nathan explains how the barrier to contributing may be lower than you think, even if you don’t have a PhD in quantum physics.\n\nIn this episode we discuss:\n\n\n Designing software for Xanadu’s unique approach to quantum computing\n Machine learning, differentiable programming and more in the quantum domain\n How even high school students can contribute to an open-source quantum computing project\n Is there a road map for quantum machine learning?\n Nathan’s “blue sky” interview questions\n\n\nLinks:\n\n\n Xanadu\n StrawberryFields\n PennyLane\n ProjectQ\n TensorFlow Quantum\n PyTorch\n Qiskit\n Pyquil\n Cirq\n Alpine Quantum Technologies\n Quantum Open Source Foundation\n Unitary Fund\n\n\nPeople mentioned:\n\n\n Christian Weedbrook, CEO of Xanadu (@_cweedbrook)\n\n","content_html":"

Nathan Killoran (@co9olguy) guides Eric Anderson (@ericmander) through the cutting-edge world of quantum machine learning at Xanadu, a quantum computing company that is innovating with its use of photonics. Nathan is Xanadu’s Head of Software, Algorithms, & Quantum Machine Learning, and has detailed insight into their main open-source software projects, StrawberryFields and PennyLane. On today’s episode, Nathan explains how the barrier to contributing may be lower than you think, even if you don’t have a PhD in quantum physics.
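
For a glimpse of the differentiable-programming approach Nathan describes, here is a minimal PennyLane sketch: a two-qubit circuit whose expectation value can be differentiated like any other function (the device choice and parameter value are arbitrary):

    import pennylane as qml
    from pennylane import numpy as np

    # A simulated two-qubit device; Xanadu's photonic hardware sits
    # behind the same device interface
    dev = qml.device("default.qubit", wires=2)

    @qml.qnode(dev)
    def circuit(theta):
        qml.RX(theta, wires=0)
        qml.CNOT(wires=[0, 1])
        return qml.expval(qml.PauliZ(1))

    theta = np.array(0.3, requires_grad=True)
    print(circuit(theta))            # expectation value
    print(qml.grad(circuit)(theta))  # its gradient w.r.t. theta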


In this episode we discuss:

 Designing software for Xanadu’s unique approach to quantum computing
 Machine learning, differentiable programming and more in the quantum domain
 How even high school students can contribute to an open-source quantum computing project
 Is there a road map for quantum machine learning?
 Nathan’s “blue sky” interview questions

Links:

 Xanadu
 StrawberryFields
 PennyLane
 ProjectQ
 TensorFlow Quantum
 PyTorch
 Qiskit
 Pyquil
 Cirq
 Alpine Quantum Technologies
 Quantum Open Source Foundation
 Unitary Fund

People mentioned:

 Christian Weedbrook, CEO of Xanadu (@_cweedbrook)

\n\n","summary":"","date_published":"2020-08-12T02:00:00.000-07:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/5ae13532-11c8-46c8-82eb-6d3e11905e11.mp3","mime_type":"audio/mpeg","size_in_bytes":27289853,"duration_in_seconds":1701}]},{"id":"f3bf70d3-06cb-4c25-ad97-a1e01bf039d8","title":"Clickhouse with Alexey Milovidov and Ivan Blinkov","url":"https://www.contributor.fyi/clickhouse","content_text":"Eric Anderson (@ericmander) talks to Alexey Milovidov (@alexey-milovidov) and Ivan Blinkov (@blinkov) about their work on Clickhouse, an open source analytical database from the team at Yandex. Originally designed to support Yandex.Metrica, word of this powerful tool spread rapidly inside the company, and the idea was hatched to make Clickhouse into a truly open source project. Tune in to learn about how Alexey petitioned management to accept what initially seemed like a “crazy” idea - and how the risk paid off.\n\n\n\nIn this episode we discuss:\n\n\n Differences between Clickhouse and similar products\n Why some open source projects are more successful than others\n The history of open source at Yandex\n What makes a good open source developer\n Building an international community\n\n\n\nLinks:\n\n\n Clickhouse\n Yandex.Metrica\n Altinity\n Postgres\n Oracle\n Infobright\n InfinityDB\n MongoDB\n Vertica\n Dremel: Interactive Analysis of Web-Scale Datasets (2010)\n CatBoost\n BEM\n Presto\n Druid\n Greenplum\n Apache Spark\n\n","content_html":"

Eric Anderson (@ericmander) talks to Alexey Milovidov (@alexey-milovidov) and Ivan Blinkov (@blinkov) about their work on Clickhouse, an open source analytical database from the team at Yandex. Clickhouse was originally designed to support Yandex.Metrica, but word of the powerful tool spread rapidly inside the company, and the idea was hatched to make it a truly open source project. Tune in to learn how Alexey petitioned management to accept what initially seemed like a “crazy” idea, and how the risk paid off.
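
For a feel of the analytical workloads Clickhouse targets, here is a minimal sketch using the clickhouse-driver Python client; the table and columns are hypothetical:

    from clickhouse_driver import Client

    # Assumes a ClickHouse server on localhost with default credentials
    client = Client(host="localhost")

    client.execute(
        "CREATE TABLE IF NOT EXISTS hits "
        "(ts DateTime, url String) ENGINE = MergeTree ORDER BY ts"
    )
    top = client.execute(
        "SELECT url, count() AS c FROM hits GROUP BY url ORDER BY c DESC LIMIT 5"
    )
    print(top)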


In this episode we discuss:

 Differences between Clickhouse and similar products
 Why some open source projects are more successful than others
 The history of open source at Yandex
 What makes a good open source developer
 Building an international community

Links:

 Clickhouse
 Yandex.Metrica
 Altinity
 Postgres
 Oracle
 Infobright
 InfinityDB
 MongoDB
 Vertica
 Dremel: Interactive Analysis of Web-Scale Datasets (2010)
 CatBoost
 BEM
 Presto
 Druid
 Greenplum
 Apache Spark

\n\n","summary":"","date_published":"2020-07-29T02:00:00.000-07:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/f3bf70d3-06cb-4c25-ad97-a1e01bf039d8.mp3","mime_type":"audio/mpeg","size_in_bytes":40547518,"duration_in_seconds":2530}]},{"id":"3015ccfb-f115-49b3-9364-10e3478b5336","title":"LinkerD with William Morgan","url":"https://www.contributor.fyi/linkerd","content_text":"Eric Anderson (@ericmander) chats with William Morgan (@wm), CEO of Buoyant and a creator of the open source service mesh, LinkerD. As a former infrastructure engineer at Twitter, William leveraged his experience there to help develop what would become effectively the first service mesh. Listen to today’s episode to find out how the team at Buoyant originally coined the term, and are continuing to define the concept today.\n\nIn this episode we discuss:\n\n\n Pioneering the very first service mesh\n Why Buoyant rejected the open core model\n How the industry is shifting away from the “nights and weekends” community\n Rewriting LinkerD from scratch\n\n\nLinks:\n\n\n LinkerD\n Buoyant\n Dive\n Kubernetes\n Docker\n Finagle\n HAProxy\n NGINX\n CNCF\n Prometheus\n Cisco Webex\n Istio\n","content_html":"

Eric Anderson (@ericmander) chats with William Morgan (@wm), CEO of Buoyant and a creator of the open source service mesh, LinkerD. As a former infrastructure engineer at Twitter, William leveraged his experience there to help develop what would become effectively the first service mesh. Listen to today’s episode to find out how the team at Buoyant originally coined the term, and are continuing to define the concept today.


In this episode we discuss:

 Pioneering the very first service mesh
 Why Buoyant rejected the open core model
 How the industry is shifting away from the “nights and weekends” community
 Rewriting LinkerD from scratch

Links:

 LinkerD
 Buoyant
 Dive
 Kubernetes
 Docker
 Finagle
 HAProxy
 NGINX
 CNCF
 Prometheus
 Cisco Webex
 Istio

\n\n","summary":"","date_published":"2020-07-15T02:00:00.000-07:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/3015ccfb-f115-49b3-9364-10e3478b5336.mp3","mime_type":"audio/mpeg","size_in_bytes":30789425,"duration_in_seconds":1920}]},{"id":"67f3728d-0720-476c-b44a-eb9c0c64c0c7","title":"Chef with Adam Jacob","url":"https://www.contributor.fyi/chef","content_text":"Eric Anderson (@ericmander) welcomes Chef co-founder Adam Jacob (@adamhjk) to talk about the popular open source service. He and co-founder Nathan Haneysmith originally started the company as a way to sell automation services to startups, but wanted to expand their abilities to serve more clients. From naming the company to governance and engaging with contributors, Adam dives into why it was important to him to go the open source route and how the business model works.\n\nIn this episode we discuss:\n\n\n How Chef got started\n The decision to be open source\n What the business model looks like\n Contributors and community members\n Where Chef is today and where it’s headed\n\n\nLinks\n\n\n Chef\n Puppet\n The Apache Software Foundation\n Docker\n Perl\n","content_html":"

Eric Anderson (@ericmander) welcomes Chef co-founder Adam Jacob (@adamhjk) to talk about the popular open source service. He and co-founder Nathan Haneysmith originally started the company as a way to sell automation services to startups, but wanted to expand their capacity to serve more clients. From naming the company to governance and engaging with contributors, Adam dives into why it was important to him to go the open source route and how the business model works.


In this episode we discuss:

 How Chef got started
 The decision to be open source
 What the business model looks like
 Contributors and community members
 Where Chef is today and where it’s headed

Links:

 Chef
 Puppet
 The Apache Software Foundation
 Docker
 Perl

\n\n","summary":"Eric Anderson welcomes Chef co-founder Adam Jacob to talk about the popular open source service.","date_published":"2020-07-01T04:00:00.000-07:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/67f3728d-0720-476c-b44a-eb9c0c64c0c7.mp3","mime_type":"audio/mpeg","size_in_bytes":39179120,"duration_in_seconds":2444}]},{"id":"98a082d7-1a3d-400c-aea1-fcfcb574e38d","title":"Istio with Sven Mawson","url":"https://www.contributor.fyi/istio","content_text":"Eric Anderson (@ericmander) and Sven Mawson (@smawson) dive into the past, present and future of Istio, an open source service mesh born of collaboration between IBM and Google. Sven is a Senior Staff Engineer at Google and co-founder of the Istio project. In today’s episode, he shares the story of how two titans came together for a tool that anyone can use and contribute to.\n\nIn this episode we discuss:\n\n\n How Google asked IBM to drop their Amalgam8 project\n The involvement of Lyft, Envoy and Matt Klein (@mattklein123)\n Making moves at QCon\n A counter-intuitive marketing strategy\n What work still needs to be done\n\n\nLinks\n\n\n Istio\n Google Cloud Endpoints\n Kubernetes\n Envoy\n QCon\n NGinX\n\n","content_html":"

Eric Anderson (@ericmander) and Sven Mawson (@smawson) dive into the past, present and future of Istio, an open source service mesh born of collaboration between IBM and Google. Sven is a Senior Staff Engineer at Google and co-founder of the Istio project. In today’s episode, he shares the story of how two titans came together for a tool that anyone can use and contribute to.


In this episode we discuss:

 How Google asked IBM to drop their Amalgam8 project
 The involvement of Lyft, Envoy and Matt Klein (@mattklein123)
 Making moves at QCon
 A counter-intuitive marketing strategy
 What work still needs to be done

Links:

 Istio
 Google Cloud Endpoints
 Kubernetes
 Envoy
 QCon
 NGinX

\n\n","summary":"Eric Anderson and Sven Mawson dive into the past, present and future of Istio, an open source service mesh born of collaboration between IBM and Google.","date_published":"2020-07-01T03:00:00.000-07:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/98a082d7-1a3d-400c-aea1-fcfcb574e38d.mp3","mime_type":"audio/mpeg","size_in_bytes":30208462,"duration_in_seconds":1883}]},{"id":"b73b459d-0777-48f7-1e98-ef6fef5d3588","title":"Envoy Proxy with Matt Klein","url":"https://www.contributor.fyi/envoyproxy","content_text":"Eric Anderson (@ericmander) and Matt Klein (@mattklein123) discuss the beginnings of Envoy Proxy, an open source proxy now governed by the CNCF. Matt is a software engineer at Lyft and creator of the Envoy. On today’s episode, Matt gives the inside scoop on the benefits and challenges of cultivating a self-sustaining open source community.  \n\nIn this episode we discuss:\n\n\n How Matt’s experience at Twitter informed development of Envoy\n Working with Google\n The role of marketing in Envoy’s success\n Why building an open source community is like “total controlled anarchy”\n Finding the right contributors and maintainers\n\n\nLinks:\n\n\n Envoy Proxy\n Finagle\n Hystrix\n NginX\n HA Proxy\n Istio\n CNCF\n","content_html":"

Eric Anderson (@ericmander) and Matt Klein (@mattklein123) discuss the beginnings of Envoy Proxy, an open source proxy now governed by the CNCF. Matt is a software engineer at Lyft and the creator of Envoy. On today’s episode, Matt gives the inside scoop on the benefits and challenges of cultivating a self-sustaining open source community.


In this episode we discuss:

 How Matt’s experience at Twitter informed development of Envoy
 Working with Google
 The role of marketing in Envoy’s success
 Why building an open source community is like “total controlled anarchy”
 Finding the right contributors and maintainers

Links:

 Envoy Proxy
 Finagle
 Hystrix
 NginX
 HA Proxy
 Istio
 CNCF

\n\n","summary":"Eric Anderson and Matt Klein discuss the beginnings of Envoy Proxy, an open source proxy now governed by the CNCF. ","date_published":"2019-11-14T06:00:00.000-08:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/70b3c74e-b267-493a-ae2e-67049fddc9fb.mp3","mime_type":"audio/mpeg","size_in_bytes":37638940,"duration_in_seconds":2348}]},{"id":"62f04927-17eb-e7fa-4477-ceedad611930","title":"Alluxio with Haoyuan Li","url":"https://www.contributor.fyi/alluxio","content_text":"Eric Anderson (@ericmander) hosts Haoyuan Li (@haoyuan), also known as H.Y., creator of Spark Streaming as well as the open source data orchestration system, Alluxio. H.Y. founded Alluxio, Inc. to further develop the research project that he first created as a doctoral student at UC Berkeley’s AMPLab. Listen to today’s episode to learn more about how H.Y. identified an opportunity to disrupt cloud storage with an open source project as his Ph.D. thesis.\n\nIn this episode we discuss:\n\n\nH.Y.’s analysis of the data storage industry’s cyclical history\nHow H.Y. balanced academics with the Alluxio community\nThe 3 types of Alluxio contributors\nUse cases for Alluxio\n\n\nLinks:\n\n\nAlluxio\nSpark Streaming\nKubernetes\nPresto\nTensorFlow\nAMPLab\n","content_html":"

Eric Anderson (@ericmander) hosts Haoyuan Li (@haoyuan), also known as H.Y., creator of Spark Streaming as well as the open source data orchestration system, Alluxio. H.Y. founded Alluxio, Inc. to further develop the research project that he first created as a doctoral student at UC Berkeley’s AMPLab. Listen to today’s episode to learn more about how H.Y. identified an opportunity to disrupt cloud storage with an open source project as his Ph.D. thesis.


In this episode we discuss:

 H.Y.’s analysis of the data storage industry’s cyclical history
 How H.Y. balanced academics with the Alluxio community
 The 3 types of Alluxio contributors
 Use cases for Alluxio

Links:

 Alluxio
 Spark Streaming
 Kubernetes
 Presto
 TensorFlow
 AMPLab

\n\n","summary":"Eric Anderson hosts Haoyuan Li, also known as H.Y., creator of Spark Streaming as well as the open source data orchestration system, Alluxio.","date_published":"2019-11-01T07:00:00.000-07:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/6f4df71f-101a-498d-90de-cd06430b76d3.mp3","mime_type":"audio/mpeg","size_in_bytes":31365791,"duration_in_seconds":1956}]},{"id":"6d116e35-3fc4-48d5-8136-c5639e873b10","title":"Contributor Trailer","url":"https://www.contributor.fyi/trailer","content_text":"Learn about Contributor, a podcast about the best open source projects and the communities that build them.","content_html":"

Learn about Contributor, a podcast about the best open source projects and the communities that build them.

","summary":"","date_published":"2019-07-04T15:30:00.000-07:00","attachments":[{"url":"https://aphid.fireside.fm/d/1437767933/657ccb75-c55f-4363-8892-f45dd46caf80/6d116e35-3fc4-48d5-8136-c5639e873b10.mp3","mime_type":"audio/mpeg","size_in_bytes":796256,"duration_in_seconds":45}]}]}