Robust Observability: OpenTelemetry with Austin Parker

About this Episode

OpenTelemetry is an open-source observability framework for collecting and managing telemetry data. OpenTelemetry has been more successful than expected, becoming the second fastest growing project in the CNCF. It allows for flexibility and avoids vendor lock-in, making it attractive to startups and large enterprises alike. On today’s show, Eric (@ericmander) sits down with Austin Parker (@austinlparker), director of open-source at Honeycomb.

Contributor is looking for a community manager! If you want to know more, shoot us an email at eric@scalevp.com.

Subscribe to Contributor on Substack for email notifications!

In this episode we discuss:

  • How Austin’s interest in complex systems led him to the observability field and developer relations

  • An X argument that contributed to the merger of OpenTelemetry and OpenCensus

  • Why foundations help maintainers to strike a balance with their contributors

  • Austin’s opinion on the secret to OpenTelemetry’s success

Austin Parker:
There's only so much you can do as a maintainer. There's only so much you can do as a founding member of a project, versus what all the people who come in with their own ideas, their own projects, and their own success criteria can do. How do you protect your time as a maintainer, and how do you protect the project roadmap? There are all these questions that there's no manual for.

Eric Anderson:
This is Contributor, a podcast telling the stories behind the best open source projects and the communities that make them. I'm Eric Anderson. Today, we're here with Austin Parker, who is director of open source at Honeycomb. Honeycomb's in the Scale portfolio, where I work, so I get to work with Austin and his team quite a bit, but this is the first time Austin and I are talking.

Austin Parker:
Yes, that's true.

Eric Anderson:
Thanks for coming on the show. Good to meet you.

Austin Parker:
Thanks for having me. It's great to be here. I joined Honeycomb a couple of months ago, but I've been an external admirer of the team over there for quite a while, and I've known Charity and Christine and all of them, and they're all fantastic humans, which was one of the biggest motivating factors in moving to join them.
I've been doing observability stuff for five, six years now, and when it was time for me to figure out the next step, I wanted to be around people that I felt I shared a lot of values with, but who were also really interested in being part of this bigger open source story in the observability world. And that's something that Honeycomb was really committed to. So I was very happy to come on board, see that taken to its next level, and really redouble our commitment to open source and OpenTelemetry.

Eric Anderson:
And as an investor, we are very happy to have you on board, Austin. OpenTelemetry, I think, is underappreciated as a technology and as a community, and Honeycomb, I think, is embracing OpenTelemetry and making it a more important part of the story over time.
And so you're kind of in the middle of some awesome stuff. How did you get there? And how did you get into developer relations? I don't know if everyone even knows the path into it. Was that where you planned on going?

Austin Parker:
It's quite a twisted path. In a lot of ways, my journey into developer relations is my journey into software in general, because like a lot of people, back when I was 18, 19, in the early 2000s, there was definitely this idea of, ah, you don't need to go to college. You can just get a job in IT and work with computers without having to have done the whole computer science thing.
That was true for the most part. I was able to get into it, and IT was really interesting to me: just thinking about how these really complex systems worked and the actual computers that made them up. One of my earliest jobs, when I was 15 years old, was as the webmaster for my hometown newspaper, and the web server was a Mac Quadra just sitting under a desk on the second floor of the newsroom.
What I would do every day after school is come in, fire up a copy of BBEdit, take the stories that had been put into QuarkXPress, copy the text, put it into HTML files, and then just move them over to the server. Just open a shared folder and drop them in, and there you go. The new paper was done, and it was really, really wild to think, I am doing this...
I am giving this information to people, and how do they all get into that machine? How do they all get into that server sitting under there? This was the very early days of the web, mid-90s, and over time that question really stuck with me: how does this stuff all work? It drove this interest in systems thinking that inspired me to keep diving into it.
Then later in life, I decided I wanted to go back to school, get the computer science stuff done, and move into software, because IT is great, tons of respect to people that do that as a career, but I wanted a little bit more. And so I went and did all that: computer science, informatics, the whole nine yards.
And coming into college as a 26-year-old is very different than trying to do it at 18. The way you see things and the way you deal with all those systems is very different. But because of that, I was able to get into a little software startup, a company called Apprenda that is no longer with us, which very, very early on embraced Kubernetes. This was back when Kubernetes was at zero-point-whatever, super early days.
The whole idea of cloud native was being defined around us. As I got into working with these very large, complex, on-demand platforms, these cloud platforms, I could see those systems again, and the complexity of those systems, and the question of how to actually understand them. And I got really tired, honestly, of sitting there trying to figure out what was breaking by grepping through a bunch of logs and looking at all this disconnected telemetry data that we had.
That led me into a company called Lightstep. And Lightstep was a supporter of a project called OpenTracing, and one thing led to another and that's how we got to OpenTelemetry. But while I was at Lightstep, that's where I shifted from engineering into developer relations. And a lot of that was necessity, but a lot of it was also just interest. I've had a lot of time to work and explore options outside of tech and outside of IT.
I've been an adjunct professor at a college before. I've done community theater, 40 billion other things. The idea of teaching people about this technology and helping them understand it and being able to be a communicator was really interesting to me. So, the idea of like, oh, there's a job where you can do that, cool, let me go do that, came very naturally.

Eric Anderson:
And sometimes that's the role you have to adopt when you are the creator or early maintainer of an open source project. There's a certain level of evangelism that is perhaps thrust on you. The community kind of is like, how does this work? What do I do? And suddenly you're maintaining a community.

Austin Parker:
I mean, nobody thinks about the website. If you're out there building the next great open source, whatever, it's really easy to get caught up on like, oh, I just need the tech, because there's a lot of that. There's a lot you have to go through, and it's not a quick process and it's not a clean process to actually build the technology.
But if you have a good idea, you will grow a community, and as you get users and that community grows around you, I think the difference between successful projects and unsuccessful ones is how well they nurture it. Those are the people that are going to take it and run with it.
There's only so much you can do as a maintainer. There's only so much you can do as a founding member of a project, versus what all the people who come in with their own ideas, their own projects, and their own success criteria can do.
So how do you both nurture that and how do you make sure that people that are coming in with their own ideas don't feel like they're just running into brick walls over and over trying to do something different than you want to? How do you protect your time as a maintainer and how do you protect the project roadmap?
There are all these questions that there's no manual for. There isn't even a formal mentorship program for them. We don't teach our open source maintainers this balancing act. So either you're fortunate enough to have that sort of evangelism bone in your body, or you find those people and let them run with it, or you don't.
I do think this is one of the advantages of foundations though, like the CNCF, because they can help provide some of that community muscle and some of that organizing and they have people to do that part for you if you're not good at it.

Eric Anderson:
So in your case, the projects at hand weren't necessarily OpenTelemetry at first, but OpenTracing and OpenCensus, although I think you were on the tracing side. The merger of these two projects is unique to me. Maybe this is common in open source land and we don't see it, but how did that come together? And I guess we probably need to address where OpenTracing came from.

Austin Parker:
I will point out that, generally, this doesn't happen. The odds were against us. For context, OpenTracing was a CNCF project run by a coalition of people working on what was, at the time, a very nascent idea: distributed tracing as an observability concept. People were familiar with logs, people were familiar with metrics, but distributed tracing didn't have quite the cachet that it has today.
Almost 10 years ago now, when this was coming up, it had been used at Google, at your Metas, and at very large software-based enterprises for quite a while. It solves a really important problem: how do I understand a request path through a distributed system? People had various forms of this using log correlation and whatnot, but distributed tracing says, "Hey, let's apply a model to it. Let's build an ecosystem around it."
OpenTracing was supported by Lightstep, which is now part of ServiceNow but at the time was an independent company; by some engineers from Uber who worked on a project that still exists called Jaeger, which is a trace visualizer; and by some Zipkin maintainers. Twitter, pre-acquisition, was a huge user of distributed tracing, and a lot of the people there built out much of what we still use today conceptually.
So there were all these people, and they were working on defining an open standard for just a tracing API, with the idea of, hey, if we have a standardized API for this, then everyone can rely on that, and different observability vendors can implement that API for their particular tool. Concomitantly with some of this, teams at Google and Microsoft were working on something called OpenCensus, which had very similar aims but slightly different implementation details.
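The split Austin describes, one shared API that many vendors implement, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not OpenTracing's or OpenTelemetry's actual interface: Tracer, ConsoleTracer, NoopTracer, and handle_request are hypothetical names chosen for the example.

```python
from typing import Protocol


class Tracer(Protocol):
    """Hypothetical vendor-neutral tracing API (illustrative only)."""

    def start_span(self, name: str) -> None: ...

    def finish_span(self, name: str) -> None: ...


class ConsoleTracer:
    """One hypothetical vendor's implementation: records span events in memory."""

    def __init__(self) -> None:
        self.events: list[str] = []

    def start_span(self, name: str) -> None:
        self.events.append(f"start {name}")

    def finish_span(self, name: str) -> None:
        self.events.append(f"finish {name}")


class NoopTracer:
    """Default when no vendor is configured: instrumentation does nothing."""

    def start_span(self, name: str) -> None:
        pass

    def finish_span(self, name: str) -> None:
        pass


def handle_request(tracer: Tracer, path: str) -> str:
    """Library code written only against the API; it never imports a vendor."""
    tracer.start_span(f"GET {path}")
    try:
        return f"200 OK {path}"
    finally:
        # Runs after the return value is computed, so the span always closes.
        tracer.finish_span(f"GET {path}")


vendor = ConsoleTracer()
print(handle_request(vendor, "/users"))
print(vendor.events)
print(handle_request(NoopTracer(), "/users"))
```

The library function never imports a vendor, so swapping ConsoleTracer for another implementation changes where spans go without touching instrumented code. The no-op default mirrors a common choice in tracing APIs: a library can ship instrumentation without forcing a backend on its users.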
So OpenCensus included not just an API but an SDK to actually let you use all this. It had more affordances for a larger ecosystem of tools around it. Now, in the public eye, these were very similar projects. They both accomplished mostly the same thing.
That was kind of silly. We would have open source developers or library maintainers come up and be like, "Well, I have users that want to add... They want tracing in my library, but I don't know what I should do. I don't know, should I use OpenTracing? Should I use OpenCensus? Because you two are not compatible with each other and there doesn't seem to be a real alignment in the community around which of these things is going to win."
And what we saw happening was people would basically say, "I don't want to make a decision. I'm just going to wait and one of these things is going to go away and the other is going to succeed," and that makes my decision for me. And there was a time, and this was around the end of 2018, October-ish, I want to say, where I got into a Twitter argument with some people about this specific topic of OpenTracing versus OpenCensus.
Some of us saying, "This is silly, why are we arguing about this?" Because we all want the same thing. I took that back to the other OpenTracing maintainers and to other kind of people in the community and we reached out to the CNCF and said, "Hey, can we get someone to... Let's figure this out. Let's get a mediator in here. Let's get a small group together to talk about this, figure out the feasibility, and if it is feasible, let's merge these things." Because it is wild that there's these two projects that both are basically doing the same thing and it's harming the overall observability community by there not being a single answer.

Eric Anderson:
Let me get this straight, Austin, the creators probably had some vested interests in seeing their project continue and maybe be the winner. And so maybe both projects were inclined to not fight it out, but continue in hopes that they could end up on top.

Austin Parker:
It's very easy to overlook structural versus individual reward semantics when we talk about open source. So at the time, the good thing is that I would say neither of these projects were actually stunningly successful. They were both standing on their own two legs.
But in the case of OpenTracing, there were decisions that had been made early on in the project that weren't really panning out. There were things that we knew; it's like, if we could do it all over again, what would we do differently? That's the point we were at. And I think on the OpenCensus side, they were seeing something similar: we're spending all this effort competing against this other thing that's mostly duplicative of what we're doing, and we're wasting time on this.
So there were a lot of systemic barriers, I would say, to the idea of, oh, we have all this invested already, why change? But I don't want to dismiss them. In this specific case, it really came down to, "Hey, the right people were in the room."

Eric Anderson:
This idea of power brokers coming together and it's just awesome.

Austin Parker:
I wouldn't even say power brokers. I mean the way I joked about it, there's maybe 50 to a hundred people in the world that really, really care about this stuff. And it just so happens that enough of them were part of these small groups that we were able to make progress.
But in a lot of cases, the people in question were able to see past the systemic incentives of the competition and do what was right for the community writ large, what was right for the observability world writ large. Looking back, in 2023 we hit five years since the initial conversations, and I've spent a lot of time thinking about it.
It's been more successful than I could have ever imagined when this originally happened. We've gone from two fairly niche things, metaphorically duking it out with each other, to the second-fastest-growing project in the CNCF. We're almost as big as Kubernetes when you look at contributions, pull request activity, commits, and all this other stuff.
That's massive over a very short amount of time. And I credit that to, yes, that initial group of people, most of whom are still here with us and still involved in the project, but also to this really amazing community that we've been fortunate enough to build and how they have come along with us on this journey.

Eric Anderson:
So let's now talk about what OpenTelemetry has become. It actually plays a unique role in the ecosystem, an increasingly important one, it seems.

Austin Parker:
I would agree. The idea behind OpenTelemetry is pretty fun, and it's actually fairly straightforward to explain. If you're running any kind of software system, any kind of computer system, cloud native, whatever, you need to understand what's going on in that system in order to fix bugs, understand how it's performing in production, make improvements, do whatever.
You need data about it; you need telemetry data. As our systems have gotten more complex, what we want that telemetry to do has also gotten more complex. To really support modern and next-generation cloud native workloads, we need a new way of thinking about that telemetry data. We need standards for it: not only standard ways to talk about it in our own words, but standard ways to communicate it from cloud provider to cloud provider or from software to observability system.
We need common nouns and verbs. We need common metadata on our telemetry so that any given HTTP server or any given cloud platform is going to speak the same language in terms of what is a host name, what is an IP address? And that is what OpenTelemetry does. It is effectively an open standards project for creating telemetry data for cloud native systems.
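As a rough illustration of why shared nouns matter, the Python sketch below renames source-specific attribute keys onto one vocabulary. The target keys follow the naming style of OpenTelemetry's semantic conventions (host.name, http.request.method, http.response.status_code); the source spellings, the ALIASES table, and the normalize function are invented for the example. In practice the goal is for instrumentation to emit the shared names directly rather than renaming them after the fact.

```python
# Hypothetical per-source attribute spellings mapped onto shared names.
# Target keys follow OpenTelemetry semantic-convention naming; the source
# keys on the left are invented for illustration.
ALIASES = {
    "hostname": "host.name",
    "host": "host.name",
    "method": "http.request.method",
    "http_method": "http.request.method",
    "status": "http.response.status_code",
    "statusCode": "http.response.status_code",
}


def normalize(attributes: dict[str, object]) -> dict[str, object]:
    """Rename known aliases to the shared key; leave unknown keys untouched."""
    return {ALIASES.get(key, key): value for key, value in attributes.items()}


# Two servers describing the same kind of request in different words.
from_server_a = {"hostname": "web-1", "method": "GET", "status": 200}
from_server_b = {"host": "web-2", "http_method": "GET", "statusCode": 500}

print(normalize(from_server_a))
print(normalize(from_server_b))
```

After normalization, a query for every failed request on a given host works the same way no matter which server produced the data.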
The goal of OpenTelemetry is to make that telemetry a built-in feature of cloud native systems. So our vision of the future is that you'll install your Express HTTP server or your React framework or whatever it is you're using, and you'll just get this really rich stream of telemetry data, metrics, logs, traces, whatever, that you can then transform and send to any backend, any front end that you want, to visualize and understand that data and turn that telemetry into observability.

Eric Anderson:
Now, I think about this as an investor, in business terms. For the longest time, observability, metrics, and monitoring have been a big market, and Splunk, Datadog, and New Relic are big companies. If I understand it historically, part of the technology and business strategy has been that you publish your own SDKs and other collection code that gets embedded in people's applications.
So switching off of New Relic or something requires changing this code, and OpenTelemetry maybe provides a world in which you can add OpenTelemetry instrumentation to your application and then use any kind of backend service.

Austin Parker:
Yeah, that's certainly the objective. And in a lot of cases, that's where we're at today. If you're running a Java or .NET application, there are drop-in agents and libraries that you can use to get your critical application telemetry data out and send it to over 40 or 50 different observability backends, including Honeycomb.
What I think is really cool about this, and something that you're starting to see more of, is that this is really increasing the amount of innovation in the observability sphere, because traditionally, if you wanted to make an observability tool, you had to overcome that hurdle of, how do I get the data?
That's one of the reasons that your Datadogs and Splunks and New Relics have been so effective at growing their market share: they have all these integrations. But OpenTelemetry says, "Well, what if that's no longer a 'you must be this tall to ride' kind of barrier?"
So now we're starting to see the first shoots and leaves. We're not seeing a ton of stuff yet, but we're seeing a lot of new entrants into the market that are exploring really radically new ways of thinking about these problems. A lot of them actually end up looking like Honeycomb, built on the ideas we've been developing for years about how you should think about observability and what your tools should look like.
We're starting to see echoes of that in newer entrants, which is really interesting. I think it's a really great validation, honestly, about our strategy and about how we approach this problem, and I think it provides a very disruptive moment for existing players in this, because historically they've worked on this older model, where you install their agent.
You're locked into their ecosystem, and that's how they bill you, and that's how all this stuff ends up working out. So what happens when everyone is on OpenTelemetry and you can't just have people install your agent anymore? It's an interesting question.

Eric Anderson:
So I think historically people would choose their backend service and then they would apply the necessary agents and things. I'm curious about the behavior of large enterprises, are they now just adding OpenTelemetry and then deciding later which kind of monitoring services they work with?

Austin Parker:
That's a good question. I've had two interesting conversations about this pretty recently. One is from a very smart, very small startup, they're launching a SaaS application. And they were trying to figure out what do we do for observability?
The conclusion they came to was, we are not using any of these existing players because of expense and cost and we don't want to be locked into that. What they decided was if we build around OpenTelemetry, then we preserve our optionality going forward. And for now, we can use what we get for free. Not for free, but what we get with our cloud provider.
Because what you're seeing is that the cloud providers are also standardizing on this: Azure, Google Cloud, and AWS are all adopting OpenTelemetry. So if you're building on those clouds with OpenTelemetry, your stuff is now compatible with their stuff. And as you grow, you have the option to say, I need to graduate from this into something bigger, something better, and I can do that. That was this small team's conclusion after looking pretty exhaustively at large, mid-sized, and smaller entrants in the observability sphere.
The other conversation I had was with the head of platform engineering at a very, very large financial services company. And the way they're thinking about OpenTelemetry is really the same way that they were thinking about Kubernetes several years ago, where they know what they have today and they know this massive, massive cost of monitoring and understanding their existing systems. But they also know that they are still going through a cloud transformation. They'll be doing it for a while.
And for that cloud transformation, for that new cloud platform they're building, Kubernetes is the center of it. They might not be running pure Kubernetes everywhere. They might be running some mixture of OpenShift and Amazon Elastic Kubernetes Service and Google Cloud and dah, dah, dah, dah. There are a lot of ways you can run Kubernetes, but the Kubernetes API, the idea of Kubernetes, sits at the center of this platform.
In much the same way that Kubernetes is the center of orchestration and the way you run applications, OpenTelemetry is the way you understand those applications and monitor those clusters. All of the different things plug into OpenTelemetry; it's the center of the universe. And then you have, again, options.
A lot of them are saying, look, it's cost-effective for us to roll our own observability stack; we have economies of scale that you don't. Or we want to do a hybrid approach, where we have our data lake, our data warehouse over here, and then we send some stuff into specific tools because it's important.
A really interesting example of this is Slack. I think Intuit also does some stuff like this. They built a system where all of their observability data lives in data stores that they manage, but then they can tail certain parts of it off to other platforms, other observability tools, based on how important it is. OpenTelemetry is the center of that too, because it's not just creating the data, it's collecting it, doing data pipelines, and so on and so forth.

Eric Anderson:
So how does OpenTelemetry work? If we go back to the old model where my vendor provided me a bunch of agents and SDKs, is there a risk that OpenTelemetry doesn't address all the ways of collecting?

Austin Parker:
So at a really fundamental level, OpenTelemetry provides this full tool chain for creating, exporting, collecting, and transforming telemetry data. That includes an API that can be bundled as part of a library or a framework, an SDK that lets you create and export that telemetry, and a tool called the Collector that's a Swiss army knife. It lets you take in telemetry from OpenTelemetry SDKs.
It also lets you scrape logs that are in files or listen for StatsD or Prometheus metrics, and then it transforms that data and sends it somewhere else. So within that, you have this huge range of options. It's designed to work with what you have today. If you're using some combination of StatsD metrics and logs, cool. You can drop the OpenTelemetry Collector in, set up some rules, and send that out to wherever.
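The receive, transform, and send flow described here can be modeled as a small pipeline. This Python sketch is a simplified model under stated assumptions, not the Collector itself (which is written in Go and configured in YAML): receive_statsd, add_environment, export_gauges_only, run_pipeline, and the two destination lists are hypothetical, the deployment.environment key is illustrative, and the StatsD parser handles only the basic name:value|type form.

```python
from dataclasses import dataclass, field


@dataclass
class Metric:
    name: str
    value: float
    kind: str  # StatsD type code, e.g. "c" (counter) or "g" (gauge)
    attributes: dict[str, str] = field(default_factory=dict)


def receive_statsd(line: str) -> Metric:
    """Receiver: parse a plain StatsD line such as 'api.requests:1|c'.
    Anything after the type code (sample rate, tags) is ignored here."""
    name_value, kind = line.split("|")[:2]
    name, value = name_value.split(":", 1)
    return Metric(name=name, value=float(value), kind=kind)


def add_environment(metric: Metric, env: str) -> Metric:
    """Processor: enrich every data point with a shared (illustrative) attribute."""
    metric.attributes["deployment.environment"] = env
    return metric


def run_pipeline(lines, processors, exporters) -> None:
    """Receive each line, run it through every processor, then fan out to every exporter."""
    for line in lines:
        metric = receive_statsd(line)
        for process in processors:
            metric = process(metric)
        for export in exporters:
            export(metric)


data_lake: list[Metric] = []  # hypothetical "keep everything" destination
alerts: list[Metric] = []     # hypothetical tool that only wants gauges


def export_gauges_only(metric: Metric) -> None:
    """Exporter that forwards only a subset of the data."""
    if metric.kind == "g":
        alerts.append(metric)


run_pipeline(
    ["api.requests:1|c", "queue.depth:42|g"],
    processors=[lambda m: add_environment(m, "prod")],
    exporters=[data_lake.append, export_gauges_only],
)
print(len(data_lake), [m.name for m in alerts])  # prints: 2 ['queue.depth']
```

Sending every data point to a keep-everything destination while forwarding only a subset to a second tool has the same shape as the hybrid data-lake approach Austin mentions earlier in the conversation.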
If you are using a fully proprietary stack, there are some agents and some ways to receive that data and transform it to OpenTelemetry, and it's all open source, so you can go in and write your own too. We've seen a lot of that: people coming in and saying, "Oh, well, I have data in this format," and if there isn't already a receiver, they'll contribute one back so that everyone can make use of it.
This is one of those things where a rising tide lifts all boats. Everyone in the ecosystem benefits from having more ways to get data in and translate it, because it makes that data more useful. And whether you're a developer or an SRE, it saves you from having to think, "We can't change observability tools. Ours is too expensive. We're stuck with this forever."
It's like, well, OpenTelemetry makes that migration really painless, because you can just use the Collector, take the stuff you already have, and shove it to the new place you want it to go. It's also really good for a situation that's much more common, especially at places that have been around a little longer: you've got all these legacy services.
Some of them are going to be emitting data in all sorts of different formats. You can go and you can write that yourself and plug it in really easily to translate that into a modern format and align it with your other new work so that you're getting the old stuff and the new stuff hand in hand and this braid of telemetry data rather than having a bunch of disconnected tools.
It's one of the things that you see a lot, I think, in the monitoring world now. You're on call. Something breaks. It's like, I've got to go over here to see the alert, and that gives me a little bit of context, and now I've got to go over there and try to find that in two or three or four or five other places. And maybe I don't even have access to the system where the original problem started.
So now, I need to page someone else or get them on Slack or Teams or whatever and have them go look in their two or three or four or five systems for the data. With OpenTelemetry, we're giving you all this as an interconnected interrelated stream, so your traces, your metrics, your logs, they're all connected to each other. They're all correlated with each other through the observability context that we provide.
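One concrete mechanism behind that correlation is context propagation. OpenTelemetry SDKs propagate context with the W3C Trace Context traceparent header by default, whose format is version-traceid-parentid-flags. The Python sketch below builds and parses that header by hand to show how two services end up sharing one trace ID; make_traceparent, parse_traceparent, and the log and span record shapes are hypothetical helpers for illustration, not an SDK API.

```python
import secrets
from typing import Optional


def make_traceparent(trace_id: Optional[str] = None, sampled: bool = True) -> str:
    """Build a W3C 'traceparent' value: version-traceid-parentid-flags.
    Reuses trace_id when continuing a trace; otherwise starts a new one."""
    trace_id = trace_id or secrets.token_hex(16)  # 16 random bytes -> 32 hex chars
    span_id = secrets.token_hex(8)                # 8 random bytes -> 16 hex chars
    flags = "01" if sampled else "00"             # lowest bit = sampled
    return f"00-{trace_id}-{span_id}-{flags}"


def parse_traceparent(header: str) -> dict[str, str]:
    """Split the header into its four dash-separated fields."""
    _version, trace_id, parent_id, flags = header.split("-")
    return {"trace_id": trace_id, "parent_id": parent_id, "flags": flags}


# Service A starts a request and sends the header downstream.
outgoing = make_traceparent()
a_ctx = parse_traceparent(outgoing)

# Service B continues the same trace with a new span ID of its own.
b_ctx = parse_traceparent(make_traceparent(trace_id=a_ctx["trace_id"]))

# Both services stamp the trace ID on their telemetry (hypothetical record shapes).
log_record = {"body": "payment declined", "trace_id": b_ctx["trace_id"]}
span = {"name": "POST /checkout", "trace_id": a_ctx["trace_id"]}

print(log_record["trace_id"] == span["trace_id"])  # True: one ID joins them
```

Because the log line from one service and the span from another carry the same trace ID, a backend can join them without anyone paging a second team to search a different system.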
If you put that into tools that can really make use of it, tools like Honeycomb, then it's a very transformative way of thinking about what does observability mean to me? How am I actually using this to debug incidents? How am I using this to do performance profiling?
It's an interesting part, because having the telemetry is great, but telemetry does not equal observability by itself, and just collecting a bunch of data doesn't give you any value. I think that's what's cool about Honeycomb, and why I like being there: we really do have a great way of looking at that data and helping you get value out of it.
But I also think that it really helps push the industry forward and helps say, "Hey, it's not enough just to have everyone locked into your agent or whatever to get the data. You actually have to do something with the data, and that's what's going to advance this field," more so than just focusing on collection or storage.

Eric Anderson:
You describe a narrative, which is interesting, that you could have this startup, for example, whose motivation is more like avoiding lock-in and maybe managing costs, adopting OpenTelemetry, instrumenting everywhere, and then if they choose something like Honeycomb, they end up in this world where actually we're doing observability better than we've ever done it before.
I went into this journey with avoiding lock-in and lowering costs, and I ended up doing super advanced observability and getting a bunch of visibility in my systems, which is saving me time and money in different ways.

Austin Parker:
Yeah, it's about prioritization. When you're at that new startup level with 10 people, you want to optimize for different things than when you get to 50 or a hundred or 200. That's the whole startup journey is just figuring out the right time to make different trade-offs, and I think OpenTelemetry gives you that flexibility regardless of your organization size, because even very large companies are going to go through those trade-offs.
We've seen this over the past couple years with the changes in macroeconomic conditions. A lot of people are doing a lot of belt tightening, and they're looking at every contract and they're looking at every penny of observability and monitoring spend and asking, "What are we getting out of this?"
I want people to not have to look at it and say, "Well, if we lose this spend over here, then we literally don't know what's going on anymore, because we're reliant on some proprietary agent." I think that's a bad corner to get backed into. I want them to move to a point where they can say, "Oh, well, we can be intentional about our choices in how we spend on observability and monitoring. We can be intentional about our investments in this."
"Can we do more custom telemetry? Can we do more in terms of sampling? Can we do more in terms of making really reactive, responsive systems?" But you have to have that telemetry data, and it has to be open and it has to be vendor-agnostic. When you get down to it, that's why OpenTelemetry is inevitable. At the heart of all of this, OpenTelemetry is solving a really big problem for a lot of people, which is why it's been as successful as it has.

Eric Anderson:
So this successful project you mentioned, it has as many, I don't know if it was contributions or whatever metric it was relative to Kubernetes. Where do OTel people hang out? So it is part of the CNCF, which some people highly associate with Kubernetes, and so if you go to KubeCon, there's like an OTel Day.

Austin Parker:
So we have a pretty broad community. I actually looked the other day, and over the past five years something like three or four thousand unique companies have contributed, spanning everything from very small companies to the Fortune 50. Not everyone is contributing an equal amount, obviously.
There's a core group of 10 or 20 companies, mostly ones that are involved in the space, your Splunks, your Honeycombs, your Lightsteps, that are doing a lot of the work, but everyone tends to come together online. We have weekly meetings for all of our SIGs. A lot of stuff gets done on GitHub, like most good open source projects.
We have Slack channels in the Cloud Native Computing Foundation Slack, and then twice a year, at KubeCon in Europe and in North America, we try to have a lot of OpenTelemetry content. One thing we started doing at the end of last year, at KubeCon in Chicago, is this new thing called the OpenTelemetry Observatory, which is just a bigger sort of project booth. It's sponsored by Splunk, and it's a place for people to come together and whiteboard stuff out.
That's a really fun way for the community to see each other in person and get together and talk. We also have Observability Day at KubeCon. We're going to have another one in mid-March in Paris, and it's actually been so successful that we've had to expand it to two tracks, which will be the first time that's happened.
There are just so many people that want to come and share their OpenTelemetry stories and their observability stories in the cloud native world that we had to choose between more seats and more talks, and we went with more talks. So hopefully that'll go well, and we should also be having some specific community days this year, hopefully over the summer. The best way to find out about all this is to keep an eye on our website, OpenTelemetry.io.

Eric Anderson:
Well, I think you're in an envious position of you get paid to work on open source and a really cool project with an awesome team.

Austin Parker:
I'd like to think so.

Eric Anderson:
Thank you for joining us today.

Austin Parker:
Thanks for having me. It was great to come on, a really fun conversation.

Eric Anderson:
You can subscribe to the podcast and check out our community Slack and Newsletter at Contributor.fyi. If you like the show, please leave a rating and review on Apple Podcasts, Spotify, or wherever you get your podcasts. Until next time, I'm Eric Anderson and this has been Contributor.