Adding observability to a monolithic Go app

Introduction

So you haven't fallen for microservices just yet. However, all the cool kids are showing off their observability and metrics dashboards, with detailed traces that show exactly how long each portion of a request took and whether it had any errors. But they aren't really doing that in code; they usually have a Kubernetes cluster with a service mesh on top and an entire platform team that makes all the magic happen. In this short article I'll go over how to get some of that sweet observability in your existing monolithic application while writing almost no code.

The Plan

To accomplish our goal we need three things: interfaces, proxies and interceptors. The rest of the article assumes that you have a basic three-layer architecture with app/service/repo layers, but you should be able to apply the same technique to other architectures. It's also worth noting that although I'll be showing how to do this in Go, you can use the same pattern in other languages (apparently Java folks have been using this for a while).

The plan is to separate the layers with interfaces. Each layer must talk to the next only through interfaces, which either already exist or are generated by a tool like ifacemaker. We then create proxies that take an implementation of those interfaces and a set of interceptors as arguments and intercept every call made to the implementation. The interceptors then run, and they can add logging, metrics and traces to each call.

Show me the code

Now that we know what the plan is, it's time to put it into practice. We'll go over an example of how to do this in a monolithic Go application. Let's get started!

The Setup

Let's go over the existing code that you probably have.

There are probably repository interfaces:

type TaskRepository interface {
	List(ctx context.Context) ([]*models.Task, error)
	Get(ctx context.Context, id int64) (*models.Task, error)
	Create(ctx context.Context, task *models.Task) error
	Update(ctx context.Context, task *models.Task) error
	Delete(ctx context.Context, id int64) error
}

Repository implementations:

type MemoryTaskRepository struct {
	tasks  []*models.Task
	logger *zap.Logger
}

func ProvideMemoryTaskRepository(logger *zap.Logger) repo.TaskRepository {
	l := logger.With(
		zap.String("repository", "task"),
		zap.String("storage", "memory"),
	)
	return &MemoryTaskRepository{
		tasks:  []*models.Task{},
		logger: l,
	}
}

func (r *MemoryTaskRepository) List(ctx context.Context) ([]*models.Task, error) {
	return r.tasks, nil
}

// other functions below...

Some service interfaces:

type TaskService interface {
	List(ctx context.Context) ([]*models.Task, error)
	Get(ctx context.Context, id int64) (*models.Task, error)
	Create(ctx context.Context, task *models.Task) error
	Update(ctx context.Context, task *models.Task) error
	Delete(ctx context.Context, id int64) error
}

Some service implementations (although usually one):

type taskService struct {
	repo   repo.TaskRepository
	logger *zap.Logger
}

func ProvideTaskService(repo repo.TaskRepository, logger *zap.Logger) iservice.TaskService {
	l := logger.With(
		zap.String("service", "task"),
	)
	return &taskService{
		repo:   repo,
		logger: l,
	}
}

func (s *taskService) List(ctx context.Context) ([]*models.Task, error) {
	return s.repo.List(ctx)
}

// other functions...

And the same goes for the app layer and probably some controllers too - you get the idea.

The Problem and the Solution

The problem we are facing is that there are too many domain services, repositories and app services. How do we add logs and handlers to all of them? Take tracing for example: we need to start and end spans, so how do we do that without adding explicit statements at the start and end of every function?

The answer is code generation. We have to write almost none of the code; we can generate nearly all of it. First we need to generate the interfaces if we don't have them already. You can do that by adding a top-level go:generate comment that calls ifacemaker to generate the interfaces. Now that we have the interfaces, we need some way to intercept all of the calls. For this we can use something like proxygen, with yet another go:generate comment at the top of each file. Proxygen generates the proxies for you, and you can then return them from the provider function of each service.

For the example above the generated proxy would look something like this:

type TaskService struct {
	Implementation importiserviceTaskService0.TaskService
	Interceptors   proxygenInterceptors.InterceptorChain
}

var _ importiserviceTaskService0.TaskService = (*TaskService)(nil)

func (this *TaskService) List(
	arg0 importiserviceTaskService1.Context,
) (
	[]*importiserviceTaskService2.Task,
	error,
) {
	rets := this.Interceptors.Apply(
		[]interface{}{
			arg0,
		},
		"List",
		func(args []interface{}) []interface{} {
			res0,
				res1 := this.Implementation.List(
				args[0].(importiserviceTaskService1.Context),
			)

			return []interface{}{
				res0,
				res1,
			}
		},
	)

	return proxygenCaster.Cast[[]*importiserviceTaskService2.Task](rets[0]),
		proxygenCaster.Cast[error](rets[1])
}

// more functions below...

Then you can change the ProvideTaskService function to return this instead:

return &proxy.TaskService{
    Implementation: &taskService{
        repo:   repo,
        logger: l,
    },
    Interceptors: interceptor.InterceptorChain{},
}

Now you may be thinking that you'll have to add all these comments and update the providers for every service, and that's still quite a lot of work. Worry not, because I'm just as lazy as you are, so I have a solution for that too! When I had to do this, I wrote a quick JS script to process all the files, parse the names of the services with some regex, and then add the comments and update the providers - with enough regex, you can do anything! You can also do some of that with vim macros if you're into that.
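The original script was written in JS and isn't reproduced here, but the name-parsing step might look roughly like this in Go. This is a sketch under assumptions: FindProviders is a hypothetical helper, and it assumes every provider is a top-level function named Provide followed by the service name, as in the examples above.

```go
package main

import (
	"fmt"
	"regexp"
)

// providerRe matches top-level provider functions such as
// "func ProvideTaskService(" and captures the part after "Provide".
var providerRe = regexp.MustCompile(`(?m)^func Provide(\w+)\(`)

// FindProviders returns the name captured from every provider function in src.
func FindProviders(src string) []string {
	var names []string
	for _, m := range providerRe.FindAllStringSubmatch(src, -1) {
		names = append(names, m[1])
	}
	return names
}

func main() {
	src := `package service

func ProvideTaskService(repo repo.TaskRepository, logger *zap.Logger) iservice.TaskService {
	return &taskService{repo: repo, logger: logger}
}

func (s *taskService) List(ctx context.Context) ([]*models.Task, error) {
	return s.repo.List(ctx)
}
`
	fmt.Println(FindProviders(src))
}
```

With the names in hand, a script can prepend the generate comments and rewrite each provider's return statement. For anything beyond a quick one-off, parsing the files with the standard go/parser package is likely sturdier than regex.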

The Interceptors

Ok, we've finally reached the fun part. Here you are free to do whatever you want: add tracing spans, logging, metrics, anything. For this demo I've added a simple logging and tracing interceptor which will allow us to find what caused a panic in our application. The interceptor looks like this:

func TracingInterceptor(
	logger *zap.Logger,
	structName string,
) interceptor.Interceptor {
	return func(method string, next interceptor.Handler) interceptor.Handler {
		logger := logger.With(
			zap.String("method", method),
			zap.String("struct", structName),
		)

		return func(args []interface{}) []interface{} {
			// Use a per-call logger: reassigning the captured logger would
			// leak request fields into later calls and race between requests.
			l := logger

			var ctx context.Context
			ctxIdx := -1
			for idx, arg := range args {
				if c, ok := arg.(context.Context); ok {
					ctx = c
					ctxIdx = idx
					break
				}
			}
			if ctx != nil {
				if userID, ok := ctx.Value("UserID").(string); ok {
					l = l.With(
						zap.String("UserID", userID),
					)
				} else {
					l.Info("no user id")
				}

				if requestID, ok := ctx.Value("RequestID").(string); ok {
					l = l.With(
						zap.String("RequestID", requestID),
					)
				} else {
					l.Info("no request id")
				}

				// Record this call on the trace stack carried by the context.
				args[ctxIdx] = util.AddTraceToContext(
					ctx,
					fmt.Sprintf("%s.%s", structName, method),
				)
			}

			l.Info("calling method")

			return next(args)
		}
	}
}

// the AddTraceToContext function in the util package

func AddTraceToContext(ctx context.Context, trace string) context.Context {
	c, ok := ctx.(*gin.Context)
	if !ok {
		return ctx
	}

	stack, ok := c.Value(TraceStackKey).([]string)
	if !ok {
		stack = []string{}
	}
	stack = append(stack, trace)

	c.Set(TraceStackKey, stack)

	return c
}

Note that I'm also using gin, so I'm attaching the trace stack to the gin context, but you could put a trace ID on the request and store traces in a global store instead. The options are endless.
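As one of those options, here is a sketch of a timing interceptor in the same shape as the one above. Timings is a hypothetical in-memory sink standing in for whatever metrics client you actually use, and Handler and Interceptor are redefined locally (mirroring the signatures above) only so the example is self-contained.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Local stand-ins (assumptions) for the interceptor package types.
type Handler func(args []interface{}) []interface{}
type Interceptor func(method string, next Handler) Handler

// Timings is a hypothetical in-memory sink: call count and total
// duration per "Struct.Method" name.
type Timings struct {
	mu    sync.Mutex
	count map[string]int
	total map[string]time.Duration
}

func NewTimings() *Timings {
	return &Timings{count: map[string]int{}, total: map[string]time.Duration{}}
}

func (t *Timings) observe(name string, d time.Duration) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.count[name]++
	t.total[name] += d
}

// Count reports how many calls were recorded under name.
func (t *Timings) Count(name string) int {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.count[name]
}

// TimingInterceptor measures how long the rest of the chain takes.
func TimingInterceptor(t *Timings, structName string) Interceptor {
	return func(method string, next Handler) Handler {
		name := structName + "." + method
		return func(args []interface{}) []interface{} {
			start := time.Now()
			// Deferred so the call is recorded even if next panics.
			defer func() { t.observe(name, time.Since(start)) }()
			return next(args)
		}
	}
}

func main() {
	timings := NewTimings()
	h := TimingInterceptor(timings, "TaskService")("List", func(args []interface{}) []interface{} {
		time.Sleep(time.Millisecond)
		return nil
	})
	h(nil)
	h(nil)
	fmt.Println(timings.Count("TaskService.List"))
}
```

Because the name is computed once per method rather than once per call, the per-call overhead stays at a clock read and a map update.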

Quick Demo

You can find all the code for this demo here. I have intentionally introduced a panic in the repository code. If we simply recovered from the panic, we would end up losing our entire stack trace and it'd be pretty difficult to debug where the error happened. By combining the tracing interceptor from earlier with a recovery middleware we can do something like this:

e.Use(gin.CustomRecovery(func(c *gin.Context, err any) {
    traceStack := util.GetTraceStack(c)

    logger.With(
        zap.String("RequestID", c.GetString("RequestID")),
        zap.String("UserID", c.GetString("UserID")),
        zap.Strings("TraceStack", traceStack),
        zap.Any("Error", err),
    ).Error("Oh no! Anyway...")
}))

The result is that when we run the server and send a request to /tasks/1 to get the first task from an empty slice we see the following error in the logs:

2023-09-11T21:02:44.283+0100    ERROR   cmd/main.go:33  Oh no! Anyway...        {"RequestID": "f030dc2c-30d9-441e-a3fe-834d16bea81c", "UserID": "panagiotis", "TraceStack": ["TaskService.Get", "TaskRepository.Get"], "Error": "runtime error: index out of range [1] with length 0"}

It's not very readable as is, but usually you would parse and export these logs so they'd look nicer in your logging solution. The important thing is that it has a TraceStack field which tells us exactly what calls were made and in what order. From that we can see that the last thing we called was TaskRepository.Get, so the issue must be somewhere in there.

Summary

Although this pattern is pretty common and the example is quite basic, I hope it helped you see the value in generating code for interfaces and proxies. You can use this technique to split your layers and add interceptors for logging, metrics and traces. You barely have to write any code and you get all these features at minimal runtime cost (the proxygen library doesn't use reflection so it's pretty fast). Give it a Go (pun intended) and see if it works for you!