Jay Taylor's notes

Streaming data in Go, without bytes.Buffer – Stupid Gopher Tricks – Medium

Original source (medium.com)
Tags: golang go allocation pipes buffering medium.com
Clipped on: 2017-04-30

var buf bytes.Buffer
// Write JSON-encoded data to a buffer of bytes.
// This buffer will grow to whatever size is necessary to
// hold the buffered data.
err := json.NewEncoder(&buf).Encode(&v)
// Send the HTTP request. Whatever is read from the Reader
// will be sent in the request body.
// The buffer is also a Reader, so it can be passed in here.
// Its data will be read out until the buffer is empty.
resp, err := http.Post("http://example.com", "application/json", &buf)

A bytes.Buffer satisfies both the Writer and Reader interfaces, which makes sense, because a buffer can be both written to and read from.

This works just fine in most cases, when the data is small enough (a couple hundred bytes, up to a few KB) that the memory overhead isn’t onerous.

That bytes.Buffer starts off small (64 bytes), and grows as needed as data is written to it, possibly allocating a whole new slice and copying your data into it, then discarding the original byte slice to be garbage collected.

You could preallocate the buffer’s capacity, for example with bytes.NewBuffer(make([]byte, 0, n)) or buf.Grow(n), but even then you won’t know exactly how many bytes you’ll end up writing, and you may overallocate or underallocate.
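As a rough sketch of what pre-sizing looks like, the helper below hands bytes.NewBuffer an empty slice with a chosen capacity. The name encodeWithHint and the sizeHint parameter are made up for illustration, and the hint is only a guess: the buffer still grows if the encoded value turns out to be larger.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// encodeWithHint (hypothetical helper) JSON-encodes v into a buffer whose
// backing slice starts with sizeHint bytes of capacity. The hint is only an
// estimate; the buffer reallocates if the encoded value is larger.
func encodeWithHint(v interface{}, sizeHint int) (*bytes.Buffer, error) {
	buf := bytes.NewBuffer(make([]byte, 0, sizeHint))
	if err := json.NewEncoder(buf).Encode(v); err != nil {
		return nil, err
	}
	return buf, nil
}

func main() {
	buf, err := encodeWithHint(map[string]int{"a": 1}, 1024)
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	// Len is the number of bytes written; Cap shows the preallocated space.
	fmt.Printf("len=%d cap=%d\n", buf.Len(), buf.Cap())
}
```

Here most of the 1024 bytes go unused, which is the overallocation risk in practice.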

Let’s skip buffering altogether.

Encoding with a Pipe

If the data you’re encoding is huge, or you just want to avoid unnecessary allocations, Go has you covered. The io package provides a pair of types called PipeReader and PipeWriter. They are created together by a call to io.Pipe, and the Reader and Writer it returns are connected, such that whatever is written to the PipeWriter is read from the PipeReader.
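A small sketch to make the shapes concrete: io.Pipe returns a *io.PipeReader and a *io.PipeWriter, and those concrete types satisfy the usual io.ReadCloser and io.WriteCloser interfaces. The helper name pipeTypes is invented for this example.

```go
package main

import (
	"fmt"
	"io"
)

// Compile-time checks that the pipe ends satisfy the standard interfaces.
var (
	_ io.ReadCloser  = (*io.PipeReader)(nil)
	_ io.WriteCloser = (*io.PipeWriter)(nil)
)

// pipeTypes (hypothetical helper) reports the concrete types io.Pipe returns.
func pipeTypes() (reader, writer string) {
	pr, pw := io.Pipe()
	// Nothing is written or read here; close both ends to tidy up.
	defer pr.Close()
	defer pw.Close()
	return fmt.Sprintf("%T", pr), fmt.Sprintf("%T", pw)
}

func main() {
	r, w := pipeTypes()
	fmt.Println(r, w)
}
```

Because both ends satisfy the standard interfaces, they can be passed anywhere an io.Reader or io.Writer is expected, such as json.NewEncoder or http.Post.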

It’s like a portal from Portal: whatever goes in the Writer end immediately comes out the Reader end. Here’s a new version of the code to demonstrate:

// Set up the pipe to write data directly into the Reader.
pr, pw := io.Pipe()
// Write JSON-encoded data to the Writer end of the pipe.
err := json.NewEncoder(pw).Encode(&v)
// Send the HTTP request. Whatever is read from the Reader
// will be sent in the request body.
// As data is written to the Writer, it will be available
// to read from the Reader.
resp, err := http.Post("http://example.com", "application/json", pr)

Note that I haven’t used a bytes.Buffer here (or a []byte, or string, etc.), so this code skips the intermediate buffer entirely. The pipe itself has no internal buffer: data is copied directly from each Write to the corresponding Read. As data is written to the Writer, it can immediately be read from the paired Reader.

But wait! This code also includes a serious bug! We write to the Writer first and only start reading from the Reader afterwards, and the pipe explicitly will not buffer anything. Calls to Write block until the data has been read, which doesn’t happen until after we’ve finished writing. That’s a deadlock. 💩

To solve this problem, we can use a goroutine to concurrently write while data is being read.

// Set up the pipe to write data directly into the Reader.
pr, pw := io.Pipe()
// Write JSON-encoded data to the Writer end of the pipe.
// Write in a separate concurrent goroutine, and remember
// to Close the PipeWriter, to signal to the paired PipeReader
// that we’re done writing.
go func() {
	// CloseWithError(nil) behaves like Close; a non-nil encoding
	// error is passed on to the Reader instead of being dropped.
	pw.CloseWithError(json.NewEncoder(pw).Encode(&v))
}()
// Send the HTTP request. Whatever is read from the Reader
// will be sent in the request body.
// As data is written to the Writer, it will be available
// to read from the Reader.
resp, err := http.Post("http://example.com", "application/json", pr)

In this version, we spawn a goroutine to write to the pipe. This goroutine will not block execution of the next line of code, which passes the Reader to http.Post to be read.

The writing goroutine will write one chunk of data as before, then block until that data can be read from the Reader. When code backing the POST request reads the request body from the Reader, the writing goroutine will become unblocked and be able to write another chunk. And so on, until the writer is done and pw.Close is called. Closing the Writer signals to its paired Reader that there’s no more data, which in turn returns io.EOF, signalling to http.Post that the request body is complete.

And that’s that. We have JSON-encoded a value into an HTTP request, without unnecessary buffering, using a pipe.
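For something that can be run end to end, here is a minimal sketch of the same pattern against a throwaway local server. The helper names postJSON and newEchoServer are invented for this example, and the echo handler merely stands in for a real endpoint.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// postJSON (hypothetical helper) streams the JSON encoding of v to url
// through an io.Pipe and returns the response body as a string.
func postJSON(url string, v interface{}) (string, error) {
	pr, pw := io.Pipe()
	go func() {
		// On success the reader sees io.EOF; on failure it sees the
		// encoding error, which makes the request fail.
		pw.CloseWithError(json.NewEncoder(pw).Encode(v))
	}()

	resp, err := http.Post(url, "application/json", pr)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

// newEchoServer (hypothetical helper) starts a local test server that
// copies each request body back into the response.
func newEchoServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.Copy(w, r.Body)
	}))
}

func main() {
	srv := newEchoServer()
	defer srv.Close()

	body, err := postJSON(srv.URL, map[string]string{"greeting": "hello"})
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Print(body)
}
```

Because the body is a pipe rather than a buffer, the client cannot know its length in advance, so the request goes out without a fixed Content-Length.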

Summary

This post has demonstrated how you can use io.Pipe to avoid unnecessary buffering and memory allocations.

In most cases, it probably won’t be necessary; buffering is just fine for small enough data. But if you find that you’re using more memory than you’d like, or if you know you’ll be working with really huge data, a pipe can be a useful weapon in your arsenal.

go func() {
	pw.CloseWithError(json.NewEncoder(pw).Encode(&v))
}()

This will pass any error from encoding to pw, which will tell the paired PipeReader that the…
