Jay Taylor's notes

Codebase Refactoring (with help from Go)

Original source (talks.golang.org)
Tags: golang go refactoring hygiene talks.golang.org
Clipped on: 2017-04-22

Russ Cox

rsc@golang.org

1. Abstract

Go should add the ability to create alternate equivalent names for types, in order to enable gradual code repair during codebase refactoring. This article explains the need for that ability and the implications of not having it for today’s large Go codebases. This article also examines some potential solutions, including the alias feature proposed during the development of (but not included in) Go 1.8. However, this article is not a proposal of any specific solution. Instead, it is intended as the start of a discussion by the Go community about what solution should be included in Go 1.9.

This article is an extended version of a talk given at GothamGo in New York on November 18, 2016.

2. Introduction

Go’s goal is to make it easy to build software that scales. There are two kinds of scale that we care about. One kind of scale is the size of the systems that you can build with Go, meaning how easy it is to use large numbers of computers, process large amounts of data, and so on. That’s an important focus for Go but not for this article. Instead, this article focuses on another kind of scale, the size of Go programs, meaning how easy it is to work in large codebases with large numbers of engineers making large numbers of changes independently.

One such codebase is Google’s single repository that nearly all engineers work in on a daily basis. As of January 2015, that repository was seeing 40,000 commits per day across 9 million source files and 2 billion lines of code. Of course, there is more in the repository than just Go code.

Another large codebase is the set of all the open source Go code that people have made available on GitHub and other code hosting sites. You might think of this as go get’s codebase. In contrast to Google’s codebase, go get’s codebase is completely decentralized, so it’s more difficult to get exact numbers. In November 2016, there were 140,000 packages known to godoc.org, and over 160,000 GitHub repos written in Go.

Supporting software development at this scale was in our minds from the very beginning of Go. We paid a lot of attention to implementing imports efficiently. We made sure that it was difficult to import code but forget to use it, to avoid code bloat. We made sure that there weren’t unnecessary dependencies between packages, both to simplify programs and to make it easier to test and refactor them. For more detail about these considerations, see Rob Pike’s 2012 article “Go at Google: Language Design in the Service of Software Engineering.”

Over the past few years we’ve come to realize that there’s more that can and should be done to make it easier to refactor whole codebases, especially at the broad package structure level, to help Go scale to ever-larger programs.

3. Codebase refactoring

Most programs start with one package. As you add code, occasionally you recognize a coherent section of code that could stand on its own, so you move that section into its own package. Codebase refactoring is the process of rethinking and revising decisions about both the grouping of code into packages and the relationships between those packages. There are a few reasons you might want to change the way a codebase is organized into packages.

The first reason is to split a package into more manageable pieces for users. For example, most users of package regexp don’t need access to the regular expression parser, although advanced uses may, so the parser is exported in a separate regexp/syntax package.

The second reason is to improve naming. For example, early versions of Go had an io.ByteBuffer, but we decided bytes.Buffer was a better name and package bytes a better place for the code.

The third reason is to lighten dependencies. For example, we moved os.EOF to io.EOF so that code not using the operating system can avoid importing the fairly heavyweight package os.

The fourth reason is to change the dependency graph so that one package can import another. For example, as part of the preparation for Go 1, we looked at the explicit dependencies between packages and how they constrained the APIs. Then we changed the dependency graph to make the APIs better.

Before Go 1, the os.FileInfo struct contained these fields:

type FileInfo struct {
    Dev      uint64 // device number
    Ino      uint64 // inode number
    ...
    Atime_ns int64  // access time; ns since epoch
    Mtime_ns int64  // modified time; ns since epoch
    Ctime_ns int64  // change time; ns since epoch
    Name     string // name of file
}

Notice that the time fields Atime_ns, Mtime_ns, and Ctime_ns have type int64, carry an _ns suffix, and are commented as nanoseconds since the epoch. These fields would clearly be nicer as time.Time values, but mistakes in the design of the codebase's package structure prevented that. To be able to use time.Time here, we refactored the codebase.

This graph shows eight packages from the standard library before Go 1, with an arrow from P to Q indicating that P imports Q.

[Figure: import graph of eight standard library packages before Go 1]