Codebase Refactoring (with help from Go)
Go should add the ability to create alternate equivalent names for types, in order to enable gradual code repair during codebase refactoring. This article explains the need for that ability and the implications of not having it for today’s large Go codebases. This article also examines some potential solutions, including the alias feature proposed during the development of (but not included in) Go 1.8. However, this article is not a proposal of any specific solution. Instead, it is intended as the start of a discussion by the Go community about what solution should be included in Go 1.9.
This article is an extended version of a talk given at GothamGo in New York on November 18, 2016.
Go’s goal is to make it easy to build software that scales. There are two kinds of scale that we care about. One kind of scale is the size of the systems that you can build with Go, meaning how easy it is to use large numbers of computers, process large amounts of data, and so on. That’s an important focus for Go but not for this article. Instead, this article focuses on another kind of scale, the size of Go programs, meaning how easy it is to work in large codebases with large numbers of engineers making large numbers of changes independently.
One such codebase is Google’s single repository that nearly all engineers work in on a daily basis. As of January 2015, that repository was seeing 40,000 commits per day across 9 million source files and 2 billion lines of code. Of course, there is more in the repository than just Go code.
Another large codebase is the set of all the open source Go code that people have made available on GitHub and other code hosting sites. You might think of this as go get’s codebase. In contrast to Google’s codebase, go get’s codebase is completely decentralized, so it’s more difficult to get exact numbers. In November 2016, there were 140,000 packages known to godoc.org, and over 160,000 GitHub repos written in Go.
Supporting software development at this scale was in our minds from the very beginning of Go. We paid a lot of attention to implementing imports efficiently. We made sure that it was difficult to import code but forget to use it, to avoid code bloat. We made sure that there weren’t unnecessary dependencies between packages, both to simplify programs and to make it easier to test and refactor them. For more detail about these considerations, see Rob Pike’s 2012 article “Go at Google: Language Design in the Service of Software Engineering.”
Over the past few years we’ve come to realize that there’s more that can and should be done to make it easier to refactor whole codebases, especially at the broad package structure level, to help Go scale to ever-larger programs.
3. Codebase refactoring
Most programs start with one package. As you add code, occasionally you recognize a coherent section of code that could stand on its own, so you move that section into its own package. Codebase refactoring is the process of rethinking and revising decisions about both the grouping of code into packages and the relationships between those packages. There are a few reasons you might want to change the way a codebase is organized into packages.
The first reason is to split a package into more manageable pieces for users. For example, most users of package regexp don’t need access to the regular expression parser, although advanced uses may, so the parser is exported in a separate regexp/syntax package.
The second reason is to improve naming. For example, early versions of Go had a byte buffer type under a different name and in a different package, but we decided bytes.Buffer was a better name and package bytes a better place for the code.
The third reason is to lighten dependencies. For example, we moved os.EOF to io.EOF so that code not using the operating system can avoid importing the fairly heavyweight package os.
The fourth reason is to change the dependency graph so that one package can import another. For example, as part of the preparation for Go 1, we looked at the explicit dependencies between packages and how they constrained the APIs. Then we changed the dependency graph to make the APIs better.
Before Go 1, the os.FileInfo struct stored its times in fields such as Ctime_ns. Notice that these time fields have type int64, an _ns suffix, and are commented as “nanoseconds since epoch.” These fields would clearly be nicer using time.Time, but mistakes in the design of the package structure of the codebase prevented that. To be able to use time.Time here, we refactored the codebase.
This graph shows eight packages from the standard library before Go 1, with an arrow from P to Q indicating that P imports Q.