The Ultimate Guide to yarn.lock Lockfiles

Photo by Noor Sethi

The npm ecosystem is a big reason why JavaScript has taken off like a rocket in development communities. The ability to npm install modular bits of code and compose them together has been a massive boost of productivity for developers.

However, this modularity introduces its own problems: packages need a way to specify their requirements for what other packages they need to work properly. This is the problem package managers like npm aim to solve.

For a time, npm was really the only solution for JavaScript package management. It worked well enough, but it wasn't perfect. Facebook, for example, ran into a number of issues scaling npm to meet the needs of its impressively large engineering team. In response, they built an alternative, and Yarn was born.

What is yarn.lock?

One of the innovations introduced by Yarn is the lockfile (called yarn.lock). This generated file describes a project's dependency graph: direct dependencies, child dependencies, and so on. It's a one-stop-shop describing everything your project installs when you run yarn install.

Another feature of the lockfile is that it acts as a security measure by recording a checksum of each installed package. That way you can be confident some bad guy isn't sneaking in malicious code.

In short, the lockfile contains all information necessary to ensure you're always installing exactly the same dependencies every time on every machine.

This article is your guide to the ins and outs of the yarn.lock lockfile. We will discuss the anatomy of a lockfile entry, best practices for managing your project's lockfile, and why these concepts are important.

However, to best understand the value lockfiles bring, we first need to understand the concept of dependency graphs.

What is a dependency graph?

Throughout this article, we will be using one of my favorite npm packages as an example: @testing-library/react. If you haven't used it before, no problem. We won't be discussing how the library works; just how the project manages dependencies. (That being said, it's one of the best dang testing libraries I've ever used. I highly recommend it!)

If we look at the package.json file for this project, as of writing we see the following dependencies:

"dependencies": {
  "@babel/runtime": "^7.12.5",
  "@testing-library/dom": "^8.5.0",
  "@types/react-dom": "*"
}

This is a list of dependencies @testing-library/react depends on to function properly; without these, parts of the library (or the entire thing) just won't work.

What this list doesn't tell you are the dependencies that these dependencies rely on. If we were to dig into the package.json files for these dependencies, we would find even more dependencies.

In fact, we could build out a graph of dependencies by following the rabbit trail of package.json files. npm.anvaka.com is a tool that visualizes a project's entire dependency graph. If we plug in @testing-library/react we get this visualization:

Dependency graph for @testing-library/react

(You can play around with this dependency graph yourself here.)

While initially it appears @testing-library/react has 3 dependencies, it in fact has 34 total dependencies including transitive dependencies (aka, dependencies of dependencies). Together, these make up the library's "dependency graph", and it's this graph that the yarn.lock lockfile captures.
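To make the idea of "following the rabbit trail" concrete, here is a minimal sketch of collecting transitive dependencies. The registry object and its package names (my-app, lib-a, and so on) are made up for illustration; a real tool would read each package's package.json instead.

```javascript
// Hypothetical registry: package name -> names of its direct dependencies.
// None of these packages are real; they only illustrate the shape of a graph.
const registry = {
  'my-app': ['lib-a', 'lib-b'],
  'lib-a': ['lib-c'],
  'lib-b': ['lib-c', 'lib-d'],
  'lib-c': [],
  'lib-d': ['lib-e'],
  'lib-e': [],
};

// Walk the graph depth-first, recording every package we reach.
// The `seen` set also prevents visiting a shared dependency twice.
function collectDependencies(name, registry, seen = new Set()) {
  for (const dep of registry[name] || []) {
    if (!seen.has(dep)) {
      seen.add(dep);
      collectDependencies(dep, registry, seen);
    }
  }
  return seen;
}

const all = collectDependencies('my-app', registry);
console.log(`direct: ${registry['my-app'].length}, total: ${all.size}`);
// direct: 2, total: 5
```

Just like the real example, the number of direct dependencies (2) understates the total number of packages that end up installed (5).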

Anatomy of a yarn.lock lockfile

A yarn.lock lockfile describes a project's dependencies as well as its transitive dependencies. Each entry in a lockfile has a similar shape and definition with several important attributes. Let's take a closer look at one of these entries.

In the following image, we have installed @testing-library/react into our own JavaScript project and get the following entry in our yarn.lock:

Anatomy of a yarn.lock entry

Dependency Name

This is the name and requested version of the dependency as defined in your project's package.json or one of your project's dependencies' package.json. Since yarn.lock is a flattened list of all dependencies that your project needs to run, transitive dependencies are defined at the same level as dependencies your project defines directly.

This line may contain multiple entries if multiple version ranges of the same package are requested in different package.json files.

For example, our project may directly take a dependency on @testing-library/react@^12.1.2 but one of our project's dependencies has a dependency on @testing-library/react@^12.0.0. In that case the yarn.lock file would generate something like:

"@testing-library/react@^12.1.2", "@testing-library/react@^12.0.0":

But the rest of the entry would be exactly the same. This is yarn saying, "both of these dependencies can actually use the same version of this dependency." Yarn determines whether two versions can share a resolved dependency via semantic versioning.

A deep dive on semantic versioning is out of scope for this article. What's important from yarn's perspective is that as long as a single version satisfies both requested ranges, the two requests can share the resolved dependency. (Aside: npm has a handy semantic version calculator that is great for playing around with this concept!)
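As a rough illustration of that sharing rule, here is a small sketch that checks whether one version satisfies several caret (^) ranges. This is my own simplified matcher, not yarn's implementation: it only understands plain x.y.z versions with a leading ^ and ignores prerelease tags. The function names are made up for this example.

```javascript
// Turn "12.1.2" into [12, 1, 2].
function parseVersion(version) {
  return version.split('.').map(Number);
}

// Compare two parsed versions: negative, zero, or positive like a sort comparator.
function compare(a, b) {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] < b[i] ? -1 : 1;
  }
  return 0;
}

// A caret range allows changes that do not modify the left-most non-zero
// number, so the exclusive upper bound bumps that number.
function caretUpperBound([major, minor, patch]) {
  if (major > 0) return [major + 1, 0, 0];
  if (minor > 0) return [0, minor + 1, 0];
  return [0, 0, patch + 1];
}

// Simplified check: only handles ranges written as "^x.y.z".
function satisfiesCaret(version, range) {
  const lower = parseVersion(range.slice(1));
  const parsed = parseVersion(version);
  return compare(parsed, lower) >= 0 && compare(parsed, caretUpperBound(lower)) < 0;
}

// Two requests can share one lockfile entry if one version satisfies them all.
function canShare(version, ranges) {
  return ranges.every((range) => satisfiesCaret(version, range));
}

console.log(canShare('12.1.2', ['^12.1.2', '^12.0.0'])); // true
console.log(canShare('12.0.5', ['^12.1.2', '^12.0.0'])); // false
```

In the second call, 12.0.5 satisfies ^12.0.0 but is lower than what ^12.1.2 allows, so the two requests could not share that version.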

Resolved Version

The resolved version is the version of a dependency that was actually installed. Before yarn.lock is generated, this is determined by the semantic versioning rules in package.json. This means the resolved version could differ from the number specified in the dependency name.

In the example above, the package.json specifies ^12.1.2. If @testing-library/react releases 12.1.3, the ^ means we are open to installing the new patch version. However, it's important to remember that once a resolved version is specified in yarn.lock, that will always be the version installed whenever you run yarn install. Your project is locked in to that version.

Installation URL

This is the URL yarn uses to fetch your dependency. By default, this will use registry.yarnpkg.com.

You can also specify a different registry using the --registry CLI flag, or by defining registry <registry_url> in your .yarnrc file. The latter option makes sure yarn always resolves to the specified registry. The CLI flag only sets the registry for that one command, meaning you'll need to pass it every time.

For example, you can set your registry to the default npm registry by adding the following line in a .yarnrc file in the root of your project:

registry "https://registry.npmjs.org/"

Changing this in many cases isn't necessary because the yarn registry is a reverse proxy of the npm registry. That means any package that is on the npm registry can also be installed via the yarn registry.

One common use case for changing the registry is if your team has its own internal registry for private packages that shouldn't be shared with the public. In that case, the only way to install such packages is to set the registry value to your team's registry URL.

Integrity Hash

The integrity hash value in a yarn.lock entry is critical to the security of your project. As part of generating the yarn.lock file, yarn computes a hash value for each installed dependency based on the contents that were downloaded. The next time you download that dependency, yarn generates the hash again.

If there is a mismatch between the new value and the value stored in yarn.lock, yarn will throw an error that looks like this:

Integrity check failed for <package-name> (computed integrity doesn't match our records, got "<integrity-hash-value>")

And the entire installation will abort.

The reason yarn generates these hashes and compares them at installation time is to prevent potential bad actors from tricking you into installing malicious code.

For example, imagine an author publishes a library at version 1.1.1. This library advertises that it does something simple like adding two numbers together. We add this dependency and verify it does what it says on the box. Perfect!

Yarn will track the dependency in yarn.lock and store a hash of what was installed.

A few weeks later, turns out the author of this library is a bad guy and adds to their library a script that logs credit card details on any site that uses this library. But instead of publishing a new version, they swap out the file stored for version 1.1.1 with this new malicious code.

Next time we go to install this dependency in our project, yarn will go look at the installation URL, download the file contents and generate a hash. Before now, the hash generated always matched what was in yarn.lock because the downloaded contents were always exactly the same. But, now the contents have changed so the generated hash will be different causing installation to fail with an error like we saw above.

In this scenario, yarn has saved us from installing and using a library that has been hijacked with malicious code. While not every instance of a hash failing automatically means there is a malicious actor, this is a very important aspect of the yarn.lock that will make users aware of some funny business going on and prompt them to investigate further.
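The scenario above can be sketched with Node's built-in crypto module. This is only an illustration of the idea, not yarn's actual code: I'm assuming a sha512 digest written in the "sha512-<base64>" style, and the function names, the package name "adder", and the file contents are all made up.

```javascript
const crypto = require('crypto');

// Hash the downloaded contents and format the result as "sha512-<base64>".
function computeIntegrity(contents) {
  const digest = crypto.createHash('sha512').update(contents).digest('base64');
  return `sha512-${digest}`;
}

// Recompute the hash and compare it with the value recorded at first install.
function verifyIntegrity(name, contents, expected) {
  const actual = computeIntegrity(contents);
  if (actual !== expected) {
    throw new Error(`Integrity check failed for "${name}"`);
  }
}

// First install: record the hash of what was downloaded (hypothetical contents).
const original = Buffer.from('module.exports = (a, b) => a + b;');
const lockedIntegrity = computeIntegrity(original);

// Reinstalling identical contents passes silently.
verifyIntegrity('adder', original, lockedIntegrity);

// The same version now serves different contents, so the check throws.
const tampered = Buffer.from('module.exports = (a, b) => { steal(); return a + b; };');
try {
  verifyIntegrity('adder', tampered, lockedIntegrity);
} catch (error) {
  console.log(error.message); // Integrity check failed for "adder"
}
```

Even a one-character change to the contents produces a completely different digest, which is why swapping the file behind version 1.1.1 cannot go unnoticed.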

How to fix integrity check failed

The most important first step is to verify that the dependency is still safe to use. Often the best way to do this is to check where the project is hosted (e.g., GitHub) and see if others are seeing the same issue. There will often be a discussion detailing either the mix-up or how the library has been compromised.

Once you've verified the library is still safe, you can uninstall the dependency (which will remove the entry from yarn.lock) and reinstall to add the library back with an updated hash.

Package Dependencies

When you install a dependency, that dependency will often include its own dependencies in its package.json. In the example above, @testing-library/react has two dependencies: @babel/runtime and @testing-library/dom. The yarn.lock also tracks which versions should be requested via semantic versioning.

The package dependencies are a list of dependencies that the package must have available in order to work properly. This is important because we want to share dependencies as much as possible and not duplicate code. Code duplication leads to bloated bundle size and unexpected behavior (like duplicate instances of react).

Remember that all entries in yarn.lock are flattened into a single list of dependencies. This means even if your project doesn't take on a dependency directly, the dependencies you do take may themselves require dependencies. yarn.lock tracks all of this in a single file where you can see all of these relationships.
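To tie the attributes covered so far together, here is a minimal sketch that pulls them out of a single entry. Everything in the sample entry is a placeholder (the example-lib and example-dep names, the URL, and the integrity value), and the parser assumes the indentation-based layout of a yarn 1.x lockfile. It is not a full lockfile parser.

```javascript
// Illustrative entry only: names, URL, and integrity value are placeholders.
const entryText = `
"example-lib@^1.0.0", "example-lib@^1.1.0":
  version "1.1.1"
  resolved "https://registry.yarnpkg.com/example-lib/-/example-lib-1.1.1.tgz#placeholder"
  integrity sha512-placeholder
  dependencies:
    "example-dep" "^2.0.0"
`;

// Strip one leading and one trailing double quote, if present.
const unquote = (text) => text.replace(/^"|"$/g, '');

function parseEntry(text) {
  const entry = { requested: [], dependencies: {} };
  let section = null;
  for (const line of text.split('\n')) {
    if (line.trim() === '') continue;
    const indent = line.length - line.trimStart().length;
    const content = line.trim();
    if (indent === 0) {
      // Header line: one or more comma-separated "name@range" requests.
      entry.requested = content
        .replace(/:$/, '')
        .split(',')
        .map((request) => unquote(request.trim()));
    } else if (indent === 2 && content.endsWith(':')) {
      // Start of a nested block such as "dependencies:".
      section = content.slice(0, -1);
      entry[section] = {};
    } else if (indent === 2) {
      // Simple "key value" attribute such as version, resolved, integrity.
      const [key, ...rest] = content.split(' ');
      entry[key] = unquote(rest.join(' '));
      section = null;
    } else if (section) {
      // Line inside a nested block: "name" "range".
      const [name, range] = content.split(' ');
      entry[section][unquote(name)] = unquote(range);
    }
  }
  return entry;
}

const entry = parseEntry(entryText);
console.log(entry.requested);    // [ 'example-lib@^1.0.0', 'example-lib@^1.1.0' ]
console.log(entry.version);      // 1.1.1
console.log(entry.dependencies); // { 'example-dep': '^2.0.0' }
```

Note how the two requested ranges in the header share one resolved version, URL, and integrity value, which is exactly the sharing behavior described under Dependency Name.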

Optional dependencies

A yarn.lock entry may also include optionalDependencies. The yarn docs sum this up nicely:

Optional dependencies are just that: optional. If they fail to install, Yarn will still say the install process was successful.

This is useful for dependencies that won’t necessarily work on every machine and you have a fallback plan in case they are not installed (e.g. Watchman).

An example of this can be found in the jsonfile package. The goals of this package aren't relevant. But if we look at the package's package.json, we see a declaration for graceful-fs as an optional dependency:

"optionalDependencies": {
    "graceful-fs": "^4.1.6"
}

If we look at how graceful-fs is used we see the following:

let _fs
try {
  _fs = require('graceful-fs')
} catch (_) {
  _fs = require('fs')
}

The graceful-fs dependency can safely be considered optional because the package will fall back to Node's built-in library, fs, if graceful-fs is not installed.

In summary, yarn will always attempt to install optionalDependencies entries. But if one or more fails, whether due to an incompatibility with your project, your operating system, or otherwise, yarn will continue installation instead of aborting entirely.

How to visualize your dependency graph

As mentioned earlier, the yarn lockfile includes all information necessary to describe how your project's dependencies interact with each other. While you can manually follow the dependency chains (or use a visualizer) to figure out why a dependency was included, you can also run a command, yarn why <package-name>, to get a breakdown of the dependency tree.

Here's an example: in a brand-new create-react-app project, we see under node_modules there is a folder for lodash even though lodash is not a direct dependency of the project.

Running yarn why lodash outputs something like this:

[1/4] 🤔  Why do we have the module "lodash"...?
[2/4] 🚚  Initialising dependency graph...
[3/4] 🔍  Finding dependency...
[4/4] 🚡  Calculating file sizes...
=> Found "lodash@4.17.21"
info Reasons this module exists
   ...
   - Hoisted from "react-scripts#html-webpack-plugin#lodash"
   ...

I've omitted some of the output for brevity as the important bits are the shape of the "Hoisted from..." logs. What this is telling us is lodash is installed because:

  • html-webpack-plugin requires lodash
  • react-scripts requires html-webpack-plugin
  • our create-react-app project requires react-scripts directly

Sometimes you might get a "root" dependency that isn't one of your project's direct dependencies. In this case, you can then run yarn why on that root dependency until you get to one of your project's direct dependencies.
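The chain-following that yarn why does for us can be sketched as a small graph search. This is not how yarn implements the command; the graph below is a hand-written, heavily simplified stand-in for the create-react-app example, and my-app and the why function are names I made up.

```javascript
// Simplified, hand-written graph: package name -> direct dependencies.
// "my-app" stands in for our project; real graphs are far larger.
const graph = {
  'my-app': ['react-scripts', 'react'],
  'react-scripts': ['html-webpack-plugin'],
  'html-webpack-plugin': ['lodash'],
  react: [],
  lodash: [],
};

// Return every dependency chain from the project root to the target package,
// formatted like the "a#b#c" paths in yarn's "Hoisted from" lines.
function why(target, graph, root) {
  const chains = [];
  const walk = (name, path) => {
    if (path.includes(name)) return; // guard against circular dependencies
    const next = [...path, name];
    if (name === target) {
      chains.push(next.slice(1).join('#')); // drop the root from the chain
      return;
    }
    for (const dep of graph[name] || []) walk(dep, next);
  };
  walk(root, []);
  return chains;
}

console.log(why('lodash', graph, 'my-app'));
// [ 'react-scripts#html-webpack-plugin#lodash' ]
```

Reading the chain right to left gives the same explanation as the bullet list above: lodash is required by html-webpack-plugin, which is required by react-scripts, which our project requires directly.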

Should you manually modify yarn.lock?

In short: no. The lockfile is a generated file that is managed entirely by yarn. If you try to edit the contents yourself, you run the risk of invalidating the lockfile, possibly causing installation to fail.

For example, as we discussed with integrity hashes, changing a hash to an incorrect value could result in yarn throwing an error because yarn thinks the dependency has been tampered with.

One scenario where you might be tempted to manually "fix" the lockfile is when you have a merge conflict. Rather than trying to do this yourself, yarn can fix it for you automatically.

How to fix yarn.lock merge conflicts

Since yarn 1.0, yarn has had built-in support for automatic resolution of git merge conflicts in the lockfile. While it's possible to fix these merge conflicts yourself, for the majority of cases it's not necessary.

When rebasing your branch and running into a merge conflict, do the following:

  1. Manually fix conflicts in package.json
  2. Run yarn install

yarn will take the fixes made in package.json and determine the correct way to resolve conflicts so that the lockfile reflects the state of the package.json file.

Should you regenerate yarn.lock?

When you run into lockfile merge conflicts, you may be tempted to blow away the yarn.lock file and start from scratch. While this would technically work, you are now taking on the responsibility of updating every package that has an update since you last generated the lockfile.

The issue is with semantic versioning. As discussed earlier, the lockfile makes sure the exact same version of a package is installed every time. Let's look at an example:

  • Your project specifies a dependency with version ^1.0.0.
  • The first time you generate a lockfile, the resolved version for this package is 1.0.0.
  • This package publishes version 1.1.0.
  • You regenerate your lockfile which updates the resolved version to 1.1.0.

In some cases, this may be fine. However, multiply this scenario by every one of your packages (as well as all your dependencies' dependencies and so on) and you could unintentionally be updating hundreds of packages at once. That is a lot of verification to do (I hope you have really good tests!).
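The steps above can be sketched as a tiny resolver. This is a simplification under stated assumptions, not yarn's resolution algorithm: some-package, the list of published versions, and the resolve function are hypothetical, and the range check only handles "^x.y.z" ranges for versions 1.0.0 and above.

```javascript
// Simplified caret check for major versions >= 1: same major version,
// and not lower than the version written in the range.
function satisfies(version, range) {
  const [vMajor, vMinor, vPatch] = version.split('.').map(Number);
  const [rMajor, rMinor, rPatch] = range.slice(1).split('.').map(Number);
  if (vMajor !== rMajor) return false;
  if (vMinor !== rMinor) return vMinor > rMinor;
  return vPatch >= rPatch;
}

// Numeric comparison of "x.y.z" strings, usable as a sort comparator.
function compareVersions(a, b) {
  const partsA = a.split('.').map(Number);
  const partsB = b.split('.').map(Number);
  for (let i = 0; i < 3; i++) {
    if (partsA[i] !== partsB[i]) return partsA[i] - partsB[i];
  }
  return 0;
}

// If the lockfile already has an answer, use it. Otherwise pick the highest
// published version that satisfies the range.
function resolve(name, range, published, lockfile) {
  const key = `${name}@${range}`;
  if (lockfile[key]) return lockfile[key];
  const candidates = published.filter((v) => satisfies(v, range)).sort(compareVersions);
  return candidates[candidates.length - 1];
}

const published = ['1.0.0', '1.1.0', '2.0.0']; // hypothetical release history
const lockfile = { 'some-package@^1.0.0': '1.0.0' };

console.log(resolve('some-package', '^1.0.0', published, lockfile)); // 1.0.0
console.log(resolve('some-package', '^1.0.0', published, {}));       // 1.1.0
```

With the lockfile in place the install stays on 1.0.0. Passing an empty lockfile, which is what regenerating amounts to, silently moves the project to 1.1.0.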

Personally, I prefer to be in control of when my packages update. That means:

  1. Don't regenerate yarn.lock
  2. Use auto merge conflict resolution
  3. Be explicit about updating dependencies by only updating through your project's package.json.
  4. Use exact versions (eg, don't use ^ or ~). This will make sure that even if you regenerate your lockfile, you'll install the same dependency version (note, this doesn't help with sub-dependencies where other package.json files may specify "loose" versions).

Should you commit yarn.lock?

Every project using yarn should commit the yarn lockfile to source control. The lockfile is the source of truth for telling other developers how to install dependencies for your project.

Without this lockfile, other developers will be at risk of installing the wrong packages. This could lead to any number of incompatibilities where there is a version mismatch and the project won't build or run. You don't want to be stuck saying "works on my machine!".

What about Yarn 2 and beyond?

As of writing, versions of yarn beyond yarn@1.x have failed to gain traction with the community. As far as I know, many of the concepts covered in this article are also applicable to newer versions of yarn. But since they have less usage, and I have less experience with them, any differences between versions are out of scope for this article.

Conclusion

Yarn is a wonderful tool for managing your project's dependencies. It solved a number of issues folks had with npm regarding speed and security by introducing lockfiles. (Though it's worth noting that npm has come a long way since then with similar features like package-lock.json.)

Because of these enhancements, Yarn has proved to be a big boost to productivity and an essential tool in my developer toolbelt.