Chris Tankersley

· PHP Jack of All Trades ·

Many people are familiar with the concept of Time Travel. Time Travel is a great storytelling trope, and is used in movies like "Back to the Future," books like "The Time Machine," and television shows like "Doctor Who." One common theme across all the various retellings of time travel is that it is fraught with danger. In "Back to the Future" Marty has to be careful not to alter the past lest he disappear. In the Terminator franchise, the machines send robots back in time to secure their rise to power, and the humans send protectors back to stop them. In the end, the warning is the same - altering the past can have grave consequences.

git is a time machine. It allows you to travel up and down the history of your project. Each commit is a blip on that timeline. A commit is a valuable, recorded piece of history.

a git timeline

git allows you to do a great many things as a time traveler. One of the simplest is to just go back in history and observe. git checkout <hash> will move the view of your timeline back to the specified hash, but it does not destroy any work anyone has done. You can always git checkout <branch> (for example, git checkout main) to go back to the end of the timeline, or "now." In a way, this is a safe version of time travel. You cannot really mess anything up jumping around commits like this.

a git checkout example
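The observe-and-return trip can be sketched in a throwaway repository (the directory, file name, and branch name here are purely illustrative; git init -b main assumes git 2.28 or newer):

```shell
set -e
dir=$(mktemp -d) && cd "$dir" && git init -q -b main
git config user.email tester@example.com && git config user.name tester
echo v1 > app.txt && git add app.txt && git commit -q -m "first"
echo v2 > app.txt && git commit -q -am "second"
git checkout -q HEAD~1     # detached HEAD: observe the past without changing it
cat app.txt                # the old contents: v1
git checkout -q main       # back to "now", the tip of the timeline
```

Nothing in the history changed; we only moved our view of it and came back.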

When you go back in time, git lets you know that anything you do in this checked-out state is ephemeral. If you want to make a change, git tells you to create a new branch. A branch is just a parallel timeline. Branches do not interact with the main timeline until you are ready to "merge" the parallel timeline. You can create, alter, and destroy branches as much as you want without affecting the main timeline.

a git branch example

When you are ready to merge a branch, the contents of the branch are added to the timeline as if it was there the whole time. git does some reconciliation to bring the branch into the main timeline (this is what the "merge" commits are in the history), and that alternate timeline/branch is now a part of the main timeline's history.

a git merge example
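The whole branch-and-merge cycle might look like this in a disposable repo (all names are made up for illustration; --no-ff forces a merge commit like the ones shown in the histories below):

```shell
set -e
dir=$(mktemp -d) && cd "$dir" && git init -q -b main
git config user.email tester@example.com && git config user.name tester
echo base > app.txt && git add app.txt && git commit -q -m "Initial Commit"
git checkout -q -b feature/time-feature    # a parallel timeline
echo time > Time.php && git add Time.php && git commit -q -m "Added Time class"
git checkout -q main                       # main has not moved
git merge -q --no-ff -m "Merged branch feature/time-feature" feature/time-feature
```

Until the final merge, main never saw the feature work; the merge appends it to the main timeline.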

None of this alters existing history, it just appends to it. This is why (generally) branching, committing, and merging are safe workflows.

Rewriting History

This is great and all, but what happens when you need to actually change history, not just append to it? That's where git rebase comes into play. git rebase is a way to rewrite history. That power is as wondrous and scary as it sounds.

There are three situations where I find git rebase useful daily. The first is a branch sync, where I'm working on a remote branch and want to make sure everything is synced up. The second is hiding inconsequential commits from my local history before I push a branch up. The third is when a branch's parent has updated drastically, and I want the new code in my branch. Each of these instances requires me to rewrite what my local history looks like, so we need git rebase.

Why do we go through with all of this? The simplest reason for me is having a clean commit history. Rebasing allows us to make sure that a timeline looks clean with branches coming out and back in, and that the commits that exist matter. Anyone who comes along to work on the code should be able to quickly pinpoint various changes and lifecycles from the timeline. Rebasing allows us to keep a cleaner timeline through the ability to manipulate it.

Rebasing encompasses a list of different things under the hood, so let's look at the actual usages I find common and what is going on.

Branch Syncing

A command I use daily is git pull --rebase origin <branch> to pull remote changes into a local copy of a branch, especially the main branch of a project. While git pull by itself will bring down the changes, it does so by stuffing them around any local changes I have made but have not published. This is not a huge problem, but it can become a larger issue when conflicts occur.

Let's say our main branch histories look like this, and my local branch has an extra merge I did locally because of a bug fix:

// Remote
244426b0 2 hours ago - Merged branch feature/time-feature
a7ec8a44 3 days ago - Added Time class
a9cad80d 1 month ago - Deleted unneeded dependency on Vendor\Baz
21420248 1 month ago - Initial Commit

// Local
e454fa2e 10 hours ago - Merged branch bugfix/broken-dependencies
c23d5765 10 hours ago - Updated dependencies
a9cad80d 1 month ago - Deleted unneeded dependency on Vendor\Baz
21420248 1 month ago - Initial Commit

what the two repos look like before pulling

git pull will bring down the two missing commits, but the result will not be as clean: the timeline is ordered based on commit time, not necessarily the order in which things were added to the timeline.

244426b0 2 hours ago - Merged branch feature/time-feature
e454fa2e 10 hours ago - Merged branch bugfix/broken-dependencies
c23d5765 10 hours ago - Updated dependencies
a7ec8a44 3 days ago - Added Time class
a9cad80d 1 month ago - Deleted unneeded dependency on Vendor\Baz
21420248 1 month ago - Initial Commit

what our local repo looks like after a normal pull

This is not the worst thing in the world, but since I have not pushed my local commit I would prefer it not get mixed in with the existing timeline. git pull --rebase goes a step further and moves any local commits to the end of the timeline, after any remote commits. Our history ends up looking like this:

e454fa2e 1 minute ago - Merged branch bugfix/broken-dependencies
c23d5765 1 minute ago - Updated dependencies
244426b0 2 hours ago - Merged branch feature/time-feature
a7ec8a44 3 days ago - Added Time class
a9cad80d 1 month ago - Deleted unneeded dependency on Vendor\Baz
21420248 1 month ago - Initial Commit

what our local repo looks like after a rebase pull

It is a small change, but now our local work is moved and timestamped as coming after the other work. To me, this is a cleaner history and better groups related work together.

When is this safe?

This is generally safe on most branches. git will only rebase and move your local commits if they do not exist in the remote repository. You may still run into conflicts if you edit the same file as someone else, but the rebase will stop and give you a chance to fix things. Overall this works best when you keep the branch up-to-date as much as possible and push your commits as often as possible. In the worst cases, you can git rebase --abort to stop the rebase, roll back, and do a traditional pull.

Hiding Unneeded Commits

The other rebase I do daily is hiding commits that no one cares about. Have you ever seen a commit log like this?

$ git log --pretty=format:"%h %cr - %s"
b2b99fb 25 seconds ago - Updated dependencies
244426b 4 months ago - Added The Contracts of Open Source
a7ec8a4 1 year, 7 months ago - Fixed comments maybe
a9cad80 1 year, 7 months ago - Typos and fixes
2142024 1 year, 7 months ago - Added some responsiveness
1ba90ea 1 year, 7 months ago - Added false promise post
c23d576 1 year, 10 months ago - Fixed some more grammar issues
e454fa2 1 year, 10 months ago - Fixed typo
e1fd605 1 year, 11 months ago - Upgraded to sculpin 3
be8b7ec 2 years, 1 month ago - Overhaul of the talks and updated a ton of data
9f670ee 3 years, 1 month ago - Update .gitignore
00980fd 3 years, 5 months ago - Clarified a sentence
3ddd2f1 3 years, 5 months ago - Fixed typos
2df0ba3 3 years, 5 months ago - Added cron expression post

Commits in git should be atomic. Atomic commits are commits that do a single thing, but that single thing is a unit of work. In the case above, adding the "false promise" post is actually four total commits - 1ba90ea, 2142024, a9cad80, and a7ec8a4. The last three commits are fixing issues in the post, but the entire unit of work is comprised of those four commits. I should rewrite them as a single commit.

An interactive git rebase is the perfect tool for this job. In this case, we will tell git we want to rebase everything after a certain point, and by making it interactive git will let us manipulate history to clean it up. The first thing we'll want to do is figure out when we want to start to rewrite history. In our case, the new blog post was originally added in 1ba90ea, so we will want to go one step earlier to c23d576. This is the first hangup many people run into. The hash you specify is not included in the list to manipulate.

Now we just tell git to remove the locks and let us work:

git rebase -i c23d576

git will open a text editor and place all of the commits after (but not including) c23d576 into a nice little list for us. Because we are going back a bit in time, we also drag in the changes made after the post, which is another potentially confusing area. What are we looking at?

pick 1ba90ea Added false promise article
pick 2142024 Added some responsiveness
pick a9cad80 Typos and fixes
pick a7ec8a4 Fixed comments maybe
pick 244426b Added The Contracts of Open Source
pick b2b99fb Updated dependencies

# Rebase c23d576..b2b99fb onto c23d576 (6 commands)
#
# Commands:
# p, pick <commit> = use commit
# r, reword <commit> = use commit, but edit the commit message
# e, edit <commit> = use commit, but stop for amending
# s, squash <commit> = use commit, but meld into previous commit
# f, fixup <commit> = like "squash", but discard this commit's log message

git timeline before we rebase

The first six lines are the commits we can manipulate. The first word is a command, which the commented area at the bottom details. I have listed the five most common things you can do. The second column is the commit hash, and the rest of the line is the commit message. This screen allows us to queue up what we want to do, and git will execute it when we save and close the file.

What commands do we normally use?

  • pick - Just use the commit as is
  • reword - Use the commit, but change the commit message
  • edit - Use the commit, but stop and allow us to amend it with further changes
  • squash - Use the commit, but merge it and the commit message with the previous commit
  • fixup - Like squash, but ignore the commit message

What we want to do is create a single unit of work for adding the post, but we do not want to lose the last three commits. We have two options - squash or fixup. Since history does not care that I made typos or fixed some issues with responsiveness, we will go with "fixup". We will edit the lines to look like this:

pick 1ba90ea Added false promise article
fixup 2142024 Added some responsiveness
fixup a9cad80 Typos and fixes
fixup a7ec8a4 Fixed comments maybe
pick 244426b Added The Contracts of Open Source
pick b2b99fb Updated dependencies

We can then save the file and exit the text editor (^X in nano, :wq in vim). git will then start to do the commands we told it to.

  1. "pick" 1ba90ea and use it as-is
  2. "fixup" 1ba90ea by adding the changes from 2142024
  3. "fixup" 1ba90ea by adding the changes from a9cad80
  4. "fixup" 1ba90ea by adding the changes from a7ec8a4
  5. "pick" 244426b and use it as-is
  6. "pick" b2b99fb and use it as-is

If we look at the log now we will see that those other commits have disappeared, but if we look at the files all those changes are still intact:

$ git log --pretty=format:"%h %cr - %s"
ed4d525 64 seconds ago - Updated dependencies
231fe64 64 seconds ago - Added The Contracts of Open Source
8eda557 64 seconds ago - Added false promise article
c23d576 1 year, 10 months ago - Fixed some more grammar issues
e454fa2 1 year, 10 months ago - Fixed typo
e1fd605 1 year, 11 months ago - Upgraded to sculpin 3

We rewrote history to get rid of my typos! Now no one will need to know that I am a poor speller (no comments on this post about spelling mistakes. That's what Twitter DMs are for).

git timeline after we rebase

What happened? git rolled back our repository to c23d576 - we went back in time and undid everything after that point. git, however, remembers all the commits after that, so it began to rebuild history from that point using our instructions. git began layering on those old commits, but this was new work, so we get new hashes. The three "fixup" commits are effectively wiped from history as their changes are folded into the picked commit, making it look like c23d576 went directly to a single post commit. A new commit is generated, 8eda557, to symbolize that new history. The last two commits are layered in the same way, with new hashes to symbolize their new place in the timeline.

A side effect of this is that the timestamps now make our fixed-up commit, as well as everything after it, look like they were done "now" (or 64 seconds ago, as I live test all of this). That is because those changes were rewritten 64 seconds ago. We altered history, so git accurately reflects that. It will not lie, and it is not a lie to say we changed 231fe64 and ed4d525 64 seconds ago - they were part of the rebase.

In the end, we are left with a single, atomic commit for a post. This provides a cleaner and much more useful history. We can see when the post was committed, but we hide those fixes because ultimately all history needs to record is that the post was added.

What happened to those original commits? They still exist, but we no longer look at them in the timeline. Technically the timeline actually looks more like this, with those original commits hanging off almost like a branch from our rebase start. You could actually check out b2b99fb and trace its history back to the original rebase point; it still exists in something called the "reflog", a log of everywhere HEAD has pointed in the repository. For all intents and purposes, though, it is no longer part of the timeline we are on.

what it really looks like after we rebase

When is this safe?

This type of rebasing is only 100% safe in two instances. The first is on code YOU HAVE NOT PUSHED. I routinely make small commits working toward a goal or fixing bugs I introduce in code. Having those commits is a great way to roll back if things go off the rails. When you are happy with your code, rebase and fixup/squash all those down. This is the easiest case to deal with, as it does not involve force pushing.

The other time this type of rebasing is safe is when YOU are the only one working on a branch. In many instances, we will be working in separate feature branches all on our own. In those cases, it's perfectly fine to rebase commits in that branch and push them up for review. I normally do this when I have very small units of work that need to be done, like a few lines of code. If I have a larger unit of work, say a new feature, I will tend to fixup locally but keep a history of the larger blocks of work being done. Keep in mind that this will require a force push, which will forcibly reset the remote branch to mirror your local branch.

If someone else is working on a branch with you, DO NOT REBASE AND FORCE PUSH to a remote branch. This will cause issues with other collaborators. Yes, it's all solvable, but it can cause a lot of issues as everyone tries to reconcile the force pushes.

Pulling In Parent Branch Changes

This situation is the one where most people run into issues. This workflow is when you have a feature branch that comes off your mainline code branch, which we will call main. main is getting updated constantly with other feature branches being merged in, and your feature branch needs to be updated to take advantage of those changes. You have two options. The first is to pull down and merge main into your feature branch. This will create a merge commit, and git will do its best to figure out how to merge the changes together.

merging main into a feature branch

The second option is a rebase on main. As with our interactive rebase this will effectively move all of our feature branch commits to be after the current version of main and start our branch there. Our branch structure is a bit cleaner as we can better see that we depend on code from the 5aac552 commit rather than the cfe88cd commit we originally branched off of.

rebasing main into a feature branch
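A minimal sketch of that second option in a throwaway repo (branch and file names are made up; note the feature commits get new hashes as they are replayed on top of main):

```shell
set -e
dir=$(mktemp -d) && cd "$dir" && git init -q -b main
git config user.email tester@example.com && git config user.name tester
echo base > app.txt && git add app.txt && git commit -q -m "base"
git checkout -q -b feature/cool-feature
echo feat > feature.txt && git add feature.txt && git commit -q -m "feature work"
git checkout -q main
echo more >> app.txt && git commit -q -am "main moved on"
git checkout -q feature/cool-feature
git rebase -q main      # replay the feature commits on top of the new main
```

Afterward the feature branch reads as if it had been started from the current tip of main.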

When is this safe?

It's always safe.

But Chris, this always causes conflicts!

Many times it does, but not because of the reason you think. The conflict occurs when git attempts to take your existing commit and reconcile it against the new files. If one of those files from main is a file you also edited, you may get a conflict. git tries its best to figure out how to merge changes but sometimes it cannot. This means you have to figure out how to reconcile it, or git rebase --abort and stop the rebase.

The larger issue at hand is that two developers changed the same blocks of code during two different feature sets. The solution is not to stop using rebase, but to better understand the scope of feature branches and make sure that work is not being done concurrently on the same portion of the codebase. This problem is exacerbated when a feature branch is very wide in scope or has a long lifetime as this widens the number of files that can be altered.

If you are constantly running into issues rebasing on a parent branch, first look at the work being scheduled. Make sure that the issues being worked on do not overlap in scope or code. Second, make sure that feature branches are short-lived. A branch that exists and is worked on for weeks or months at a time is grossly over-scoped. Break it into smaller feature branches or units of work in your planning. A good rule of thumb is that any issue that takes more than 8 person-hours is not broken up enough.

Your other option is to not use a rebase and just merge up from main. Rebasing is not an always or never situation.

"I'm worried about losing work"

Rebasing generally does not cause work to get lost, but resolving merge conflicts does. This is at the root of most rebasing problems.

One of the most common mistakes is taking a feature branch and rebasing it on the parent. If there are a lot of differences, as outlined above, you get a merge conflict. You now have to look at the code and figure out the best way to fix this, and there is this monster staring at you, urging you to finish the rebase as quickly as possible. You are stuck until you fix this.

You resolve the merge, and after-the-fact realize you lost a chunk of code, or you accidentally reverted the code to an earlier version. Now you are out of sync with main. You squashed something wrong and now the commit you needed is gone. What do you do?

The nice thing about git is that nothing is ever lost. If you need to find a commit, that is where the git reflog comes into play. The reflog is an activity log of what you have done with your local repository. You can use the reflog to find older commits and check them out, or reset them, or cherry-pick them back into the current timeline. It is not a complete history, however, and is designed to clean itself out every so often. The reflog is not a magical backup tool.

$ git reflog show
ed4d525 (HEAD -> master) HEAD@{0}: rebase -i (finish): returning to refs/heads/master
ed4d525 (HEAD -> master) HEAD@{1}: rebase -i (pick): Updated dependencies
231fe64 HEAD@{2}: rebase -i (pick): Added The Contracts of Open Source
8eda557 HEAD@{3}: rebase -i (fixup): Added false promise article
5eec634 HEAD@{4}: rebase -i (fixup): # This is a combination of 3 commits.
083b44f HEAD@{5}: rebase -i (fixup): # This is a combination of 2 commits.
1ba90ea HEAD@{6}: rebase -i (start): checkout c23d576
b2b99fb HEAD@{7}: checkout: moving from 244426b03db6d06d62d4a6bedcb64c4baef3acb4 to master
244426b (origin/master, origin/HEAD) HEAD@{8}: pull --rebase origin master: checkout 244426b03db6d06d62d4a6bedcb64c4baef3acb4

Remember that earlier example where we ran a rebase and "fixup"'d a few commits? This is what that rebase looks like in the reflog. I can git reset or git checkout any of the commits in the reflog. By default the reflog only shows the current branch; you can use git reflog show --all to see all the activity no matter the branch.
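A sketch of a reflog rescue in a throwaway repo (file names and messages are illustrative): after a hard reset "loses" a commit, HEAD@{1} still points at where we were just before the reset, and cherry-pick brings the work back:

```shell
set -e
dir=$(mktemp -d) && cd "$dir" && git init -q -b main
git config user.email tester@example.com && git config user.name tester
echo a > a.txt && git add a.txt && git commit -q -m "keep me"
echo b > b.txt && git add b.txt && git commit -q -m "important work"
git reset -q --hard HEAD~1        # oops: "important work" vanishes from the log
git reflog -2                     # HEAD@{1} is where we were before the reset
git cherry-pick 'HEAD@{1}'        # pull the lost commit back onto the timeline
```

The same HEAD@{n} syntax works with git reset and git checkout if you would rather move to the lost state than replay it.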

"Force pushing is scary"

Force pushing is scary because it can cause a lot of problems if other people do not realize you force pushed. Force pushing resets the timeline so that the remote branch matches your local one completely. The issue here is that if multiple people are also pulling down a branch from a central repository, a force push can quickly get them out of sync. This is most destructive when multiple people are committing to the same branch, and someone force pushes to that shared branch.

The good thing is the solution is easy! NEVER FORCE PUSH TO A SHARED BRANCH. Just do not do it.

If you must force push to a branch, then alert the other developers working on that branch. They have two options. The first is to fetch the new branch and git reset themselves to match it. The reflog can then be used to git cherry-pick their lost commits back over.

The second is to do a git pull --rebase to try and automatically reconcile the branch. Depending on what caused the force push in the first place you may have merge conflicts, but hopefully the reason and changes are clearly communicated by whoever did the force push.

Worst case you can always create temporary branches to play around in before you mess with your local branch.

"I hate all the merge conflicts and constant merges"

This goes back to one of my earlier notes: try and limit the scope of feature branches and make sure branches are short-lived. If you have lots of overlapping work between branches you are going to run into conflicts at some point anyway.

"I wish there was a way to see the outcome first"

The best way to do this is to create a temporary branch. Let's say we have a feature branch and we want to rebase on main, but see how that looks first.

// Assuming we are on our feature branch `feature/cool-feature`
$ git fetch origin main:main
$ git checkout -b dr-rebase-cool-feature
$ git rebase main
// See what happens, and abort if it fails

Since git allows you to create and throw away branches with ease, take advantage of branching whenever possible.


Hopefully this helps show when rebasing can be used, and what to do when things go wrong. Always remember that you can git reset and use the reflog to pull back missing commits, and never force push to a shared branch!

Posted on 2021-03-21



Many people do not realize how old the concept of open-source software is, or that software development basically started as a shared experience. In the 1950s computers did not come with software for the most part. Developers were forced to write software for the platforms that they had access to. Since many of these early developers were introduced to computers during college, one of the few institutions that could afford even machines like the TX-0 or the A-2, the general ideas of academia bled into software development.

These developers, be they students or faculty at the schools, would share their software just like they would share knowledge. This fit quite well with the early hacking idea that "Information Should Be Free." If someone developed an algorithm or a utility it was shared amongst everyone else. Early hacking culture introduced the idea that it was also fine to modify the software. The unwritten rule was that it was shared.

Software in its infancy was open source; we just did not have a name for it. The idea of commercial software was not even considered. Even software from the machine vendors was just considered part of the machine. You purchased hardware, not software. You created the software and shared it with your colleagues. If you had a problem with the software you could talk to the author, or you fixed it yourself. There was no expectation other than the sharing of knowledge.

Fast forward a few decades and commercial software starts to appear. As machines become somewhat more standardized and development becomes more costly, developers like Micro-Soft stop sharing their software and start selling it. Software becomes a product and begins to unbundle from the hardware. Need a compiler? That will be an extra cost of $60-$75 on top of the hardware.

The ideas of open source never went away. Systems like the Berkeley Software Distribution for Unix flourished despite AT&T's attempts to lock down the licensing of the early Unix source code. The GNU Project was formed in 1983 with the explicit purpose of making sure that users keep control over the software that they run, and have the freedom to modify it as they see fit. In keeping with the early developers and hackers, the GNU Project's licenses required developers to share their modifications but gave them the ability to study and copy software as well.

Much of the modern web only exists because of open-source software. Apache's httpd has powered, and continues to power, a huge swath of the Internet as a whole. Subsystems for communication like e-mail are more often than not powered by open-source software. Windows had a networking stack lifted from BSD. Companies like Mozilla and Red Hat power the internet and give developers and users open-source options for their software and operating systems. If open-source software were to disappear, the tech world would come to a screeching halt.

When a developer decides to put their time toward open-source software, they are giving up their time and sharing their expertise. Despite all the grumblings about not owing users anything, most open source developers spend time working with and listening to users and other collaborators to make their software better. They build tools and programs to make their lives, and by extension others' lives, better.

There is a contract that is established when using open-source software. I do not mean the license, a legal document that spells out what you can and cannot do with a piece of software. There is a deeper moral contract that both the developer and the end-user enter into.

The contract says that the developer of the software is giving up their time for the greater good. To make sure you can use the software as you see fit, they give up the secrets of that software so that the user has access to their knowledge. The user can dig into the code and see how it works. The user can change the software as they see fit, no matter what the original developer designed. The user should share their knowledge about the software with the rest of the world through patches and collaboration.

The contract does not state that the original developer must follow the whims of the users. The contract does not state that the original developer owes the users anything. If a user feels that something is lacking, the user is expected to collaborate or do the work themselves. Are the docs lacking? Anyone can write docs for open-source software. The original developer is free to ignore the work done by users if they see fit. The users are just as free to post the docs even if they are not wanted.

I find it despicable what Sebastian had to endure with PHPUnit and PHP 8 support. Sebastian, as the original developer and maintainer of PHPUnit, is more than free to dictate how he spends his time working on the software. If there is a demand for PHP 8 support, he is more than free to weigh the pros and cons of adding that to older versions despite what he has already noted as his support structure for releases.

If you wanted PHP 8 support in older versions, you are more than welcome to fork the software and add support for PHP 8. You are welcome to work with Sebastian to try and get it into the official releases, but Sebastian is not under any obligation to give in to the masses. He has already given the tools needed to modify PHPUnit freely to the world. His knowledge is laid bare in the source code.

If PHPUnit is not to your liking, study the code, and modify it yourself. Distribute the patches and the forks back out to the world just like Sebastian did. That is the power that open source gives you under the contract.

I think a large number of developers forget that many open source projects are run by individuals. These individuals seldom make money directly on the software, but hold other jobs to pay the bills or maybe sell ancillary services around the software they build. Unless you are specifically paying Sebastian as a contractor to modify PHPUnit to your liking, you do not get to demand anything of him or any maintainer. You can ask, and he can say "No." If you do not like that answer, the source code is there for you to change.

When a library is changing too quickly for your liking, you are free to stay on an older version. Most developers do not remove older unsupported versions of libraries or applications just because a new version is out. If a developer wants to support older versions of their software, that is a decision they make. That is not a decision the users make. You are making a conscious decision to stay on that older version, and you need to live with the consequences of that decision.

If your hands are tied because of an outside force and are unable to upgrade, the only advice I can give to you is to implore those that have the power to make that decision to upgrade. If it is a boss or a CEO, make the case. Explain why upgrading is beneficial, and how staying behind is becoming a drain on development. You must complain to those in power about your situation, not to the developer of the libraries and tools you use. Those developers' obligation to you ends when they share their knowledge and software.

When you use open source software, remember that it was built on the ideas of "Information should be free," collaboration, and the betterment of everyone.

There are humans behind that source code you are using.

Posted on 2020-11-30



On August 30th, 2019, Sara Golemon (@saraMG) tweeted out that developers on PHP 7.2 should start planning on their upgrade path to 7.3 or 7.4 since it was about to go into "security-only" mode, which means only security-related patches would be issued for it. If you were on 7.1, it was about to be End-Of-Life'd, which means 7.1 will receive no further patches.

As a "hot take" to this, Sherri W. (@SyntaxSeed) responded with:

I responded to this with my own thoughts:

From there other people joined into a bit of discourse over whether or not a long or short release cycle helps developers. Developers weighed in on both sides.

The Arguments For an LTS Release Cycle

Clients Won't Pay for Upgrades

From Sherri's perspective as a freelancer with 20-30 clients, it is hard to get a client to pay money just because the underlying language has been upgraded. We already have problems trying to justify why clients should pay for testing, so coming back to a client a year or two after a project is finished just to ask them to pay for an upgrade that adds no functionality can be a hard sell.

I understand the reasoning. I had two clients that were on PHP 5.2 for a very, very long time. When I say "long time," I mean PHP 5.2 had been released in 2006, and these projects were still in use well into PHP 5.5's release.

The first was a small Bed and Breakfast site built on WordPress. The reservation system that they used was encrypted with IonCube, a source-encryption extension for PHP. The client refused to pay for an upgrade for this plugin, and since it was wrapped in IonCube we could not manually upgrade it. It refused to work on PHP 5.3 or anything higher. Neither I nor the original contractor who worked with her could get it to work.

The second project was a local government project. They had a loaned server, paid for through donations, that ran Windows 2000 and was hosted at a local library. Since it had been paid for and maintained through donations, it was locked to this hardware. The library would only support the machine if it worked with their AD controller. That left us on Windows 2000.

This was around what would be the end of PHP 5.3's life. When 5.4 was released I contacted them about upgrading, especially because Zend Framework 1 was well outdated as well. There was no money for an upgrade at the time.

In both cases, it was a business decision motivated by money that these pieces of software stay at 5.2. They both stayed at 5.2 for a very, very long time.

From Sherri's tweets, she is in much the same boat - many customers just do not want to pay for arbitrary upgrades for infrastructure. You could try and bundle it with new features, but then they may balk at the cost and still decline the project. If releases were slower, they could be tied to more major upgrades.

Businesses Can Move Slowly

Lars Moelleken (@suckup_de) mentioned that sometimes business processes move slower than release cycles. This means that businesses that need stability look toward an LTS release to provide that stability while the business can still provide value for the life of a project.

I have seen this as well. One project I worked on had many different requirements, including a list of allowed operating systems and software versions. We had to work with hardware that only worked with specific Linux kernels, and only some distributions shipped the library versions we needed (specifically a version of OpenSSL with some hardening patches applied).

We also had to be very cognizant of changes to the codebase. We had to be careful not to break anything, as loss of functionality could have some very bad consequences for our users. Getting patches for bugs installed took months, not days.

This meant that much of our software stayed on older versions of languages or libraries. Python and PHP were both well past EOL when I started at the company. When I left, PHP at least was at 5.6, and Python had started a crawl toward Python 3. The underlying OS had not changed, and because the distro did not support rolling upgrades there was little we could do in the way of in-place upgrades.

We had planned on upgrading all of this, but much of it was tied to a sales cycle and maintenance timeframes. We would have had to maintain two versions of the software, which was an additional cost for us. It was decided that we would upgrade what we could when we had time, and try to push customers toward a new sales cycle that would allow us to switch them over.

We Just Can't

This argument came up during our weekly group get-togethers where we just have a video call and hang out for an hour. The main focus had been a coworker who used to work for an insurance company that did most of their work in Java 6.

When he started, he wanted to use some newer best practices and libraries that would have made their lives easier. There was a lot of pushback against doing this from various sides.

Many of the arguments revolved around either there not being enough time, or previous consultants having already decided that "Solution X" was a bad fit for the company. A few developers mentioned that some of the new things would just never work with the current solution due to "technical constraints."

In this case, the development team decided that it would be too much work to push forward with an upgrade. Java 6 still worked, so it was better to just continue to deliver functionality with the current setup. Maybe new projects could allow newer setups.

Why LTS is Bad

In all three of the above cases, actual reasons were put forth as to why a slower release cycle would be better.

All of them are completely invalid and just excuses for not doing work. I understand the why of each argument. I just do not accept them because, in the long run, they are causing more work and more pain in the upgrade process. This makes it even harder to justify upgrades because they will cost more, take more time, and be much more prone to failure.

If There Isn't Money Now, There Won't Be In The Future

In the Bed and Breakfast case, the owner ended up paying for a server all to herself running PHP 5.2 and an older operating system. This also required her to sign a security waiver stating that she understood the risks. I am not 100 percent sure she really did understand; otherwise, she would have paid for the new version of the plugin. As a contractor, I had to protect myself.

By the time I had stopped consulting for her, the plugin was not even maintained anymore - she would have to pay for an entirely revamped reservation system. The cost went from what I think was $75 (at the time) for the plugin upgrade to nearly $2,000 to just replicate what the old plugin did.

The Zend Framework 1 application is still in use. I just checked, and it has been moved to a host running PHP 7.0, but the project itself was never upgraded. I know this because there were some workarounds I had to do to get Zend Framework to run under Windows 2000's version of IIS. The site is now running under Apache httpd, according to the headers, but it still has those workarounds. They just moved it. It is a simple application with no private data, so I am not really worried from a security standpoint, but no one has bothered to upgrade it.

They have contacted me on-and-off through the years about doing upgrades, but each time the cost is bundled with an upgrade to something newer, and well outside of the price range they want to pay for changes.

If a framework, OS distribution, or language has an LTS release, it increases the length of time between supported releases. This adds complexity to upgrades, which increases costs. The increased cost and time are usually seen as a waste because there is no tangible new benefit. Why pay for something that does not add new features or revenue?

Frameworks like Symfony do a good job of making the final releases in a version somewhat compatible with the next version, which makes upgrades easier. Even so, with 3.4 being LTS, the next LTS is 4.4. Developers who are not upgrading now are waiting for the next LTS, and that larger jump will take longer, and therefore cost more, to implement.

If Your Business Moves at a Glacial Pace, That's Your Fault

Saying that a business moves slowly, and that release cycles should therefore move slowly too, is a farce. I never accept this as a good answer. In fact, Sara Golemon can back this up:

Much like putting off an upgrade because there is no money in the budget, purposefully putting off upgrades leads to the exact same problem - you push an upgrade off until the point it is painful, and the amount of time and money required has only increased. Going from Symfony 3.4 to 5.x will not be straightforward. Moving from Ubuntu 14.04 to 18.04 will cause a lot of things to break.

You now are forced to spend more money and time than if you had just kept up with upgrades and updates. Rewriting software from scratch is more expensive than refactoring.

I fought very hard to move to PHP 7 and Python 3 on that project, and to upgrade the underlying OS. During my tenure, we went from PHP 5.4 to 5.6 with nothing but package upgrades, and got those into production without the clients ever noticing.

We did actually get the PHP 5.6 to 7.2 code migration finished (just not into production) by the time I left. Since my predecessor and I took great care to use best practices, the handful of issues that phpstan found were fixed in a few hours. Unit tests were added around them to make sure nothing broke.

The Python code was a mess and primarily 2.6, so it was mostly a lost cause. A rewrite was started that included tests upfront. It was not completed when I left, but it was light-years ahead of where the 2.6 code was. The only problem was it meant pulling our lead Python developer off for a few months to do the work, pushing back a release.

I cannot find the tweet for the life of me, but someone brought up the longevity of developers. Since job movement is fairly frequent in our industry, leaving an upgrade for two to three years can mean the loss of knowledge that is required for these upgrades to go smoothly.

By saying that your business processes move slowly, and accepting that, you are only making it harder on yourself and the people who come after you. You are costing your company more money in the long run.

You Can, You Just Don't Want To

This is usually where most developers end up when it comes to legacy code. The code is in such bad shape that it is hard to fix, so there is an unconscious bias against upgrading it. Developers are worried about having to go to their boss to explain why they need to upgrade, and are afraid of being shot down. It is easier to just stay the course and develop features.

As with most of these situations, you are just delaying the inevitable. You are going to have to upgrade someday, and you do not want that someday to be when a massive CVE comes out of nowhere that you have to handle.

Ben Ramsey (@ramsey) does bring up a good point:

Except that you are at the mercy of a maintainer who decides whether a security fix should be backported. As fixes are backported, they introduce divergences from the upstream codebase (say, PHP internals) and can even change behavior. Some security fixes cannot be backported at all because they deal with newer or changed code, so the maintainer has to decide between re-implementing the fix and leaving it out.

If you hold off because "someone else provides support," or "we just use what is in the repositories," you are giving up and deciding to stay where you are. It will only end in you still being behind and spending more time and money when you have to make an upgrade.

And if you are holding off because you still use mysql_* functions... Stop. Get off your butt and change it. You've literally had years to fix this.
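For the common database-lookup case, the jump from mysql_* is small. Here is a minimal sketch of the replacement using PDO - shown with an in-memory SQLite database so the example is self-contained; for MySQL you would only swap the DSN (the table and data are illustrative):

```php
<?php
// The removed API looked roughly like this (gone as of PHP 7.0):
//   $res  = mysql_query("SELECT name FROM users WHERE id = " . $_GET['id']);
//   $name = mysql_result($res, 0);

// The PDO replacement, using a prepared statement instead of string
// concatenation (which also closes the SQL injection hole above).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Illustrative schema and data so the snippet runs on its own.
$pdo->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)');
$pdo->exec("INSERT INTO users (name) VALUES ('Ada')");

$stmt = $pdo->prepare('SELECT name FROM users WHERE id = :id');
$stmt->execute([':id' => 1]);
$name = $stmt->fetchColumn();
echo $name, "\n"; // Ada
```

The prepared statement is the important part: it is what the old string-concatenation style never gave you for free.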

What Can You Do?

First and foremost, start your planning now. Depending on the quality of your application and the age of your infrastructure, you may have a little or a lot of work to do. The sooner you start planning, the easier time you will have.

Sell The Upgrade

You can start small and make some changes right away while you make the business case. Explain to upper management how not doing these upgrades is going to leave you in a bad spot. Here are a handful of things you can use:

  • Finding developers who want to work on older software is always hard. Finding developers who want to work in old languages is harder.
  • Even with backported security fixes you are at a security disadvantage. It takes time for the backports to happen, if they happen at all.
  • Libraries and tools move on. You will be left with substandard tooling compared to competitors who stay up-to-date.
  • As libraries move on, you are left to maintain your old versions yourself (especially manually back-porting security patches). This is more work for your developers, and less time you can put toward new features that matter.
  • To do an upgrade at a later date means spending even more time not working on new features. This can put you behind competitors.
  • If you use AWS/Azure, newer PHP versions are faster, meaning fewer servers, which means less cost.

If you find it impossible to sell doing the upgrades, you have two options - do it anyway, or leave.

If you think you can get away with it, or you have the power to do it, go ahead and just do the upgrades. If you really look into it you might find straight version upgrades are trivial, but worst case you will get an accurate estimate of how long the upgrade will take. Remember, the longer you delay, the longer the upgrade will take.

If a company cannot take the time to understand why being up-to-date is a business advantage, then move on to a company that does understand it.

Run Multiple Versions of PHP

I use both Docker and phpenv to handle multiple versions of PHP on a single machine. I can switch between them with some changes and try out my code as I upgrade.

For Docker, you should just need to change your Dockerfile or switch containers out. It will depend on how your setup is configured. A huge selling point of Docker is the ability to swap out containers, so if you are using Docker (and honestly, if you are using Docker but can't upgrade PHP... WTF?!), this should be fairly easy.
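As a sketch of how small the change can be, assuming the official `php` images on Docker Hub (the tags and paths below are illustrative):

```dockerfile
# Test under PHP 7.2; change the tag to php:7.3-apache to test under 7.3.
FROM php:7.2-apache
COPY . /var/www/html/
```

Build the same image once per tag you want to test against, and you have a cheap way to see what breaks before committing to the upgrade.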

For locally installed PHP, I love phpenv. It allows you to have multiple versions installed at once, and has directions for setting up both PHP-FPM and Apache httpd.
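An illustrative phpenv workflow might look like this (the version numbers are examples):

```shell
phpenv install 7.2.22     # build and install a PHP version
phpenv install 7.3.9
phpenv local 7.3.9        # pin the current project directory to 7.3.9
php -v                    # now reports 7.3.9 inside this directory
phpenv local 7.2.22       # switch back to re-test under 7.2
```

The per-directory pin means each project can declare the version it runs under, independent of the system PHP.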

Laravel Homestead is one other option. It is a Vagrant-based virtual machine with PHP 5.6 through PHP 7.3 installed. Even if you do not use Laravel, you can throw a normal PHP application in there and start switching PHP versions.

PHP tries very hard to keep backward compatibility, so unless you are using a removed feature like the mysql_* functions, your app might just work out of the box.

Figure Out Code Changes

Look at upgrading your PHP version first. If you are on PHP 7.0 or 7.1, great! PHP does an awesome job of adhering to SemVer, so there should be little work you need to do for the minor versions. The PHP manual contains release and migration notes for each version since 5.0, at https://www.php.net/manual/en/appendices.php. Read the migration notes for each version you are crossing.

Ignore new features and focus on any changes you need to make.

Tools like phpstan can check your code against PHP 7 and flag things you will need to change. As I mentioned before, I ran our PHP 5.6 codebase against PHP 7.2 and only had a handful of things to change. You may have more, but it gives you a detailed list of what needs to be fixed.
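A minimal sketch of a `phpstan.neon` for this kind of check - the `phpVersion` parameter (available in recent PHPStan releases) tells it which runtime to analyse against, and the `level` and `paths` here are assumptions about your project:

```neon
parameters:
    level: 5
    phpVersion: 70200   # analyse the code as if it were running on PHP 7.2
    paths:
        - src
```

Then run `vendor/bin/phpstan analyse` and work through the reported list.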

Actually Upgrade

Most mainline distributions have good maintainers who keep PHP up-to-date. These packages go through the same inclusion process as any other package, so convince your systems or operations team to update. If they push back, come armed with good business reasons (the packages are official and safe; newer PHP has better security support, is faster, etc.). Since these are official repositories, it is not hard to add them to a system. It is not as if the operations team needs to compile PHP themselves.

For Ubuntu/Debian there is the set of packages from Ondřej Surý, available at https://deb.sury.org/. He has worked for years to provide high-quality Debian packages, and all of the PHP packages are either directly from him or based on his packages.

On RHEL/CentOS/Fedora, you have packages from Remi Collet, available at https://rpms.remirepo.net/. He maintains packages for core PHP as well as a bunch of extensions, for various versions of PHP. As Ondřej is for Debian, Remi is the package maintainer for Fedora, so these are as official and safe as you will find for RPM-based systems.

Don't Delay, Start Now

I hope at this point I have convinced you that something as nice-sounding as an LTS release is not as cozy and safe as it makes itself out to be. You are sacrificing time and money later for perceived stability today.

A project that stays up-to-date, and puts processes in place that help it update continually, will be able to stay competitive longer. If security is an ingrained part of software development, why isn't upgrading? Like security, upgrading is not something you bolt on or do later.

Stop making excuses, and start upgrading.

Posted on 2019-09-01
