ZEIT-IO Refactorings 2022 (I'm sorry)

Robert Reiz / 16 January 2023 / 10:27 UTC
ZEIT.IO is a platform where you can manage your experts (freelancers and employees) in one tool. It includes contract management, project budget tracking, time tracking, invoicing and much more.

The roadmap and the current features are strongly driven by customers. Over the past two years we developed fast to satisfy customer requests! And you know how it is: if developers are constantly under time pressure, they add new features to the current solution, even if they do not fit perfectly into the overall concept. That works for some time, but it adds to the technical debt. Then the time comes when the technical debt needs to be cleaned up, usually in a refactoring sprint lasting several weeks. For some reason I thought the time between the years (the days between Christmas and New Year) would be a good time for this big refactoring. Here are the things that were refactored over the last couple of weeks:

  • New data model: adjustments to current needs and speed improvements
  • Migrating from Rails 6 to Rails 7  
  • Upgrading the Docker base image 

Improving the architecture of existing systems


New Data Model 

The data model before Christmas 2022 was OK, but not great. The collections in MongoDB had grown over time, and many views needed several queries. Most of the organisation pages had load times of one second or more, some even several seconds. That's not a good user experience. If a web application becomes slow, it's usually not the fault of the underlying programming language. Most of the time, the reason is too many DB queries or DB queries that are too slow. And that was the case here, too!

That's why I refactored the data models and optimised the DB queries to fit the current views. That was a pretty heavy refactoring, because not only did models, services and some controllers have to be adjusted, but existing data also had to be migrated to the new structure. Many automated test cases had to be rewritten as well.
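To illustrate why fitting the data model to a view can cut the number of queries, here is a minimal sketch. It is not ZEIT.IO's actual model: the store, the collection names and the fields are all hypothetical, and the "database" is an in-memory stand-in that only counts round trips. It compares a referenced layout (one extra lookup per row) with a denormalised layout in which the value the view needs is stored on the document itself, which is one common way to get this effect.

```ruby
# In-memory stand-in for a database; every call to #find counts as one query.
# All names here are hypothetical and only serve the illustration.
class QueryCountingStore
  attr_reader :query_count

  def initialize(collections)
    @collections = collections
    @query_count = 0
  end

  def find(collection, &filter)
    @query_count += 1
    @collections.fetch(collection).select(&filter)
  end
end

SAMPLE_DATA = {
  customers: [{ id: 1, name: "Acme" }, { id: 2, name: "Globex" }],
  # Old layout: a project only references its customer by id.
  projects: [
    { name: "Website", customer_id: 1 },
    { name: "App", customer_id: 1 },
    { name: "Audit", customer_id: 2 }
  ],
  # New layout: the customer name needed by the view is stored on the project.
  projects_v2: [
    { name: "Website", customer_name: "Acme" },
    { name: "App", customer_name: "Acme" },
    { name: "Audit", customer_name: "Globex" }
  ]
}.freeze

# One query for the projects plus one customer lookup per project (N+1).
def project_rows_referenced(store)
  store.find(:projects) { true }.map do |project|
    customer = store.find(:customers) { |c| c[:id] == project[:customer_id] }.first
    { project: project[:name], customer: customer[:name] }
  end
end

# A single query, because each document already carries what the view shows.
def project_rows_denormalised(store)
  store.find(:projects_v2) { true }.map do |project|
    { project: project[:name], customer: project[:customer_name] }
  end
end

# Runs the block and returns its result together with the queries it used.
def queries_used(store)
  before = store.query_count
  result = yield
  [result, store.query_count - before]
end

store = QueryCountingStore.new(SAMPLE_DATA)
_, referenced = queries_used(store) { project_rows_referenced(store) }
_, denormalised = queries_used(store) { project_rows_denormalised(store) }
puts "referenced: #{referenced} queries, denormalised: #{denormalised} query"
```

The trade-off of such a layout is that the copied value has to be kept in sync when the original changes, which is part of why existing data needs a migration.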

In the end, it was worth it! The pages now load in under one second again, most of them even in under 500 ms. The application feels fast again!

Migrating from Rails 6 to Rails 7

ZEIT.IO is built with the Ruby on Rails framework and StimulusJS, which is, in my humble opinion, still the best way to build modern web applications. Rails 6 used Yarn to manage JavaScript dependencies and Webpacker to build and compress JS files. Yarn was OKish, but I never liked Webpacker.

Rails 7 is great because it removes ALL dependencies on JS build tools on the server side at runtime! Rails 7 introduces JavaScript import maps. With them, JavaScript dependencies can be "pinned" in an "importmap.rb" file, and the browser, not the server, resolves and downloads them directly from a CDN. That works with NPM packages, too. That's pretty awesome, because it means a JavaScript runtime is no longer needed on the server side! That means smaller Docker images, faster deployments and fewer security risks. But it also means a pretty big refactoring, because the way you handled JS dependencies until now doesn't work anymore. JS files have to be moved to other locations, and the import and initialisation mechanism works a bit differently than in Rails 6. There are many posts about this topic on StackOverflow.
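For readers who have not seen one, this is roughly what a pinned setup looks like. It is a sketch along the lines of the file a fresh Rails 7 app generates, not ZEIT.IO's actual "importmap.rb". The last pin is purely hypothetical: both the package name and the CDN URL are placeholders.

```ruby
# config/importmap.rb (illustrative sketch, not the real ZEIT.IO file)

# Application entry point, served from app/javascript/application.js.
pin "application", preload: true

# Stimulus and its loader, as shipped with the stimulus-rails gem.
pin "@hotwired/stimulus", to: "stimulus.min.js", preload: true
pin "@hotwired/stimulus-loading", to: "stimulus-loading.js", preload: true

# Every Stimulus controller under app/javascript/controllers.
pin_all_from "app/javascript/controllers", under: "controllers"

# Hypothetical NPM package that the browser downloads directly from a CDN.
# Name and URL are placeholders, not a real dependency.
pin "some-chart-lib", to: "https://cdn.example.com/some-chart-lib@1.0.0/index.js"
```

With a pin like the last one in place, a JS file can `import` the package under its pinned name, and the browser resolves that name through the import map instead of a bundler.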

Rails 7 introduces many more cool features; the full change log can be found here. But import maps are probably the biggest and most relevant change.

Upgrading the Docker image 

All services at ZEIT.IO run in Docker containers. During the deployment process (a GitHub Action), the code is bundled into a Docker image; that image is pushed to AWS ECR and then rolled out on an ECS cluster. To speed up the deployment process, there is a custom base image that already includes most of the dependencies. That way, we don't need to download and install ALL dependencies over and over again on each deployment.

The Docker base image before Christmas 2022 was based on Alpine Linux + a Ruby runtime + a NodeJS runtime + some additional dependencies and was 343.21 MB in size. The current base image is adjusted to Rails 7 and Ruby 3.1, contains no NodeJS runtime and is only 269.77 MB in size. The finished Docker image that ends up on the ECS cluster is roughly 100 MB smaller than the version before Christmas 2022. That's because the NodeJS runtime is no longer needed on the server side at runtime. The current Docker image that runs ZEIT.IO is 397.55 MB in size and, according to the ECR security scanner, has 0 known security vulnerabilities.

Before Christmas 2022 a deployment to production took 7 to 9 minutes. Now a typical deployment takes 5 to 6 minutes. That includes:
 
  • minifying JS & CSS assets 
  • publishing assets (JS, CSS, images) to a CDN 
  • building a Docker image
  • pushing the image to ECR
  • triggering the ECS deployment 
  • waiting until all new containers are up and running and healthy

The last step takes roughly half of the overall deployment time.
With this refactoring, deployments now run ~2 minutes faster on average.
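The last step in the list above, waiting until all new containers are healthy, is essentially a polling loop with a timeout, which explains why it dominates the deployment time. The post does not say how ZEIT.IO implements it, so the following is only a sketch under assumptions: the method name, the `check` callable and the default timings are hypothetical, and a real pipeline would query the container platform instead of a lambda.

```ruby
# Polls `check` until it reports that every desired container is healthy.
# `check` is any callable returning [healthy_count, desired_count].
# Returns the number of seconds waited; raises once the timeout is reached.
# The sleeper is injectable so the loop can be exercised without real waiting.
def wait_until_healthy(check, timeout: 300, interval: 5, sleeper: ->(s) { sleep(s) })
  waited = 0
  loop do
    healthy, desired = check.call
    return waited if healthy >= desired
    raise "deployment not healthy after #{timeout}s" if waited >= timeout

    sleeper.call(interval)
    waited += interval
  end
end

# Usage with a fake health check that becomes healthy on the third poll.
states = [[0, 2], [1, 2], [2, 2]]
fake_check = -> { states.shift }
seconds = wait_until_healthy(fake_check, sleeper: ->(_) {})
puts "healthy after #{seconds}s of simulated waiting"
```

Because a rollout only counts as finished once the health checks pass, shortening container start-up time has a direct effect on the total deployment time.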

What went wrong? 

448 files were touched for all these changes. That is a lot of potential sources of errors. Of course there are automated test cases: right now there are 1833 of them, and the code has 80.93% line coverage. That helps a lot, but there is still no guarantee of catching all errors. And so the Zendesk tickets and emails started to come in after all these changes were rolled out to production.

Some errors occurred because some data had not been migrated correctly. At first it looked like some data had been lost. But luckily the old data and data structure were still there, so it could be fixed within one day.

Many errors occurred because some JavaScript libraries were not initialised correctly, mainly because of the migration to import maps. For example, there is a group-by feature where organisations can filter all recorded times and group them by customer, project or user. The grouped data is then visualised in a bar chart. Here is an example with test data.

Recorded times grouped by customer


That bar chart just didn't show up. Customers who used the grouping feature just got a white page in response and thought that the grouping was not working anymore. The grouping itself was still working; only the visualisation failed, because the JavaScript chart library was not correctly initialised.

And of course there were some null-pointer exceptions, because the test cases did not cover all cases. The null-pointer exceptions that occurred have now been fixed and covered with new automated test cases.
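In Ruby, a "null-pointer exception" shows up as a `NoMethodError` raised when a method is called on `nil`. The post does not show the affected code, so the following sketch is only an assumption of what such a case and its fix can look like. It uses hypothetical structs and a grouping by customer in which some time entries have no project or no customer.

```ruby
# Hypothetical record shapes for illustration; not ZEIT.IO's actual models.
Customer = Struct.new(:name)
Project = Struct.new(:name, :customer)
TimeEntry = Struct.new(:minutes, :project)

# Label under which a time entry is grouped. The safe navigation operator (&.)
# yields nil instead of raising NoMethodError when a link in the chain is missing.
def customer_label(entry)
  entry.project&.customer&.name || "(no customer)"
end

# Sums the recorded minutes per customer label.
def minutes_by_customer(entries)
  entries.group_by { |entry| customer_label(entry) }
         .transform_values { |group| group.sum(&:minutes) }
end

acme = Customer.new("Acme")
entries = [
  TimeEntry.new(60, Project.new("Website", acme)),
  TimeEntry.new(30, Project.new("Website", acme)),
  TimeEntry.new(45, Project.new("Internal", nil)), # project without a customer
  TimeEntry.new(15, nil)                           # entry without a project
]
p minutes_by_customer(entries)
```

A regression test for a bug like this would feed exactly such incomplete records into the grouping, which is the kind of case the original tests missed.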

What could have been done better? 

The timing of this big refactoring was unfortunate, and I should have known better. At LivePerson we always had a code and deployment freeze during Christmas and New Year. For a reason! The start of each month is always critical, because approvers are approving timesheets at ZEIT.IO and customers are using the invoicing module to create their invoices for the past month. A date somewhere in the middle of January would have been a wiser choice for this big rollout. February or March would probably have been much better than December/January, too. And I should have informed my biggest customers that a big change was coming. I'm sorry!! And I promise I will do better in the future!