Jan 06 2016

This is what I do

Category: English posts, Technical | Iuliana @ 22:25

When everybody was going on vacation, a few colleagues and I stayed behind to migrate our very large project from CVS to Git. We used the wonderful cvs2git tool, although a lot of reports on the internet say that its results are unpredictable. I mentioned the same thing during the preparatory meetings, but for the first time since I started working at this company there were apparently people more optimistic than me, because the migration began on the 23rd of December. A little earlier than everybody expected, but oh well…


We had one big CVS repository, so the first step was to restructure our project and split it into smaller ones that could be migrated easily. The issue was that one project could not be split, and that was the one that caused a lot of trouble. As I am writing this post, that project is still being migrated, and it is being migrated a little differently from the others: each branch of the CVS repo becomes a Git repo, and then all these repositories will be merged into one. While all my colleagues recommended that I use this and that, a lot of shell and git commands found on Stack Overflow, I had the genius spark to merge these repositories in an instant using multiple remotes. I’ll write more about this in a later post.

Before the vacation started, I trained my colleagues in using Git. If you ask me, the training was quite a fiasco, because I had only 2 hours per group to explain what Git is, what the differences between CVS and Git are, how Git works internally, what GitBlit is, how to work with Git using Eclipse and its stupid EGit plugin, and how to work around its mishaps. As you can imagine, 2 hours were not enough, but it is what it is; I had to work with the resources I was allocated. Knowing exactly how the training went, I took advantage of the free days I had, slept a lot and prepared myself mentally for 6 months of answering repetitive, sometimes ridiculous Git questions. I mean, I am expecting my colleagues to have the weirdest questions and to do the weirdest things with Git.

And now, this is the first week. My responsibilities do not cover only Git consulting: my project manager is on vacation, so I had to take over his responsibilities, deliver a fix, prepare the hotfix package for testing and delivery, and also help people in the company update their release/hotfix scripts to use Git. Fortunately, the hotfix is ready, has been tested and will be delivered at the end of the week.

But today a serious problem emerged. People were unable to work with the remote repositories. They got a lot of timeouts, and nobody knew the cause. The logs did not say anything related to that. So we started analyzing everything that could affect this.

We started with GitBlit. Everything looked fine in the GitBlit.properties file; all ssh properties were set to appropriate values.

Most of us were using the ssh protocol to communicate with the remotes, so we needed to check how many ssh connections the server could handle in parallel. SSH works over TCP, so the kernel's TCP limits were just as relevant, starting with somaxconn, which caps the queue of pending connections on a listening socket.

# cat /proc/sys/net/core/somaxconn
128

And it was a damn small one. It was increased to 1024, and that seemed to work for a while, but as soon as everybody started cloning, pulling and fetching, the problem reappeared. So this was clearly not it.
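For anyone who wants to repeat this check, here is a minimal sketch of how the value can be read and compared against the 1024 used above. The variable names are mine, and the script only prints the command it would suggest, because changing the setting needs root. Keep in mind that somaxconn limits the backlog of connections waiting to be accepted, not the total number of open connections, which may be part of why raising it did not solve the problem here.

```shell
# Target value taken from the post; the value seen on the server was 128.
wanted=1024

# Read the current cap. Off Linux the file does not exist, so fall back to 0.
current=$(cat /proc/sys/net/core/somaxconn 2>/dev/null || echo 0)

if [ "$current" -lt "$wanted" ]; then
    echo "somaxconn is $current, below $wanted; as root you could run:"
    echo "  sysctl -w net.core.somaxconn=$wanted"
else
    echo "somaxconn is already $current"
fi
```

A value set with sysctl -w is lost at reboot; the usual way to keep it is a `net.core.somaxconn = 1024` line in /etc/sysctl.conf.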

I then started to look at the SSHD server installed on the machine. There were two parameters that interested me: MaxSessions (specifies the maximum number of open sessions permitted per network connection) and MaxStartups (specifies the maximum number of concurrent unauthenticated connections to the SSH daemon; additional connections will be dropped until authentication succeeds or the LoginGraceTime expires for a connection). Both were commented out in our /etc/ssh/sshd_config file, so I guess the defaults were in use: 10 for MaxSessions, and 10 for MaxStartups as well (newer OpenSSH releases use 10:30:100 there). So both were set to 1024. (Yes, I like this number.)
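A quick way to see how these two directives are set, including the commented-out lines, is to grep the config file. This is only a sketch: show_ssh_limits is a helper name I made up, and it takes the file path as an argument so it can be pointed at any copy of the config.

```shell
# Print the MaxSessions / MaxStartups lines of an sshd config file.
# Matches active lines as well as commented-out ones (leading '#').
# sshd keywords are case-insensitive, hence -i.
show_ssh_limits() {
    grep -iE '^[[:space:]]*#?[[:space:]]*(MaxSessions|MaxStartups)[[:space:]]' "$1"
}
```

Usage would be `show_ssh_limits /etc/ssh/sshd_config`. To see the values the daemon actually ends up with, defaults included, `sshd -T` prints the effective configuration (it needs root).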

I restarted the sshd service and again, for a while, everything looked fine. Then the timeouts started again. I started to think that maybe GitBlit was not closing the connections properly, and that this was why the 1024 quota was reached and the timeouts happened. So I started looking at GitBlit again. After some research into each of its properties I found this one:

# The default size of the background execution thread pool in which miscellaneous tasks are handled.
# Default is 1.
execution.defaultThreadPoolSize = 1

And as you probably suspect by now… it was changed to 1024. I restarted the Tomcat hosting the GitBlit installation and… voilà. Remote operations are now working for my colleagues. Apparently remote operations over the ssh protocol count as miscellaneous tasks.
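Since the change only takes effect after a restart, it is worth confirming which value is really in the file that gets loaded. Below is a small sketch for reading one key from a properties file; get_prop is a hypothetical helper of mine, not something that ships with GitBlit, and it only handles simple `key = value` lines.

```shell
# Print the value of one key from a Java-style .properties file.
# $1 = file, $2 = key. Lines starting with '#' are skipped as comments.
get_prop() {
    awk -F'=' -v key="$2" '
        /^[ \t]*#/ { next }
        {
            k = $1
            gsub(/^[ \t]+|[ \t]+$/, "", k)   # trim the key
        }
        k == key {
            v = substr($0, index($0, "=") + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", v)   # trim the value
            print v
        }
    ' "$1"
}
```

Called as `get_prop /path/to/gitblit.properties execution.defaultThreadPoolSize`, where the path is a placeholder for wherever your installation keeps the file.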

I was doing all these things while consulting people about Git, and my close colleagues were amazed at how serene I was and how well I was handling it all. Actually, I think I was a little amazed too, but then I realized that there was nothing to be amazed at. I was prepared for this; I was expecting a hell of confusion and people running around like Dexter (the cartoon character) when his hair was on fire. I was prepared because I am good at this job and because this is what I do.
