The day finally came when I, too, had to submit a request for a kindergarten place for my kid!
On the 25th of March (my son’s birthday), at 8 am, kindergarten registration was supposed to start on the vilnius.lt website. After waking up, the first thing I did was try to log in. At 7:55 am, the website opened. “Hooray!” – I thought – “it didn’t crash this time!”
Five minutes later I understood that nothing had changed…
Here are some of my thoughts about this, collected during the 5 hours I spent trying to sign up my kid…
This system has been in use for a few years now, and every year there’s the same issue. They tried to open registration this January, but the system couldn’t handle the traffic. The registration date was moved to March 25th, leaving a few months to get the system fixed and optimised. Unfortunately, the vilnius.lt IT department failed to do this job. It was said that everything had been fixed, but probably nobody did any testing or benchmarking to see whether it really was.
Some thoughts from my side on what could have been done better:
No CDN?
All the media files (images/css/js) are served from the same http://www.vilnius.lt website, without using any CDN. If a CDN were used, or if there were at least separate servers for static content, the situation would improve a lot, as the main http://www.vilnius.lt server wouldn’t be overloaded with these requests during peak time.
There are free CDNs and paid CDNs, and you could even turn the CDN on only for this event, which takes up just a few days a year.
AJAX is cool!
The kindergarten registration system uses a lot of AJAX requests. But … are all of them really needed? For example, you are required to enter your personal identity code (PIN) during registration. When you fill it in, an AJAX request is made to validate the PIN (there’s an algorithm for that), then there’s another request to get the birthdate out of the PIN (it’s part of the PIN, digits 2-7), and then yet another request for the same thing (not sure why…). All of this can easily be done on the JS side, as the algorithm to validate this information isn’t very hard, with server-side validation done once on submit. But no, it’s better to make 3+ requests to overloaded servers just for this.
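To show how little logic is involved, here is a minimal sketch of that check. It assumes the commonly documented checksum scheme for Lithuanian personal codes (weights 1..9,1 and, if the remainder is 10, weights 3..9,1,2,3) and the usual first-digit century encoding; I have not seen the code vilnius.lt actually runs. The function names are made up, and it is written in Python only for brevity, since the same arithmetic ports directly to JavaScript in the browser.

```python
from datetime import date

# Assumed weights of the commonly documented checksum scheme.
WEIGHTS_1 = (1, 2, 3, 4, 5, 6, 7, 8, 9, 1)
WEIGHTS_2 = (3, 4, 5, 6, 7, 8, 9, 1, 2, 3)
# Assumed meaning of the first digit: century of birth (odd/even encodes sex).
CENTURY = {1: 1800, 2: 1800, 3: 1900, 4: 1900, 5: 2000, 6: 2000}


def check_digit(first_ten):
    """Compute the 11th digit from the first ten digits of the PIN."""
    digits = [int(c) for c in first_ten]
    for weights in (WEIGHTS_1, WEIGHTS_2):
        remainder = sum(d * w for d, w in zip(digits, weights)) % 11
        if remainder != 10:
            return remainder
    return 0  # both passes gave 10


def is_valid_pin(pin):
    """True if pin is 11 ASCII digits and its last digit matches the checksum."""
    if len(pin) != 11 or not (pin.isascii() and pin.isdigit()):
        return False
    return check_digit(pin[:10]) == int(pin[10])


def birthdate_from_pin(pin):
    """Read the birthdate from digits 2-7 (YYMMDD); None if the PIN is unusable."""
    if not is_valid_pin(pin):
        return None
    century = CENTURY.get(int(pin[0]))
    if century is None:
        return None
    try:
        return date(century + int(pin[1:3]), int(pin[3:5]), int(pin[5:7]))
    except ValueError:  # e.g. month 13 or day 32
        return None
```

With something like this running in the browser, the form needs zero requests while the user types, and the server only has to repeat the same check once when the form is submitted.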
Let’s not forget about all the other fields! Take the address. If I live on ‘Zaumenhofo’ street and start typing in ‘Zaumenhofo’, the form sends an AJAX request for every letter from the third one onwards, to fetch a list of streets and autocomplete the name. That is – ‘zau’, ‘zaum’, ‘zaume’, ‘zaumen’, ‘zaumenh’, ‘zaumenho’, ‘zaumenhof’, ‘zaumenhofo’ – that’s 8 requests! Of course, when the system works properly, it should be enough to type in ‘zau’ and the system will give you a list of streets. But when the servers are overloaded, you don’t even get replies to the AJAX requests, so you keep on typing the street name, which generates even more requests. Also, you need to enter the details for mom, dad and kid separately, which means that for ‘zaumenhofo’ street that’s 8 requests (one per letter) * 3 = 24 additional requests.
Every other field works in the same way – an AJAX request is sent for each dropdown or input field. There are probably a couple hundred of them in the whole registration process, but I didn’t count.
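The arithmetic above is easy to reproduce. The sketch below counts the lookups fired by a per-keystroke autocomplete and compares them with a debounced one, which waits for a short pause in typing before sending anything. Debouncing is a common client-side technique, not something I know the site to use or plan; the function names, the 300 ms pause and the 150 ms typing speed are my own assumptions, and Python is used only to simulate what the browser would do.

```python
def per_keystroke_queries(text, min_chars=3):
    """Prefixes sent when every keystroke from min_chars onwards fires a request."""
    return [text[:n] for n in range(min_chars, len(text) + 1)]


def debounced_queries(keystrokes, wait_ms=300, min_chars=3):
    """Prefixes sent when a request fires only after wait_ms without typing.

    keystrokes: list of (timestamp_ms, field_value_after_keystroke).
    """
    sent = []
    for i, (t, value) in enumerate(keystrokes):
        is_last = i == len(keystrokes) - 1
        # The timer survives only if the next key arrives late enough (or never).
        paused = is_last or keystrokes[i + 1][0] - t >= wait_ms
        if paused and len(value) >= min_chars:
            sent.append(value)
    return sent


street = "zaumenhofo"
# Assumed typing speed: one key every 150 ms.
typing = [(i * 150, street[: i + 1]) for i in range(len(street))]

naive = per_keystroke_queries(street)   # 8 prefixes, 'zau' .. 'zaumenhofo'
debounced = debounced_queries(typing)   # only the final value
print(len(naive) * 3, "requests for mom, dad and kid without debouncing")
print(len(debounced) * 3, "requests with debouncing")
```

Under these assumptions the 24 street lookups shrink to 3, and a user who keeps typing because no reply has arrived no longer makes things worse.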
You could just send these to separate servers which only validate this information so that you don’t overload the main servers that do eveeeerything else.
The winter came early this year!
Like every year so far, winter came unexpectedly! It usually arrives in the second half of December, and the road cleaning guys are never ready for it! Who could have guessed that winter would actually come in December?
We have the same situation with the vilnius.lt kindergarten registration. HOW could they have known that registration would start on March 25th? IF they had known, they would have gotten some additional virtual machines running to cope with the load, RIGHT?
No…
In other words – they knew the date, they knew that the systems are always overloaded, so why can’t they just get ready for it? I understand that buying new servers just for a day or two is not very cost-effective. But we live in 2015! There are a lot of *be ready, a buzzword will be used in a minute* CLOUDS, where you can rent a server, or two, or even 100 for only a few days. And they are not expensive! You could rent 20 virtual machines on Amazon, connect them to your app/backend, and pay something like $20-50 for those few days. The price is small, and you wouldn’t be so embarrassed in front of the whole country and the world!
Maybe let’s start sending emails by pigeon?
Not related to performance, but it’s hard to miss the note telling you to check your spam folder if the email doesn’t arrive soon.
Let’s check the inbox:
Return-Path: NO-REPLY@vilnius.lt
Received: from mail.vilnius.lt (mail.vilnius.lt. [195.182.82.88])
    by mx.google.com with ESMTPS id ol7si1269421pbb.71.2015.03.25.01.57.07
    for XXX
    (version=TLSv1 cipher=ECDHE-RSA-AES128-SHA bits=128/128);
    Wed, 25 Mar 2015 01:57:08 -0700 (PDT)
Received-SPF: none (google.com: NO-REPLY@vilnius.lt does not designate permitted sender hosts) client-ip=195.182.82.88;
Authentication-Results: mx.google.com;
    spf=none (google.com: NO-REPLY@vilnius.lt does not designate permitted sender hosts) smtp.mail=NO-REPLY@vilnius.lt
Received: from HUB2.vilnius.vilnius.lt (10.13.3.4) by mail.vilnius.lt
    (192.168.2.2) with Microsoft SMTP Server (TLS) id 14.2.347.0; Wed, 25 Mar
    2015 10:53:24 +0200
Received: from w3r4.vilnius.lt (10.187.87.164) by HUB2.vilnius.vilnius.lt
    (10.13.3.224) with Microsoft SMTP Server id 14.2.347.0; Wed, 25 Mar 2015
    10:55:03 +0200
Date: Wed, 25 Mar 2015 10:55:57 +0200
To: XXX
From: no-reply@vilnius.lt
Subject: Prisijungimo duomenys
Message-ID: <e3cf4223db7449aa8e31488bfd238c9b@w3r4.vilnius.lt>
X-Priority: 3
X-Mailer: PHPMailer 5.2.9 (https://github.com/PHPMailer/PHPMailer/)
MIME-Version: 1.0
Content-Type: multipart/alternative;
    boundary="b1_e3cf4223db7449aa8e31488bfd238c9b"
Return-Path: no-reply@vilnius.lt
X-Originating-IP: [10.187.87.164]
The first thing we notice is the strange email forwarding between servers: from server A (w3r4, 10.187.87.164) to server B (HUB2, which receives on 10.13.3.224 and sends from 10.13.3.4), then to server C (mail.vilnius.lt, 192.168.2.2 internally and 195.182.82.88 towards Google). Then we can see that SPF is not used (it says which IPs may send email for @vilnius.lt), nor DKIM (a signature that lets the recipient verify the email really was sent by a server authorised for @vilnius.lt), nor DMARC (which says what should be done with emails that fail DKIM/SPF). Maybe it’s time to start using 10-year-old technology (except DMARC, which is newer than that) in the government sector as well? These technologies are VERY expensive (free), VERY hard to set up (it probably takes an hour without any prior knowledge of them) and need a lot of specialists’ time.
A slowdown in data processing, or a service that doesn’t work at all?
Everyone on the radio keeps saying that the kindergarten registration works, but has a slowdown in data processing. That’s a very strange phrase. For me, a “slowdown in data processing” is when the data is submitted without any problems, but takes longer to process once it’s submitted. Unfortunately, the situation here is very different. The lucky ones who manage to log in are able to submit their data. The unlucky ones can’t even log in, so they can’t submit anything. The first group is in a queue; the second group isn’t in any queue at all.
For example: if nginx is used, it has an option to always send the same users to the same backends based on a hash (I don’t know if that’s how it works on vilnius.lt, this is just an example to show that “slowdown” != “not working”). So, let’s say we have 100 users with different IPs and 3 servers. Then 33% of users (A) go to server 1, 33% of users (B) go to server 2, and the other 33% of users (C) go to server 3. All is good, right?
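You don’t have to read these headers by eye. Below is a small sketch that pulls the verdicts out of the `Authentication-Results` value pasted above. The function name is made up, the regular expression is a deliberately naive reading of the header rather than a full parser, and a method that does not appear is reported as `None`, meaning only that the receiving server listed no result for it.

```python
import re


def auth_results(header_value):
    """Map spf/dkim/dmarc to the verdict reported by the receiver, or None."""
    found = dict(re.findall(r"\b(spf|dkim|dmarc)=(\w+)", header_value))
    return {method: found.get(method) for method in ("spf", "dkim", "dmarc")}


# The Authentication-Results value from the email above, joined into one line.
HEADER = (
    "mx.google.com; spf=none (google.com: NO-REPLY@vilnius.lt does not "
    "designate permitted sender hosts) smtp.mail=NO-REPLY@vilnius.lt"
)

print(auth_results(HEADER))
# {'spf': 'none', 'dkim': None, 'dmarc': None}
```

`spf=none` is Google saying the domain publishes no SPF record at all, and there is no DKIM or DMARC verdict to report.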
No.
If 100% of A users and 100% of B users go to the registration page, it won’t even open for them, because servers 1 and 2 will be overloaded. And if only 20% of C users go to register, they’re first in the queue (server 3 won’t be as overloaded as servers 1 and 2).
That’s why I think it’s not a “slowdown” but rather a service that doesn’t work at all, and it’s a matter of luck/algorithm whether you’ll be sent to a server which is not overloaded.
EDIT: this post was translated to English, as my Lithuanian grammar is very poor and there were lots of negative comments about this.
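The thought experiment with groups A, B and C can be played out in a few lines. This is only a sketch: CRC32 of the client IP stands in for whatever hash a real balancer would use, the IP range is invented, and nothing here is taken from the actual vilnius.lt setup.

```python
import zlib
from collections import Counter


def pick_backend(client_ip, n_backends):
    """Sticky routing: the same IP always hashes to the same backend."""
    return zlib.crc32(client_ip.encode()) % n_backends


def backend_load(active_ips, n_backends):
    """Number of active users landing on each backend."""
    counts = Counter(pick_backend(ip, n_backends) for ip in active_ips)
    return [counts.get(b, 0) for b in range(n_backends)]


# 100 hypothetical users, split into groups by the backend they hash to.
ips = [f"10.0.0.{i}" for i in range(1, 101)]
groups = {b: [ip for ip in ips if pick_backend(ip, 3) == b] for b in range(3)}

# Everyone from groups A and B shows up, but only every fifth user from group C.
active = groups[0] + groups[1] + groups[2][::5]
print(backend_load(active, 3))
```

The third number printed is a fraction of the first two: the users hashed to that server see a working site, while everyone else sees one that never loads.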
The paragraph that stuck with me most was this one: “On the eve of registration, the municipality announced that the system had been improved and that its reliability had been confirmed by the information technology companies IBM, DPA and BAIP”. Both you and I have dealt with the first one (aix and websphere rulez), so one could already guess what to expect :-]
Well, there’s always a chance that, even if it costs a lot of money, there will at least be some tangible result. But apparently not this time.
And what are the likes of Cloudflare for? They’re even free if you don’t use SSL….
Well, I already mentioned that “there are free CDNs, there are paid CDNs, and you can turn a CDN on only for the period of peak load.” – I’m not naming specific providers, or else the Competition Council might show up! 🙂
By the way, I added two more paragraphs that I had forgotten – about the emails and the slow processing.