Scalability is always with us!
March 18th, 2010
For a couple of years now, scalability has been my main subject of study, after data crunching. I never stop investigating, doing and undoing. I mention undoing because one of the most important things I have learned about scalability is that you spend most of your time doing and undoing.
Once I was reading a document that claimed that setting certain MySQL configuration parameters would drastically increase the concurrency of the database. As we had no other options, we tested it. As you can imagine, things always go smoothly until you crash into a wall. When we activated this configuration after testing it, our servers started to collapse one after another. Luckily for us, our monitoring scripts warned us early enough that users were not affected.
From that moment on, I learned not to take for granted all the scalability advice found on the Internet. You need to be really sure that the person giving it has enough experience to talk about it.
Another anecdote comes from the early stages of Tuenti.com, the social network my friends and I founded in Spain. We were making some data schema changes, and we thought we could apply them later in the production environment without any problem, since the MySQL documentation assured us that in a master-slave environment, running ALTER on the tables should not affect the performance of the system. This is because the ALTER is done on a temporary table and then the tables are switched, so reads can still reach the table.
You can imagine the result from the way I started this story. Yes, disaster. The master started blocking writes (yeah, awesome, huh? How did we not think of the writes?). This caused PHP connections to be held open, and so the file descriptors on the front-end servers ran out pretty fast. This spiral of events would have made the website crash, so we had to cancel the operation before that could happen.
The only noticeable repercussion for the several thousand users who were on the website at that moment was that their messages took longer to reach their destinations. Otherwise they were not affected.
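To put that spiral into rough numbers, here is a minimal back-of-the-envelope sketch. It is not taken from the incident itself: the function name and every figure are hypothetical, and it assumes that each blocked write request keeps exactly one file descriptor open on a front-end server until the database answers.

```python
def seconds_until_fd_exhaustion(blocked_writes_per_second, fd_limit, fds_in_use):
    """Estimate how long a front-end server lasts while writes are blocked.

    Assumption (hypothetical model): every blocked write holds one file
    descriptor and releases nothing until the database unblocks.
    """
    if blocked_writes_per_second <= 0:
        raise ValueError("blocked_writes_per_second must be positive")
    free_descriptors = fd_limit - fds_in_use
    if free_descriptors <= 0:
        return 0.0  # already out of descriptors
    return free_descriptors / blocked_writes_per_second


# Hypothetical figures: a limit of 1024 descriptors, 224 already in use,
# and 50 write requests per second piling up behind the blocked master.
print(seconds_until_fd_exhaustion(50, 1024, 224))
```

With figures like these, the server has well under a minute before it stops accepting connections, which is why cancelling the operation quickly matters more than diagnosing it in place.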
So, with all the experience life has given me, and with so many failures I can openly admit to, I made a list of the most important things a team must keep in mind if they are going to launch a really big project on the Internet.
• Things must be really simple: complex is almost never scalable, and even less so when we have to make things more complex in order to scale them. The worst enemy of scalability is complexity.
• Research: When you are facing a scalability issue, unless you are one of the major portals of the world, somebody else has probably gone through it already and documented it somewhere on the Internet. We once solved a network problem on our own, spending almost a week finding the right solution by trial and error, and in the end it took us 20 minutes to find the same solution in a tech blog about networking issues. We lost a week because we didn’t realize that somebody had gone through this issue too.
• Prioritization: Even if it is an obvious thing, we tend to take our priorities for granted. One of my university professors once told me: “Write down your life priorities on a piece of paper, and then order them as you want them to be.” I did it, and I found that I was putting my friends below my work, so I swapped them, because I really care about my friends more than my work. Sometimes we don’t see the global scope of the problem, and we make mistakes when prioritizing. Sit down, write the overall problem on paper, and prioritize based on that. You will find that sometimes it is worth living with a bottleneck for a while longer, because another problem you can foresee might be worse if it is not fixed now!
• Bottleneck iteration: This comes from the sysadmin world, but you can apply it to (almost) everything related to scaling issues. The product will go through every type of bottleneck; some we will be able to foresee, but others we won’t. We need to be able to respond to each one and then wait for the next one to come. We should not expect our product to be perfectly scalable already.
• Success as a Team: Keeping your responsibilities in mind, owning them and carrying them out is vital so that the other members of your team can rely on you. This makes it possible to face scalability problems faster and more easily. If everybody jumps into the problem even though it is not their specialty, the problems double: first working out who is right, and then fixing the bottleneck. In a well-balanced and organized team, on the other hand, the only thing left to do is solve the problem.
• Know your platform: Not everybody realizes how complex the platform can be. Make sure there are people who understand the complexity of the whole platform. Not everybody has to have the big picture, but at least somebody has to understand all the connections between the pieces in order to diagnose more global problems. The reason is that at certain moments there will be bottlenecks that cause multiple symptoms and can only be diagnosed from a global perspective. And vice versa: the bottleneck may sit in a place completely different from where the symptom appears!
This list is based on the documentation I have read on the Internet plus my experience in the field. There is more to see in these places: YouTube related blogs
Tagged with: scalability