Scalability is always with us!

March 18th, 2010

For the past couple of years, scalability has been my main subject of study, after data crunching. I never stop investigating, doing and undoing. I mention undoing because one of the most important things I have learned about scalability is that you do and undo most of the time.

I was once reading a document which claimed that setting certain MySQL configuration parameters would drastically raise the concurrency of the database. As we had no other options, we tested it. As you can imagine, things always go smoothly until you crash against a wall. When we activated this configuration after testing it, our servers started to collapse one after another. Luckily for us, our monitoring scripts warned us early enough that users were not affected.

From that moment on, I learned not to take for granted all the scalability advice that is on the Internet. You need to be really sure that the person giving it has enough experience to talk about it.

Another anecdote comes from the early stages of Tuenti.com, the social network my friends and I founded in Spain. We were making some data schema changes, and we thought we could apply those changes later in the production environment without any problem, since the MySQL documentation assured us that in a Master-Slave environment, running ALTER on the tables should not affect the performance of the system. This is because the ALTER is done on a temporary table and then the tables are switched, so reads can still reach the table.

You can imagine the result from how I began this anecdote. Yes, disaster. The master stopped accepting writes (yeah, awesome, huh? How did we not think about the writes…). As a result, PHP connections were held open, and the file descriptors on the front-end servers ran out pretty fast. This spiral of events would have crashed the website, so we had to cancel the operation before that could happen.

The only noticeable repercussion was for the several thousand users who were on the website at that moment: they were not affected, except that their messages took longer to reach their destinations.
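The chain of events in this anecdote can be made concrete with some rough arithmetic. The sketch below is only illustrative: the descriptor limit, the descriptors already in use, and the request rate are made-up numbers, not Tuenti's real figures, and `seconds_until_fd_exhaustion` is a hypothetical helper. It assumes that every blocked write request holds exactly one connection (one file descriptor) until the master accepts writes again.

```python
def seconds_until_fd_exhaustion(
    fd_limit: int,
    fds_in_use: int,
    blocked_requests_per_second: float,
) -> float:
    """Estimate how long a front-end server lasts while writes are stalled.

    Assumes (hypothetically) that each blocked request pins exactly one
    file descriptor and that none are released until the stall ends.
    """
    if blocked_requests_per_second <= 0:
        return float("inf")  # nothing piles up, so the limit is never reached
    free = max(fd_limit - fds_in_use, 0)
    return free / blocked_requests_per_second


# Made-up numbers for illustration only.
remaining = seconds_until_fd_exhaustion(
    fd_limit=1024, fds_in_use=224, blocked_requests_per_second=40
)
print(f"descriptors exhausted after about {remaining:.0f} seconds")
```

With these invented numbers, a front-end server would have roughly 20 seconds before it could no longer open connections, which is why a stall on the master can turn into a site-wide problem so quickly.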

So, with all the experience life has given me, and with so many failures I can openly admit to, I made a list of the most important things a team must keep in mind if they are going to launch a really big project on the Internet.

• Things must be really simple: complex things are almost never scalable, and even less so when we must make them more complex still in order to scale them. The worst enemy of scalability is complexity.

• Research: When you are facing a scalability issue, unless you are one of the major portals of the world, somebody else has probably gone through it already and documented it somewhere on the Internet. We once solved a network problem on our own, spending almost a week finding the right solution by trial and error, and in the end it took us 20 minutes to find the same solution in a tech blog about networking issues. We lost a week because we didn’t realize that somebody had gone through this issue too.

• Prioritization: Even if it seems obvious, we tend to take our priorities for granted. One of my university professors once told me: “Write down your life priorities on paper, and then order them as you want them to be.” I did it, and I found that I was ranking my friends below my work, so I swapped them, because I really value my friends more than my work. Sometimes we don’t see the global scope of the problem and we make mistakes when prioritizing. Sit down, write the overall problem on paper, and then prioritize based on that. You will find that sometimes it is worth living with a bottleneck a little longer, because another foreseen problem might be worse if it is not fixed now!

• Bottleneck iteration: This comes from the sysadmin world, but you can apply it to (almost) everything related to scaling issues. The product will go through every type of bottleneck; some we will be able to foresee, but others we won’t. We need to be able to respond to each one and wait for the next one to come. We should not expect our product to be perfectly scalable already.

• Success as a team: Keeping your responsibilities in mind, and assuming and executing them, is vital so that other members of your team can rely on you. This makes it possible to tackle scalability problems faster and more easily. If everybody jumps into the problem even though it is not their specialty, then the problems double: first working out who is right, and then fixing the bottleneck. On a well-balanced and organized team, the only task is to solve the problem.

• Know your platform: Not everybody realizes how complex the platform can be. Make sure there are people who understand the complexity of the whole platform. Not everybody has to have the big picture, but at least somebody has to understand all the connections between the pieces in order to diagnose more global problems. This is because, at certain moments, there will be bottlenecks that cause multiple symptoms and can only be diagnosed from a global perspective. And vice versa: the bottleneck may reside in a place completely different from where the symptom appears!
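The bottleneck iteration item in the list above can be sketched as a loop: measure, fix the slowest part, and measure again. This is a minimal sketch under assumptions, not a description of any real tooling: the stage names and timings are invented, `find_bottleneck` and `iterate` are hypothetical helpers, and halving the slowest stage's time merely stands in for "a fix".

```python
from typing import Dict, List


def find_bottleneck(timings_ms: Dict[str, float]) -> str:
    """Return the name of the stage that currently takes the longest."""
    return max(timings_ms, key=timings_ms.get)


def iterate(timings_ms: Dict[str, float], rounds: int) -> List[str]:
    """Fix the worst stage, re-measure, repeat; return the stages fixed, in order.

    Assumption: a "fix" halves the stage's time. Real fixes are not that tidy.
    """
    current = dict(timings_ms)  # work on a copy so the caller's data is untouched
    fixed = []
    for _ in range(rounds):
        worst = find_bottleneck(current)
        fixed.append(worst)
        current[worst] /= 2
    return fixed


# Invented per-request timings, in milliseconds.
measured = {"database": 120.0, "cache": 15.0, "php": 70.0, "network": 30.0}
print(iterate(measured, rounds=3))  # ['database', 'php', 'database']
```

Notice how the bottleneck moves: once the database is improved, PHP becomes the slowest stage, and after that the database is the slowest again. There is always a next one to wait for.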

This list is based on the documentation I have read on the Internet plus my experience in the field. There is more to see in these places: YouTube-related blogs

Tagged with: scalability

Author

Joaquin Ayuso de Paul

Joaquin is the VP of Data Products. He founded one of Spain’s largest social networks before joining the Border Stylo team. When he’s not at the office, you can usually find him showing off his bowling skills.
