Do You Know About?

Twitter crashes explained by its engineers.

Twitter explains the causes of, and remedies for, its multiple and massive failures over the past week.

As with the high number of errors and generally poor performance seen this summer, the problem has been one of scale: Twitter is growing so much and so quickly that the engineering team has struggled to keep up with the sheer volume of data going through the service’s internal network.

This week’s Twitter issues, wrote engineer Jean-Paul Cozzatti, came down to three critical mistakes by the engineering team:

  • The team put two important, fast-growing, high-bandwidth components on the same segment of Twitter’s internal network.
  • The network wasn’t being monitored the way it should have been.
  • The internal network was also temporarily misconfigured.

To ensure the same mistakes aren’t repeated, Cozzatti went on to outline what Twitter is doing to fix the problem. He wrote that the company has doubled the capacity of its internal network, improved how the network is monitored, and rebalanced its traffic.

“For much of 2009,” he wrote, “Twitter’s biggest challenge was coping with our unprecedented growth (a challenge we happily still face)… But as this week’s issues show, there is always room for improvement.

“Based on our experiences this week, we’re working with our hosting partner to deliver improvements on all three fronts. By bringing the monitoring of our internal network in line with the rest of the systems at Twitter, we’ll be able to grow our capacity well ahead of user growth.”

Src & Text: [mashable]
