Troy:

Black Friday, Cyber Monday, product launches, marketing releases, or other events that you know are going to drive traffic to your site or application, but you're not sure you're going to be able to handle the load that's going to come in from that. What I hope you take away from this presentation is that that's not the only time to load test. If you're load testing one week or even two weeks before a major influx of users that you know is coming, you're probably too late at that point. You don't know what type of problems you're going to find, and you don't know if you're going to have the ability or the time to fix them between when you do that load test and when your users come in.

 

 

Another reason: performance is a problem. I'm not going to go through each of these points here, but we all know that performance is a problem for applications on the internet. Surveys have shown that about 75% of organizations with internet-facing applications are constantly in a state of degraded performance, and most of those problems are actually found by your end users, not by you. That is obviously not an acceptable state to be in, and load testing can help there. Also, unit and functional tests don't find everything. Typically, when you think of load tests, you're looking at finding out, "Can I handle a certain number of users?" But there are whole categories of bugs that you can't find with unit and functional tests that load testing can help find.

 

 

Just to name a couple: concurrency bugs. If you're not familiar with that, that would be a single piece of code that works fine in unit tests and looks fine on the screen, but if you were to run it multiple times at the same time it could end up stepping on its own toes and having problems. These are things that you will not find with unit and functional tests, so load tests become necessary. Another category is compositional bugs. Compositional bugs are slightly different but in the same realm: you have multiple separate systems or pieces of code that all function perfectly in unit tests, but if you put them together in the right combination or the right ratios, they suddenly cause issues.
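
To make that concrete, here's a minimal Python sketch of the kind of concurrency bug described above. The counter and thread counts are hypothetical; the pattern, a read-modify-write that is safe in isolation and unsafe when run in parallel, is the classic case a load test surfaces and a unit test doesn't.

```python
import threading

# A naive in-memory counter, e.g. tracking items reserved in a cart.
# Each call looks correct in a unit test, but the read-modify-write is not atomic.
counter = 0

def increment_many(times):
    global counter
    for _ in range(times):
        current = counter       # read
        counter = current + 1   # write -- another thread may have written in between

threads = [threading.Thread(target=increment_many, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# A single-threaded unit test always sees the expected count.
# Under concurrent execution we expect 400000, but lost updates usually leave it lower.
print(counter)
```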

 

 

I've put it as a separate line here, but one kind of compositional bug that's very common, so it deserves its own bullet point, is database index and lock issues. That could be, for instance, if you have users registering and adding to cart at the same time, they may hit the same database table. If you have non-optimal table locks on that table, you could end up with those multiple queries hitting that table, getting locked out, and eventually timing out before they're able to be fulfilled. Again, all of those would succeed in a unit and functional test situation, but a load test would help you find that.
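
As a hedged illustration of that lock-and-timeout pattern, here's a small Python sketch using SQLite; the table name and timing values are made up. Two writers each succeed on their own, but when they overlap, one holds the write lock long enough that the other times out.

```python
import sqlite3
import threading
import time

# Two "user actions" (think register and add-to-cart) writing to the same table.
db = "shop.db"
setup = sqlite3.connect(db)
setup.execute("CREATE TABLE IF NOT EXISTS carts (user_id INTEGER, item TEXT)")
setup.commit()
setup.close()

def slow_writer():
    conn = sqlite3.connect(db, timeout=1)        # wait at most 1s for the lock
    conn.execute("BEGIN IMMEDIATE")              # grab the write lock up front
    conn.execute("INSERT INTO carts VALUES (1, 'widget')")
    time.sleep(3)                                # simulate a slow transaction under load
    conn.commit()

def blocked_writer():
    time.sleep(0.5)                              # start while the first writer holds the lock
    conn = sqlite3.connect(db, timeout=1)
    try:
        conn.execute("INSERT INTO carts VALUES (2, 'gadget')")
        conn.commit()
    except sqlite3.OperationalError as exc:      # "database is locked" after the timeout
        print("second writer timed out:", exc)

t1 = threading.Thread(target=slow_writer)
t2 = threading.Thread(target=blocked_writer)
t1.start(); t2.start(); t1.join(); t2.join()
```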

 

 

Just breezing over the last two: configuration can also be an issue. You could end up with a situation where your web server had enough memory for the previous code release, but doesn't have enough memory allocated for the tweaks that you made to your code, because even a small tweak can have a huge impact on the amount of memory a script takes. Then there are the obvious ones, which are infrastructure limitations. This is what people typically think about with load tests. Do you have enough bandwidth available at every stage of your infrastructure? Do you have enough memory on your servers? Is the disk I/O sufficient for what you're doing?

 

 

Load tests can help you find all of these things and more, and obviously this is more than most people think about when they think about load testing, which is usually just finding the maximum number of users you can handle. The next thing to think about is, can you answer these basic questions? These are basic questions that everybody should be able to answer about their application. If you can't, then load testing is a way to find the answers.

 

 

The questions we have are: do you know what the performance bottlenecks in your application are? If a certain number of users were to come in, do you know where your application is most likely to fall over first? Where are you going to have problems first? Will your application perform at levels acceptable to your company at baseline usage and at peak? If you know what your peak levels are but don't know how your application is going to perform under that load, load testing is an absolute necessity. Can you quickly pinpoint and fix problems and issues when they arise? Again, if you're load testing on a recurring basis and doing it correctly, you should know where the likely breaking points are. That should not be a scrambling exercise. You should already know what they are and have a team that understands the steps to fix them. Load testing on a regular basis gives you that understanding and acts as a fire drill, so you'll be used to solving these problems.

 

 

Last but certainly not least, do you know your key performance metrics? Do you know the number of users you expect, your peak user counts, the weak points in your system, how to scale, how to fix issues, all of these things? Every application and every company is going to have a different set of metrics, but these metrics are key for measuring whether or not you successfully weathered a peak in users, and how well you did. How you grade yourself is going to be based on these metrics, so picking the correct ones is key.

 

 

Okay. Now that we've talked about why you might want to load test, let's talk about performing load tests. The first part of performing load tests is deciding what you want to do, and the first thing to decide is what type of load test you want to run. Most of you on this call, if you're not very familiar with load testing, have probably only dealt with one of these types before. There are actually more than this, but I wanted to focus on the three most common types of load tests. The first one, the one that everybody's heard of, is the stress test. This is basically just throwing some very large number of users at your system and seeing if you can handle it. The purpose of a stress test is to find the absolute theoretical bottlenecks and breaking points for your application. The typical methodology is to create some basic scenarios, run them as fast as you can against your system, and see what happens. It's a great test for finding weak points in your application and tuning performance to some degree, but under most circumstances most people would consider this an unrealistic test, because the traffic that's being sent is not representative of what your users would typically do.

 

 

That brings us to the second type of load test, which is a concurrency test. A concurrency test has a slightly different aim: more realistic traffic. What you do there is look at the different common scenarios that your users would perform on your site and create separate scripts for each of those common usages. Then you want to have realistic pauses and breaks between each run of those scenarios, and you want to balance the different scenarios.

 

 

Let's say you have a registration of a new user, you have an add to cart, and then you have a category search, and say these are the three most common use cases for your users. You're also going to want to figure out what the ratio of those actions is across those scenarios, and make a realistic mix of those when you're doing the load test. What that's going to do is give you a very realistic understanding at different usage levels: you can keep the same ratio but ramp the number of users up and see what happens, and then do a bunch of what-ifs, changing the ratios to try what-ifs with different mixes, or changing the number of concurrent users at the same ratios. This is going to give you a very good understanding of what your application's performance is going to look like under realistic conditions.
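
Here's a minimal sketch of what that scenario mix could look like in a script, using Locust as one example tool; the endpoints, wait times, and 1:3:6 ratio are hypothetical placeholders for the ratios you'd pull from your own analytics.

```python
from locust import HttpUser, task, between

class ShopUser(HttpUser):
    # Realistic pauses between actions instead of hammering as fast as possible.
    wait_time = between(5, 15)

    @task(1)
    def register(self):
        self.client.post("/register", json={"email": "user@example.com", "password": "secret"})

    @task(3)
    def add_to_cart(self):
        self.client.post("/cart", json={"sku": "ABC-123", "qty": 1})

    @task(6)
    def category_search(self):
        self.client.get("/search?category=widgets")
```

Keeping the task weights fixed while raising the total user count (set on the command line, along with the target host) lets you ramp load without changing the traffic mix, and adjusting the weights lets you test the what-if scenarios with different ratios.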

 

 

The final one I want to talk about today is disaster recovery. Disaster recovery is its own type of load test, and probably not a whole lot of people have heard of load testing in this style, but it can be very, very helpful. The idea is to create sustained load, typically in the concurrency style with realistic traffic, over a long period of time, and then test what happens under different failure conditions. For instance, if you have auto-scaling web servers, you could take down several of your web servers and see how the remaining web servers handle the traffic. If you have a full DR environment, you could do a DR cut-over and see whether it cuts over successfully. If you have session persistence between your primary and DR facilities, you could see if that works properly. If you have redundant database servers or a backup database server, you could take one down and see what happens. Very, very useful for finding out the resiliency of your application and your infrastructure, and understanding whether you can make it through large failures in the parts of your infrastructure that are meant to be redundant.

 

 

I also wanted to throw in some honorable mentions, mainly because there's not a whole lot of difference between them and the concurrency test. Scaling tests: if you have any kind of auto-scaling infrastructure pieces inside your application, scaling tests allow you to ramp up the number of concurrent users and see how your scaling works. That can be useful for understanding the cost associated with an auto-scaling environment, and how much it will cost you to handle certain numbers of users. Stability tests: think of these as a flavor of disaster recovery test where you're running a large number of users for a very long amount of time and seeing if your application has any failures over time. This isn't necessarily trying to bring it down with a maximum number of users; you're just keeping to the high end of a realistic number of users and running long enough to see if any time-based failures appear in your application.

 

 

The next thing to think about when you're assembling your load tests is what you want to target. There are obviously many, many things that you could target; I just put three examples on the screen here. Full site/application would be the most realistic, but the problem with doing a full site or application script in a single load test is that it can become very, very difficult to diagnose problems. It will tell you that you have problems, and if you're looking through your infrastructure logs you might even be able to find the general location in your infrastructure, but it'll be difficult to diagnose what actually caused the problem. Not impossible, just much more difficult.

 

 

Single scenario is where you've basically got one kind of user flow. It's still got some steps to it, but you're really sticking to one flow. This would be, for instance, adding to cart if you had an e-commerce site. That is great because now you're seeing one flow, but you're seeing every step in that flow. It makes it easier to troubleshoot, but you're seeing just one specific user case. Not as realistic as the full site or application scripts, but it definitely gives you a lot more troubleshooting ability. Finally, isolated function, or API tests. This is useful especially if you're adding a new API endpoint, a brand new page to your website, or a brand new section to your application, and you just want to isolate that and see how it handles load compared to the preexisting functions you already had. This can be very, very useful for taking something new that you're adding to your application or API and seeing how it handles and whether it meets the same performance requirements that you've applied to the rest of your APIs and functions.

 

 

Finally, when you're putting together your script, the most important thing is to understand the questions you want to answer. I've put some examples up here. Every application is different, so you really want to put some thought into what you want to answer. I've tried to put up here what I think are the most common types of questions you might want to ask and answer with a load test. The first category, what do you want to know, is probably the most important to think about. When you're doing this load test, what is it you're expecting to find out? A lot of people, when they think about load tests, just think of applying stress to the system and then looking at their own system or their own logs to find the failure. They're thinking of the load test just as a tool for applying pressure, but you actually get an ocean of data on the load test side as well.

 

 

You want to understand which questions you can answer with the data coming back from the load test side, which ones you want to answer with your own logs, and whether you want to correlate the two. Again, there's no right or wrong answer here. You just want to think about these questions while you're scripting. I'm not going to go through each of them, but you can see examples like "max concurrent users before failure" or "performance at expected peak," those kinds of things.

 

 

What requests matter? When you're building a script to load test your system, you have a lot of freedom to decide what types of requests are made in that load test. Things you want to consider: do you care, in this particular load test, about static content or CDN content? Technically speaking, if you include those in your load test and they're served from a CDN, you're really just load testing a third-party provider. That's probably not going to give you information that's relevant to what you're trying to do, but maybe it is. Maybe you're trying to verify SLAs. Again, there is no right or wrong answer to any of these questions, you just need to think about it beforehand.

 

 

Do supporting requests matter? Again, the answer to this in a lot of cases will be yes. An example would be an autocomplete field that searches for suggestions every time you type into a box. Do you care about the searches made before the user gets to the final word they're looking for? The script has the ability to skip over all of that and just search for the final word. In some situations you might care about that, and in some situations you might not, so you just need to think about it from that point of view.

 

 

Another question you really want to think about is dynamic data. That really has to do with the setup for the test. This is not necessarily going to be part of the script, but it might need to be included as an external data source. Do you need to have unique usernames and passwords, or unique form data? Do you have a bunch of products or SKUs that you might need to enter into a form? That might need to come from an external place. Do you need to set up your database so that it's ready for this load test? Does it have to have pre-populated data that you can search for? These are questions that prepare you for the test, because they're important when you're setting up the script. You need to know what the script is going to expect, and what the script needs in order to perform correctly.
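
For example, here's a short Python sketch of pulling that dynamic data from an external file; the file name and columns are hypothetical. The point is that each virtual user consumes its own row rather than reusing one hard-coded value.

```python
import csv
from itertools import cycle

# test_data.csv is assumed to have columns: username,password,sku
with open("test_data.csv", newline="") as f:
    rows = list(csv.DictReader(f))

# Wrap around if the test runs longer than the data set.
data_pool = cycle(rows)

def next_test_record():
    """Hand the next row of test data to a virtual user."""
    row = next(data_pool)
    return row["username"], row["password"], row["sku"]
```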

 

 

Then finally, but again definitely not least, what constitutes a failure? When you're building a script, you want to decide what the script should treat as a failure in your infrastructure. Examples: did a request take too long? Did it not have the correct content in the response, or was the size of the response unusually large or unusually small? Every application is different, so what constitutes a failure is going to be unique to your application, but thinking about what failure means for your application is important. Also, what requests should you not even care about? In some cases, if some third-party content or certain requests fail, your application will continue, and you don't want to count that as a failure. Again, another thing you want to think about.
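
To show how those failure rules can live in the script itself, here's a hedged sketch using Locust's catch_response hook; the 2-second threshold, expected text, and minimum size are hypothetical values you'd replace with your own definition of failure.

```python
from locust import HttpUser, task, between

class CheckoutUser(HttpUser):
    wait_time = between(2, 8)

    @task
    def view_product(self):
        # catch_response lets the script decide what counts as a failure,
        # instead of relying on HTTP status codes alone.
        with self.client.get("/product/123", catch_response=True) as resp:
            if resp.elapsed.total_seconds() > 2.0:
                resp.failure("took longer than 2s")            # too slow
            elif "Add to cart" not in resp.text:
                resp.failure("expected content missing")       # wrong content
            elif len(resp.content) < 1024:
                resp.failure("response unusually small")       # suspicious size
            else:
                resp.success()
```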

 

 

Okay. Next, we're going to go into interpreting results, but I just want to touch on one thing very quickly. What we've gone through so far is what you need to do to build a script. The next thing is to run the load test. There's not much to that; you typically just put it into a SaaS platform or an onsite product to run the test. The one thing I do want to mention is the best practice for how much load to apply, which is what we call doubling. What that means is you pick a realistic number of users that's on the very low side of what you'd expect your infrastructure to be able to handle, and start with that.

 

 

Every application is going to be different, but let's say a hundred users, just for example. If that was your starting point, you would run the test at a hundred users, then double it and run at 200 users, double again and run at 400 users, and continue that way rather than increasing linearly. That'll give you the maximum amount of data to work with, with the minimum amount of effort. What you'll find is that eventually you hit your breaking point or your tipping point, and we'll talk about that a little bit in a second, but you can get more granular at that point. Let's say you run an 800-user test and it runs perfectly, no issues whatsoever, and then you run a 1,600-user test and you find issues. Then you can get more granular between 800 and 1,600 users and see what happens there.
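
As a tiny worked example of that doubling schedule (the starting point of 100 users and the cap are just placeholders):

```python
# Build the doubling ramp: 100, 200, 400, ... up to a chosen cap.
start, cap = 100, 3200
levels = []
users = start
while users <= cap:
    levels.append(users)
    users *= 2
print(levels)  # [100, 200, 400, 800, 1600, 3200]

# If 800 users passes cleanly but 1,600 shows errors, bisect between them
# (e.g. try 1,200) to narrow down the tipping point.
```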

 

 

What does it mean? There's a lot of information that you're going to get back, and we'd like to do follow-up webinars that go into more detail on some of the other data you get, but the most important thing that comes out of your load test is what we call load curves. With load curves, what you're always looking at is the number of users on the X axis, which is where that doubling principle comes in again, and various metrics on the Y axis. I've put the three that I consider the most important on the screen: session duration, throughput, meaning network throughput, and failures and errors in your application. What we're looking at on the screen is what it would look like with an ideal or perfect infrastructure or application.

 

 

For session duration, the session duration per user would stay flat, because your application or infrastructure would handle every request identically and would never slow down no matter how many users you threw at it. The network throughput would go up perfectly linearly, with no curve, because as you add users you would expect the amount of network traffic to increase in perfect relation to the number of users you're throwing at it. And of course, if your application is working perfectly you'd have zero errors, so you'd have a flat line at zero on that graph. But nobody's infrastructure is going to look like this when load tested, so let's see what happens under real conditions.

 

 

These are more realistic graphs. This is what you would expect to see if you had picked an appropriate range of users for your load test. What are we looking at here? Under session duration, you'll see that it stays flat, then eventually it starts to go up a bit, then it peaks and starts to go down. Why does it do that? Well, I'm going to skip over the middle graph and look at the last graph for a second. If you notice, the number of errors will typically go up, and that's usually showing that your application can no longer sufficiently handle the requests, so you're going to start to see failures at some point and you're going to get error messages coming back. Typically, an error response comes back faster than a normal session would, so you'll see the average session times start to go down when those failures come in.

 

 

Returning to the middle graph: as the errors come in, error messages typically take less network traffic than real responses, so you'll start to see the network throughput flatten out as you transition from valid responses to error responses. I've put color codes onto the graph to show what the different areas mean. The blue is the safe area. If you were to cut out everything else and just look at the blue, it would be the same as the ideal curves; that's where your application is behaving perfectly and has no problems at all. More interestingly, the yellow zone is what I would call the peak zone. Depending on your tolerance for errors, this can still be a valid area for your application, but you are going to start to see slowdowns and/or failures at this point.

 

 

If you're looking at these load curves for your application, this is what indicates that you have hit your peak usage, and this is the maximum number of users that your application can successfully handle. Finally, the red zone is the unstable zone. Especially if you're looking at the session duration and you see a downturn, that's a very, very good indicator that you are now into a zone where a large portion of your users are seeing errors, and it's probably an unacceptable zone for your company. But again, it depends on your tolerance for errors and on your user base.

 

 

When to load test. I don't have a whole lot of time left, it looks like, so I'm going to speed through this a little bit. The correct answer is: whenever possible. When I say whenever possible, I like to think of it in terms of your environments, or some people like to think of it in terms of deployment stages. These are the common deployment stages that you'll see in a lot of companies. Yours may be slightly different or have different names, but in general most people have some or all of these as stages in moving their code from development into production.

 

 

I like to suggest that load testing happen at some level in all of these. In Dev environments, you can load test to find those concurrency and composition errors we talked about at the beginning very quickly, without having to move on to the next level. You can find these bugs quickly, keep them from going further into the pipeline, and save your QA engineers from having to spend energy on something that can be solved earlier. Staging and QA environments tend to have more resources than Dev environments, so they're a good place to start running slightly larger load tests. A test in a Dev environment may have been against code submitted by a single developer; staging environments tend to receive rolled-up commits, so this is a good place to catch load-related errors that show up when code from multiple development groups comes together in one commit. But again, we don't expect these environments to have the robustness of production, so a large load test probably doesn't make sense here.

 

 

Pre-production is the perfect place to run at-scale load tests. In an ideal world, your pre-production environment should be as close to your production environment as possible in terms of infrastructure, so it's going to give you the most realistic and meaningful results for a large-scale load test in terms of whether or not you're going to be able to handle the particular load that you're targeting. I always recommend doing the vast majority of your at-scale load tests in pre-production, and I do not recommend production load tests for that. But there are certain cases where production load tests can make sense.

 

 

The most common scenarios I see are listed underneath. Basically, if you have done your tests against pre-production but you want absolute assurance that you can handle a certain number of users, and you have a massive expected spike of users coming up, then a final at-scale load test in production, maybe in off hours, can make sense. You can also use it for validation: to fulfill DR tests for legal requirements, to test seamless code deployment and make sure a new deployment strategy worked the way you thought it would, and for performance validation if you have contractual SLAs or something like that that you need to demonstrate in production. Those would be the only reasons I would see for running at-scale in production, unless for other reasons you simply cannot have a pre-production environment.

 

 

For this webinar, I think we'll stop right there. Hopefully this was informative, but if you have any questions please put them in the chat box.

 

Whitney D.:

Awesome. Thanks so much for that presentation, Troy. I think I speak on behalf of everybody here when I say that was very insightful, and we're really excited to dig deeper into load testing in further webinars going forward. Stay tuned everyone, we will hopefully be having a webinar a month. You can find them on Twitter, LinkedIn, as well as by e-mail. Really quickly, let's get through a couple of questions. Oh, there's a lot.

 

 

[Jared 25:44] asks, "Is there a way to be notified if an error occurs in our system? It's great to know we're having performance issues, but if we don't know about them in real time, it can cause issues. Do you know of a company that sets up alerts?"

 

Troy:

Great question. Typically, what you're talking about there is more on the synthetic monitoring side. Load testing is typically for proactively looking for errors, and it sounds like from your question you're looking for the ability to be alerted if something goes down in your production system. We do have a product for that: a synthetic monitoring solution, which I would be happy to cover in a future webinar.

 

Whitney D.:

Then we've got a question from Scott, who asks, "How do I know how long to run a load test?"

 

Troy:

Okay. That's a great question. Just like any of these, there's no right or wrong answer, but the best practice for how long to run a load test depends on the type of load test you're running, looking back at the three types we covered, and on your goals. Typically, under most circumstances, you want to run it at least long enough for your environment to stabilize, plus some buffer time after that. For most applications, 30 minutes would be plenty for anything other than the stability test, but you can go as long as an hour or two for applications that take a long time to stabilize.

 

Whitney D.:

Great. I have another question from [Parthiban 27:24], sorry if I've mispronounced your name, who asks, "What do you monitor during execution?"

 

Troy:

Our particular load testing tool pretty much captures anything that goes over the network. Every tool is going to be different, but in our case we capture everything that goes over the network from the load generator's point of view. For every request we make, we record the request, the headers we sent out, and all of that. We also record everything about the response that comes back, including all of the timing: the connection time, the DNS time, the wait time, and the response time. All of the information you'd expect from a network timing perspective and a content perspective is recorded in our tool. Any good load testing tool will do the same. How it's presented may be aggregated if you're using a SaaS portal, but the underlying data should still be collected.

 

Whitney D.:

Awesome. We definitely don't have time for any more questions, but again, feel free to e-mail me at whitney.donaldson@apicasystem.com, and I will get you those answers by the end of the day. Thank you so much for joining us today. Thank you, Troy, for speaking with us, and we'll see you all next time.

 

Troy:

Thank you everybody.