Collecting performance counter information with the Visual Studio Team System profiler
If you have ever done performance work on Windows systems you are probably already familiar with the PerfMon tool. This tool gives you an overview of your system performance and can be an invaluable tool in the early (and sometimes the late) stages of a performance investigation. If you look at either of my two PerfMon links you can see that PerfMon is usually used to pick out what the performance bottleneck is when examining an application. By “performance bottleneck” I mean picking out the resource that is being saturated by your program (CPU, network, disk). PerfMon does this by tracking some subset of performance counters that monitor different aspects of system performace. This is important as you often need to use different techniques and tools for investigating different bottlenecks. For example, using the Visual Studio profiler it’s usually best to use sampling mode to investigate CPU issues and instrumentation mode to investigate memory and disk issues.
With the Visual Studio Profiler we wanted to give customers an easy and integrated way to collect this performance counter information and view it alongside their performance data. This was especially important to us as with this information we could help customers to analyze specific trouble areas of their program or to choose the correct profiling modes based on their performance bottleneck.
Turning on performance counter collection
To access this performance counter collection start out by going to the session properties of your performance session. In the session properties you will want to select the new “Windows Counters” options, shown in the screenshot below.
On this page you can check the box in the top of the screen labeled “Collect Windows Counters” to add windows performance counter collection to your next profiling run. Below that checkbox you see a box to input a value for how often the counters are collected labeled “Collection interval (msecs).” But before we dig into the collection interval it is important to understand just how these counters are collected. With our profiler we have the concept of “marks” that show up in the data collection stream with a comment associated with them. Unlike all other profiling data these marks are not aggregated and show up in chronological order in the “Marks” view of the performance report. These marks are usually added by annotating your code with the data collection APIs (native API here). When we tell the profiler to collect Windows performance counters they get collected every time one of these marks gets hit. The collection interval control allows you to automatically insert marks in your program while running at some specific time interval. That way you can collect performance counter information easily without all the hassle of adding marks to your program manually.
In addition to setting the collection interval you also need to control what performance counters are being collected. In the lower section of the property page are all the different performance counters that you can selected to be collected. We surface all the same counters as PerfMon, so if you have a favorite counter that you like to track you can be sure that we’ll have it. By default we’ve included basic processor, memory and disk usage counters. We picked these as they give a good basic overview as to how your application is taxing the system.
Viewing performance counter data
So to demonstrate how these marks and associated counter values are actually surfaced in the report file we need to run the profiler and collect our data. For this example I’m using the PeopleTrax sample application and running it in sampling mode with the default set of performance counters and the default 500 ms marking interval. For my profiling scenario I’m just going to wait for the application to initialize, hit “get people” button to load the records from the database and then shut down the application. After the application is shut down the performance report will load automatically and then I will jump to the “Marks” view.
In the view above you can see the automatically collected marks. And for each mark you can see the memory, disk and processor usage at that point in time. Now this data can already tell us much about our program. For example we can see that we started out with a brief processor spike between marks three and five closely followed by a memory usage spike between marks five and eight. But how can we actually tie these values into all the rest of the performance data that we collected? After all these are timeline values while the rest of the data in the report is aggregate data from the whole run. Luckily we’ve provided a new filtering feature to help use the performance counter data to guide your performance investigations.
Filtering performance data from the marks view
Looking further into the marks data there was another big memory spike later in my program between marks 129 and 133. But how can we tell what was actually happening in the program during that time? To start the investigation ctrl-click marks 129 and 133 in the marks view so that they are both selected. Then right-click on one of them and select the “Add Filter on Marks” command. When you do this a new filter dialog will appear docked on the top of your performance report.
The filter control has a lot of depth to it, but for now you can just understand that is it currently saying “only show the data collected between mark 129 and mark 133.” To actually apply the filter to the performance report click the “execute filter” button in the toolbar (it’s the button with the green play icon). You will see a progress bar for the reanalysis that is being performed and then the report will pop back up. Only now all the aggregate data for the report is only the data that we captured between those two marks! With just a quick glance at the summary page below you can tell exactly what our application was doing and why we were seeing the memory usage spike at that point (big blue circle added by me for emphasis ;-) ). However it is important to keep in mind that you will need to clear the filter items and rerun the empty filter or to close and reopen the report to get back to seeing all of your performance data.
In addition to filtering on marks the filter grid can so some other very neat stuff such as filtering on threads, by timestamps or by time percentage (to do something like “show only the first 10%” of my run). I don’t have room to cover all the other filtering types here though, so I’ll have to come back and hit them in a later article.
Collecting performance counters is a very cool new feature that we’ve added to the profiler for Orcas. Even before we get into how cool it is to be able to filter down your profiling results based on performance counters it’s nice to just be able to collect and store those values side by side with your profiling data. And when you throw in the filtering features it quickly becomes an invaluable tool for performance analysis.
Analyzing a performance report
When you first open a performance report in Visual Studio 90% of the time the question on your mind is “where is my program spending most of its time?” In Visual Studio 2005 the best way to find this was to move to the calltree view and from there to follow the execution path with high inclusive functions down the tree. If you are not familiar with the calltree view, this view places all the execution paths of your program into one big tree structure. In this tree you can see both inclusive and exclusive values for either samples (in sampling mode) or time (in instrumentation mode). In a nutshell, inclusive means “functions that caused work” so any inclusive time or samples for a function can be attributed to that function or any of its sub-functions. On the other hand, exclusive values can be attributed only to that function and do not include any values from sub-functions.
So with the above explanation the way to find out where your program is spending most of its time is to drill down the tree and whenever you have a choice between children of a function pick the child with the highest inclusive value (meaning that it is doing more work below it then the other children). Eventually you will reach some functions with high exclusive values, which are the functions that are actually doing the most work in your program. At that point you can examine those functions and the path taken to them and start working to fix your performance issue.
The problem with the above scenario is that, quite frankly, it’s a pain in the butt to have to always drill down on the tree to find these hot functions and hot paths. Especially with sampling reports the calltree can be many levels deep and it can take a while to get to the functions of interest. So for Visual Studio 2008 we’ve added a small, but very useful feature called “Hotpath” to automate this scenario. Now when you open a performance report and move to the calltree view you will see a small flame icon located in the toolbar. When you push this button it will expand the hot path from the root following the rules that we mentioned above (following the high inclusive numbers when choosing between children). In the example seen below I’ve run a profiling session on a managed application that calculates and displays a Mandelbrot set. When running Hotpath on the report it jumps me down the tree directly to the DrawMandel and SetPixel functions that are performing the bulk of the work for this scenario. The Hotpath will stop whenever it reaches a function with high exclusive values or when it hits two children with similar inclusive values. Also, it can be run from any point in the tree (just right click on the function) so you don’t have to always run it from the root function.
As you can see, the hotpath feature can save you a lot of time in performance analysis. Already folks who use the profiler internally here at Microsoft have become big fans of this feature. In fact, there has been spontaneous applause at meetings when we demoed the feature for the first time! As of right now it’s the closest that we have to a “show me my performance issue button.” And it’s one of the many ways that we are trying to help users to better understand and take action on their performance information in Visual Studio 2008.
The new Developer menu in Visual Studio Team System
In my recent series of blog posts I’ve been covering some of the cool new features that the Visual Studio Profiler team has been adding to the code profiler that is included with Visual Studio Team System for Developers (and Team Suite edition). With all the features that have required hours and hours of work from us we’ve also made a small change that we think will have a big impact on how customers find the profiler. This change was creation of the new top-level “Developer” menu and moving the profiler and code analysis commands to this new menu.
It was scary for us to find out how many customers were unaware that a powerful profiler shipped with VSTS. Some were even purchasing competing profilers without first checking out the one that they already had! So to make the profiler more discoverable we moved it out from where it was buried in the Tools->Performance Tools menu to the new Developer menu. Now the name of this Developer menu might change, but there will assuredly be a top level menu with both your profiling and code analysis tools available in it.
The Visual Studio Profiler Data Collection Control Part 1: Excluding Application Startup Time
Even if you are familiar with Visual Studio Team System you may not already know that a powerful profiler is included with both the Developer and Team Suite editions of that product. If you are not familiar with the profiler you can check out some of my earlier blog entries or TechNotes. In my new series of blogs posts, starting with my first post about the comparison documents, I’m coving all the cool new features added to the profiler that customers will finally be able to see in the upcoming Orcas Beta releases.
The Data Collection Control
In this article I’ll be focusing in on the data collection control. This control allows users several new ways to control and annotate your profiling files from the IDE while the profiler is running. Previously we have offered most of this control already, but only from the VSPerfCmd command line tool. The Data Collection Control lets you do things like pause and resume profiling, name marks and annotate the VSP file by adding marks to it. To show how to use the basic functions of the Data Collection Control I’ll walk through a few quick scenarios to demo these features. This TechNote assumes that you are familiar with the basics of using the profiler, if not check out the TechNotes here for a good intro.
Eliminate Startup Time from Analysis
A highly common customer scenario that we are addressing with the data collection control is the customer who wants to collect steady-state data on their process and who does not care about how their time is being spent in startup. By controlling when you are collecting performance data the data collection control makes it a breeze to exclude startup information from being collected. For my example, I’m using the PeopleTrax sample application. When you first start up PeopleTrax the application take a while to initialize, then you can click the “Get People” button to populate a list of customer names. For the purpose of this demonstration I want to just collect data on the actual “Get People” operation and not collect any data during the application startup or shutdown.
First off, make sure that you have a performance session set up in the Performance Explorer that is targeting your application. Now normally we would use the green launch button on the Performance Explorer to start our profiling session. But in our case this would not work, as we would begin by collecting data, and we don’t want to record any of our startup time. So instead of clicking launch, we’re going to click the dropdown control on the launch button and select “Launch with profiling paused.” This will launch the application with the profiler attached to the process, but will the actual collection of profiling data paused.
Now after we have launched the PeopleTrax application will pop up, and back in the IDE we will see the new data collection control docked with the Performance Explorer (the data collection control should be on top of the Performance Explorer when you are currently profiling).
Along the top toolbar of the data collection control you will see several icons, two of which are of particular interest to our current scenario; these being the pause profiling (second from the left) and resume profiling (third from the left) icons that control profiling state. Since we started out with profiling paused the profiler is not currently collecting data and thus the pause profiling button is grayed out. After the PeopleTrax application has initialized and popped up on-screen we have now passed the startup portion of the application and we want to collect data on the “GetPeople” operation.
To do this, we will enable profiling data collection from the data control. Just click the “Resume Profiling” button. Now the pause profiling button will be active and the infobar on the data collection control will inform you that profiling is active. Right after activating profiling move over to the PeopleTrax application and click “Get People” to populate the customer list from the database. After the listview is populated click the pause profiling button to stop collecting data and then shutdown the PeopleTrax application to generate your report file.
Now the generated report will have only the data collected after you pressed the resume profiling button and before you pressed the pause profiling button, excluding startup and shutdown time. You also could exclude this time by annotating your code to start and stop profiling, but the data collection controls makes this a quick and simple operation from the IDE, without having to muck about with command line tools.
In my next blog entry I’ll look at how we can use the data collection control and filter queries to view only specific parts of performance reports after collecting a large data file.
Comparing performance reports with the Visual Studio Team System Profiler
With the recent release of the first Beta for Visual Studio Team System (codename Orcas) customers will get their first chance to see all the great new features that we are adding to the product. For the profiler in particular we’ve added some very cool new features that I’m really happy to finally be able to reveal publically. Remember that the profiler is only included with VSTS: for Developers and VSTS: Team Suite, so make sure that you have one of those SKUs if you are following along.
Comparison Reports from the IDE
The first feature that I want to cover is the new comparison document that we’ve introduced to help users compare profiling data from two different profiling runs. Say that you have just checked in a possible fix for a performance issue and you want to compare a new performance report to one that existed before the change to see if your change really helped. Our new comparison features make these types of questions easy to answer. Comparing two files of performance data is a very common scenario for customers, especially when dealing with regression testing, so this was a priority feature for us in this release. After all, performance data when taken in isolation, without goals to hit or old values to compare with, can be pretty hard to work with. Our goal with this new comparison work is to help customers to make better use of their performance data to achieve the performance results that they desire for their applications.
I would be remiss if I did not take a quick second here to dive a little deeper into the importance of setting performance goals for your applications. Too often developers end up in the situation of closing in on product release and realizing “my app is just way too slow.” Now, it is perfectly understandable that developers want to save performance optimization for the end of the product cycle; after all if the underlying structure is going to change greatly why waste too much time early on trying to tweak things to run as fast as possible? But the real issue with the situation above is in the generalness of the “my app is just way too slow” part. What exactly do you mean by “too slow?” What parts of the app are too slow? What type of performance do customers expect from your app? How long does it take for other similar products to do the same task? For our new comparison features to be really useful you will need to take some time before and during development to get at least basic answers to some of the questions above.
So now that you know to set performance goals for major functionality of your application how can you use the new comparison functionality of the profiler to meet those goals? First off, let’s say that you have just submitted some major code changes to your product and you want to see how those changes have either improved or degraded the performance of your app. Start off by moving to the new “Developer” top level menu in VS. This menu (Figure 1.1) is new, so expect that the names and locations of the items in it may change by the time that RTM rolls around. From this menu select the “New Comparison Report” option to start the process of creating a new comparison file.
(Figure 1.1 – The new VSTS developer menu)
Next, the UI will pop up a dialog that will ask you to select two performance data files to compare. Note that in this screen you can select any combination of VSP and VSPS files provided that both files are either both sampling mode files or both instrumentation mode files. If you have used the VSTS profiler before you will know that VSP files are the report files created by the profiler, but you may not be familiar with VSPS files. I will cover VSPS files in more depth in a following article, but in a nutshell they are VSP files that have been saved after performance analysis, resulting in a much smaller file that can be opened much quicker than a full VSP file. However, after saving as a VSPS file you lose some of the advanced analysis options that you had with a VSP file. For this example, I’ll just be comparing two profiling runs of a simple console application. This application concats a bunch of strings (a common performance issue) and since VSTS code analysis flags this as a possible issue I’ve converted it to use the StringBuilder class instead. The second analysis file is from the performance run that uses StringBuilder and I want to compare it to the earlier run to see if I have made any performance gains.
If you select one sampling mode performance file and one instrumentation mode performance file then you will get an error popup about incomparable file types. Sampling and instrumentation work in totally different ways and with completely different column sets, so we don’t want to allow comparisons between the two types of files.
After selecting the two files to compare and clicking “ok” a new document window will open in the IDE and you will see progress bars for the analysis of both of the performance files. After the loading finished you will see a comparison document like the one below open in the IDE (Figure 1.2).
(Figure 1.2 – The comparison report view)
So there is a heck of a lot of information in that comparison report, so let me see if I can help to break it down. In the top left “Comparison Files” groupbox we see the filenames of the two report files that we are comparing (Concat.vsp and StringBuilder.vsp). In the “Comparison Options” groupbox we specify what values to use for comparison and what magnitude of change we require to actually show items in the comparison report. The “Table” value in this group specifies the type of objects that we are comparing. The default value is to compare functions but you can switch this to compare other values like modules or IPs. The value in the “Column” combo box specifies what report column to use to compare between the baseline report and the new report. The column values available for comparison will change based on the comparison table that you have selected. By default, we are comparing the “Exclusive Samples %” value for the functions table. If you are not familiar with the various performance columns (you can get more info on performance report columns here) “Exclusive Samples %” is the number of total samples that were taken in the specific function and not in any subfunction. Finally we have the “Threshold” value. This value specifies the minimum difference that there must be between new and baseline values for the function to actually be listed in the comparison report. For example, let’s change the threshold value to 1.0 and click the apply button to apply the changes. Now the comparison report has been trimmed to just show functions that changed by one percent of the overall samples (figure 1.3).
(Figure 1.3 – The trimmed comparison document)
Now that we have trimmed our report to a more manageable size let’s take a look at the body of the comparison report. In the “Comparison Column” we see the names of the functions that are being compared between reports. Next we see columns that show the value of “Exclusive Samples %” in both the baseline report, in the comparison report and the delta between the two (comparison value – baseline value). Note that now that we have changed our threshold value to 1.0 only functions with an absolute delta value greater than 1.0 show up in the report. If we were to change threshold to 0.0 we would see every function in both reports, even if the function did not change value at all. Functions that were only present in the baseline report or only present in the comparison report also show up in this report as long as their value (and thus their delta value) is greater than the threshold value.
So what does the comparison report tell us about our code change from using String.Concat to using StringBuilder.Append when constructing our strings? Well it appears that our overall change to system performance was basically a wash, we used to spend 4.4% of our time in the AppendItemsConcat function and now we spend 4.62% of our time in the AppendItemsStringBuilder function. At a lower level we spend less time in the String.wstrcpy function and more time in memcopy, this is to be expected based on how String.Concat and StringBuilder handle combining strings. Also, if we see something in the comparison report that we want to investigate further we can right click on that function and choose “show matching row in baseline file” or “show matching row in new file.” These options will open up (this open should be very quick since we already have it analyzed the file for the comparison) the matching performance report so that you can dig deeper into the calltree for that specific function. You can see the result below of right-clicking the “show matching row in new file” on the String.wstrcpy function. The new performance file has been opened and the String.wstrcpy function has been highlighted in the function view (fig 1.4).
From the investigation above we can see some differences between the Concat implementation and the Stringbuilder implementation but we really don’t see any measurable performance gains. Perhaps instead of saving CPU cycles we actually ended up saving time and space in allocations. If we want to take a look at the different allocation patterns for the two scenarios we can create two memory profiling reports and compare them. Note that memory allocation reports are a separate type of report so you will not be able to compare allocation reports to non-allocation reports. In the example shown below we can see the difference in managed allocations between the String.Concat and the StringBuilder scenarios (Figure 1.5).
(Figure 1.5 – A managed memory comparison report)
In this comparison I’m looking at the “Types” table and checking the difference in bytes allocated for that specific type (Exclusive Bytes). As we can see, for the cost of allocating one StringBuilder and a Char I’ve cut down the amount of string bytes allocated in half! Not too shabby. Outside of specific performance investigations the new comparison reports can be a really handy learning tool for when you want to compare a few different approaches to the same scenario.
Comparison Reports from the Command Line
So now that I’ve show you how to work with these nifty comparison reports from the IDE you may wonder how you would integrate this functionality into something like a script driven regression prevention system. We’ve foreseen this need (and we know that not everyone loves the VS IDE like I do) so we added on comparison report functionality to the command line performance report tool VSPerfReport. VSPerfReport can be found in the Team Tools\Performance Tools directory under the main visual studio directory in Program Files.
If you run -? on VSPerfReport.exe you will see the new comparison (called diffing in the command line tool) in the “file diffing” section.
---- File Diffing ----
/diff Activate diffing mode for comparing two vsp files (summary options will be ignored in diff mode)
/diffthreshold:[value] Below this value of difference the differ will regard two values as the same. Also, new data with values under this threshold will not be shown.
/difftable:[tablename] Use this specific table to perform diffing. The default is the functions table.
/diffcolumn:[columnname] Use this specific column to perform diffing. The default is the exclusive samples percent column.
/querydifftables List the valid diff tables and columns for the two vsp files provided.
To perform a simple comparison using all the default values just type something like the following:
Vsperfreport /diff Concat.vsp StringBuilder.vsp
This will compare the two reports using the “Functions” table and the “Exclusive Samples Percent” column with a default threshold level of 0.1. This command will generate a CSV (comma separated value) file that contains the same data that we would see in the IDE comparison report.
Location, Function Name, Old ExclSamplesPercent, New ExclSamplesPercent, Delta
<extra function rows snipped for brevity>
In this text report you will see the location of the function (Both, OrigOnly, NewOnly) as well as the name, old value, new value and delta. Remember that by default this has a threshold value set, so if you don’t see all the functions that you expect some of them may be under the threshold value.
Just like in the IDE reports you can tweak the table and column used for the comparison. To see all of your options just run /querydifftables with the /diff command to see the output below (customized for report type of course).
Available Tables and Columns for Diff:
Column: InclSamples -- (Inclusive Samples)
Column: ExclSamples -- (Exclusive Samples)
Column: InclSamplesPercent -- (Inclusive Samples %)
Column: ExclSamplesPercent -- (Exclusive Samples %)
Column: InclSamples -- (Inclusive Samples)
Column: ExclSamples -- (Exclusive Samples)
Column: InclSamplesPercent -- (Inclusive Samples %)
Column: ExclSamplesPercent -- (Exclusive Samples %)
Column: ExclSamples -- (Exclusive Samples)
Column: ExclSamplesPercent -- (Exclusive Samples %)
Column: ExclSamples -- (Exclusive Samples)
Column: ExclSamplesPercent -- (Exclusive Samples %)
Then just use the /difftable and /diffcolumn switches with the diff command to change the table or the column being used. Below I’ve listed an example of doing a comparison of Inclusive Samples on a per module instead of a per function basis.
vsperfreport /diff /difftable:Module /diffcolumn:InclSamples Concat.vsp StringBuilder.vsp
Also, you have the /diffthreshold switch that can be used to set which items will actually get included in the comparison report. From here you should be able to figure out how to generate the same types of reports from the command line as you could from the IDE.
Hopefully this article has given you a good intro to using the new performance report comparison tools in Visual Studio Team System codename Orcas. Before I go I’ll leave you with one last cool feature that we’ve added for Orcas. Say you wanted to show a co-worker the results of your performance investigations into String.Concat and StringBuilder. Just highlight those rows, hit copy and then paste into an Outlook e-mail (fig 1.6).
(Figure 1.6 – Comparison data pasted into an e-mail)
Check out that fancy HTML autopasting! Much nicer then having to look over the unformatted text.
This past summer marked my second year (plus one summer internship) of working at Microsoft and on the Visual Studio Team System profiler. As a fresh out of college hire I had no experience working with large applications or working with application performance in particular. For me, as for many other college students I would assume, performance only meant that the application didn’t lag visibly when presenting it to a professor. If an application did have performance issues my education did give me some clues about where to look to improve application performance but improving performance was mostly a trial and error process of tweaking the high CPU time functions to try and get them running faster. Working at Microsoft has exposed me to so many brilliant performance engineers that sometimes I still feel like I’m in that same college stage of not really knowing what I’m dealing with when it comes to performance. Yet when I look back over my past two plus years here I realize that I’ve learned quite a bit when it comes to performance engineering. So I figured that I would take this chance to share some of the most important things that I’ve learned about performance in my time here at Microsoft. Note that these are more general guidelines that I’ve learned about performance as opposed to specific X is better than Y recommendations.
1. Improving application performance should be a planned process as opposed to something that you do at the last minute in response to a major crisis
One of the trickiest things about performance is detailing exactly what “good” or “acceptable” performance is from an application. To really engineer application performance you need to have some goals about how fast various operations in your program need to finish. Now setting these goals can be very hard, especially if you are working in a relatively new area. Still try to get some numbers down about how you expect / need your program to perform. If you are having trouble getting these numbers try looking at some competitor’s products or products that perform similar tasks to yours. If you are working on a video conversion utility to convert DVD video to Zune sized video and a similar program already exists you better make darn sure that your program takes about the same amount of time or less. Also, remember that these numbers can be changed as you flesh out your design and are able to get some initial performance numbers from prototypes and initial builds. The important thing about these numbers is that early in your development you are thinking about performance issues and about how your application’s performance will affect the experience of your end user.
2. Don’t go crazy trying to optimize low level performance with the profiler before you understand the high level operation of your program and have examined the basic algorithms that you are using
So the profiler is a very powerful tool and it can be indispensible for optimizing the performance of your applications. But you have to know the best times to actually unleash this tool. If you spend a ton of time early in your development cycle using the profiler to optimize some order n squared algorithm you might not be making good usage of your optimizing time. Perhaps you should take a step back and examine if you really need to use that order n squared algorithm and if you could replace it with something order n log n. Remember that the profiler is not a magic performance wand, you need to understand your program, your algorithms and what you are trying to accomplish to use it properly.
3. Know your framework, domain and language
A lot of time in school you will hear your professors tell you (often heard when students complain about having to work with a specific language) that if you learn good programming skills you should be able to pick up different languages and be effective in them very quickly. Now there may be a lot of truth that if you learn how to be a good programmer you can work well in different languages; but when it comes to performance optimizations you really do need to know the language, domain and framework that you are currently dealing with. For example when working with a .NET datatable you need to call BeginLoadData before you start loading in your initial datarows so that you can turn off things like table constraints until all of the initial data is loaded and you call EndLoadData. But if you don’t know this Datatable will happily allow you to add items one at a time and your performance will be terrible. And if you look use the profiler you can see that it is taking a long time to load your initial data but you may not have any clue how to fix it. Really knowing a framework takes a lot of time getting familiar its syntax and all of the little nooks and crannies with it. Performance optimizations can be really hard until you’ve put in enough time to at least be proficient with the framework, domain and language you are working with; general programming and performance knowledge won’t be able to bail you out here.
4. Be willing to ask experts for help in specific areas
In the previous section I talked about really needing to know the specific domain and framework that you are working with before you can make the best use of profiler performance data. But how is it possible to keep up with all the different programming domains and frameworks even internally at a company like Microsoft? The simple answer is that you really can’t, sometimes you just need to isolate a performance issue and then take the results to someone who knows that domain to help you to analyze you results. Say that in the example above you are adding rows to a dataset without calling BeginLoadData and your startup time is scary slow. At this point, if you don’t know much about working with datasets ditch the profiler and seek out someone with expertise in this area. Note that this doesn’t mean you need to know someone personally to help you out. This can be as simple as searching MSDN or live.com for “initialize .NET Datatable” or something of that type. The more important point is to not go crazy staring at the profile trying to squeeze blood out of a stone. Go ask someone who knows better than you or start doing some research of your own online. Just make sure that you’ve done the footwork first and can narrow down the problem for whoever you ask. Being able to use the profiler can help you ask questions like “My application startup time is about five seconds longer then I would like as I seem to be taking to long time to load my initial data into my Datatable, can you help me out with this?” as opposed to the perpetual horror “my app is way too slow d00d!!111!!1fix teh .NET framework l4zy M$ devs!!!!1111!!!"
Tomorrow there will be one of the first MSDN Team System chats in a while at 10:00am Pacific time. Below is the official announcement.
Come and join members from the Visual Studio Team System product group to discuss features available in Visual Studio Architect, Developer and Tester editions and Team Foundation Server. There will be experts on hand to answer your questions, so we hope to see you there!
Join the chat on Wednesday September 6th, 2006 10:00am - 11:00am Pacific time.
Sorry about the lack of updates here recently. Currently we are in a rather difficult stage to write interesting articles. By now I’ve covered most of what I want to about the RTM version of the profiler and it’s too early to talk about many of the features that we are adding to the next version of the profiler. Hopefully in the coming weeks I’ll be able to cover some of the interesting new features that we are putting into the next version of the profiler.
So for now, I’ll continue to look at a few of the free tools that I find indispensible for my programming work. This week’s tool is sure to be one that most are already aware of but for those not yet familiar you are in for a real treat withProcess Explorer. This tool from Sysinternals (which recently became part of Microsoft) is essentially what the task manager wants to be when it grows up. It provides all sorts of performance metrics in addition to tracking what handles and DLLs are being used by currently running processes.
Process Explorer is such a powerful tool that I can’t even begin to make an enumeration of all the ways that it can be used, so I’ll content myself with just listing the ways that I end up using it most often. My main use for Process Explorer is in tracking down what DLLs and environmental variables are associated with a given process. For example, when profiling a managed application the profiler both needs specific environmental variables to be set up and to load specific profiling DLLs. So I open Process Explorer and select the target icon (to the right of the binoculars icon) and drag it over the managed process that I’m trying to profile. This automatically selects that process in Process Explorer without me having to look up its PID or window title. From there, I can right click on the process and go to the properties window. In that window, the environment tab will tell me whether or not the proper managed profiling environmental variables have been set. Now in regards to searching for if the correct profiling DLLs are loaded I tend to use a different approach. A quick hit of Ctrl+F will bring up the find DLL or handle window in Process Explorer. From there I just enter the name of the DLL that I’m looking for and Process Explorer will show me all the process that have loaded that DLL.
My other common usage for process explorer is to examine if a specific application is IO, Memory or CPU bound. From the process properties page that I mentioned above (the one with the environment tab) you can access a wealth of performance information from the Performance and Performance Graph tabs. The Performance Graphs tab gives you timelines of CPU Usage, Private Bytes and I/O Bytes. Before even firing up the profiler it is a good first step to take a look at these charts to see of you can pick up some of the basic performance characteristics of your application.
So inmy last blog article I talked about using the fuslogvw.exe tool to figure out .NET assembly binding issues. In this article I’m going to look at another tool that I find myself using daily in the debugging world, depends.exe. In a nutshell, depends looks at your binary files and creates a tree of all their dependencies along with some other useful information. Unlike fuslogvw.exe this works with both .NET and native applications. You can get depends from the official site and it also ships with many different Microsoft development products so you may already have it present on your system. You’ll want to use depends when one of your programs fails to load with some sort of system error or initialization message. I make it a habit to check any program that fails on startup with depends first so that I don’t waste any time trying to debug what is just a missing assembly or a versioning mismatch.
At a basic usage level depends is super simple to operate, just load depends and do a file open on the binary file that you are having issues with. When the file opens up you should see something similar to what I’m showing below.
This example I just did using by using notepad.exe. In the top-left pane you can see the tree structure of all the dependent assemblies of notepad. In the middle pane you can see more detailed information about all of the items in the dependency tree. And in the bottom you can see any errors that popped up during analysis. As you can see, some errors are benign as I currently have a possible error in notepad.exe but it runs just fine. Another benign example is that MSJAVA.dll often shows up as missing in the dependency tree but the application will run just fine without it.
So we’ve seen what depends looks like when a binary with all its binaries is correctly added. So what types of errors and mistakes can depends help you to find? Here is a quick list of the ways that I’ve used depends or seen depends used:
1. As a quick way to check the architecture of a binary
2. CRT side by side issues for msvcr80.dll
3. Discovering that a driver was linking in user mode code (a big no-no)
4. Checking the timestamps of dependent files
5. Checking to see what version of a dependent file is being loaded
So if you are running into initialization or dependency issue with your native or managed applications give depends a try to see what you turn up.
So let’s say that you are working with a large managed project and for some reason, one of your binary files just refuses to load. Perhaps you get some esoteric error message and perhaps not. If so then you may very well be in the world of managed assembly binding issues.
For me, Microsoft was my first real programming job after college, so I was not used to having to debug assembly binding issues. But now that I work here I realize how much “debugging” time is really spent getting all of your assemblies to have the correct signing, versioning and to play nice together as opposed to actually stepping through code and fixing issues. As I’ve had to learn all of this in a work environment I thought that I would share some of the small tools and tricks that I’ve learned for diagnosing these issues. Please note that I’m still quite the rookie at these matters so if you are already a kung-fu managed developer then I’m probably not saying anything here that you don’t already know.
One of the first tools that I turn to in these situations is fuslogvw.exe (great name, really rolls off the tongue easily ;-)). This tool keeps track of all managed assembly binding attempts that happen on your system. However, like its name, fuslogvw.exe is rather esoteric and its output can be a little hard to interpret. Hopefully this article will help to get you started with fuslogvw and understanding what it is telling you.
When you first open fuslogvw you will see the screen below. In the main section is where you will see specific binding events, but only after you have turned on the proper settings.
Click on the settings menu to bring up the settings for fuslogvw. As long as my application does not load too many dlls I like to turn on “Log all binds to disk” so that I can see all of the binds that are happening and not just the failures. Also, I sometimes get funny issue when not using “Enable custom log path” so I usually turn that on and point it at a log directory (this keeps it from using the IE cache for logging). For this example I’m going to purposefully delete one of my dependent dlls, select “Log bind failures to disk” and then try to run my application. When it crashes I will then move back to fuslogvw and click the refresh button. Now I will see an entry in the screen about the dll that failed to load.
By double clicking on the entry I can get a text file of information about the file that failed to load. I’ve included the text file from my example below (some names omitted to protect the innocent). From the log we can see that we tried to look up the assembly in the GAC then we tried a few different extensions and locations to try to locate the file. Also, if the file was found but had incorrect signing we would see that here at this stage of the game. Hopefully this has helped to introduce you to fuslogvw and invaluable tool for diagnosing binding issues with managed .NET applications.
*** Assembly Binder Log Entry (7/12/2006 @ 3:27:59 PM) ***
The operation failed.
Bind result: hr = 0x80070002. The system cannot find the file specified.
Assembly manager loaded from: C:\WINDOWS\Microsoft.NET\Framework\v2.0.60425\mscorwks.dll
Running under executable C:\Program Files\Microsoft Visual Studio 8\Team Tools\Performance Tools\XXXXviewer.exe
--- A detailed error log follows.
=== Pre-bind state information ===
LOG: User = REDMOND\XXXX
LOG: DisplayName = XXXXvisualization, Version=126.96.36.199, Culture=neutral, PublicKeyToken=XXXXXXXXXXXX
LOG: Appbase = file:///C:/Program Files/Microsoft Visual Studio 8/Team Tools/Performance Tools/
LOG: Initial PrivatePath = NULL
LOG: Dynamic Base = NULL
LOG: Cache Base = NULL
LOG: AppName = XXXXviewer.exe
Calling assembly : XXXXviewer, Version=188.8.131.52, Culture=neutral, PublicKeyToken=XXXXXXXXXXXX
LOG: This bind starts in default load context.
LOG: No application configuration file found.
LOG: Using machine configuration file from C:\WINDOWS\Microsoft.NET\Framework\v2.0.60425\config\machine.config.
LOG: Post-policy reference: XXXXvisualization, Version=184.108.40.206, Culture=neutral, PublicKeyToken=XXXXXXXXXX
LOG: GAC Lookup was unsuccessful.
LOG: Attempting download of new URL file:///C:/Program Files/Microsoft Visual Studio 8/Team Tools/Performance Tools/XXXXvisualization.DLL.
LOG: Attempting download of new URL file:///C:/Program Files/Microsoft Visual Studio 8/Team Tools/Performance Tools/XXXXvisualization/XXXXvisualization.DLL.
LOG: Attempting download of new URL file:///C:/Program Files/Microsoft Visual Studio 8/Team Tools/Performance Tools/XXXXvisualization.EXE.
LOG: Attempting download of new URL file:///C:/Program Files/Microsoft Visual Studio 8/Team Tools/Performance Tools/XXXXvisualization/XXXXvisualization.EXE.
LOG: All probing URLs attempted and failed.
It just so happens (and I know that I’ve mentioned it here before) that the Visual Studio Team System Profiler has both sampling and instrumentation modes available for you to use. However, in most of my samples so far I’ve been using the sampling mode of the profiler to profile some small applications. And for almost all CPU bound applications the sampling mode is the mode to use as it collects much smaller data files as compared to instrumentation profiling. However, for diagnosing performance problems that are not CPU bound then you may only be able to track down the performance issue by using the instrumentation mode of the profiler. In this blog entry we will look at using the instrumentation mode of the profiler to track down an IO bound issue in a small sample application.
But before we get too deep into using the profiler to diagnose an IO Bound application we may want to better define exactly what “IO Bound” and “CPU Bound” mean.
· IO Bound – An IO bound application is an application where its speed is dependant on waiting for IO operations to complete. In other words, you are asking for data and moving it around more then you are actually computing things
· CPU Bound – A CPU bound application is an application where its speed is dependant on waiting for the CPU to process data. In this case you have the data, but you have to wait to perform calculations on it
There are other types of binding (such as memory bound) but we’re not going to look at those in this specific blog entry. In general, IO bound issues are caused by things such as moving data from your hard drive into memory in some inefficient fashion. This is opposed to things that are CPU bound where your time is spent waiting for the CPU to finish processing tasks.
In general we recommend that users start out by using the sampling mode of the profiler as it is lighter weight and collects smaller data files. One exception to this rule is IO bound applications. For IO bound issues sampling on CPU cycles (take a snapshot of program execution every X cycles) is not a good was to investigate. Since program execution is blocked on waiting for IO operations to complete you will end up seeing very high sample numbers in wait functions.
On the other hand, instrumentation mode will help to point you at the exact function that is taking all of the time as opposed to the wait function. It will help to keep you from being mislead by the time that is spent in those wait functions. Right now I’m working at cooking up some examples of IO Bound and Memory Bound applications. So as soon as those are up I’ll post some walkthroughs of profiling them with the Visual Studio Team System Profiler.
For my team I’m the point man on tracking all of the customer feedback bugs that we get submitted from external users. This is one of the best parts of my job as I really enjoy interacting with customers and getting to keep my finger on the pulse of what issues are bugging our users the most. Having a deeply involved community of users that will call out any flaws in your software is not just a perk; it’s a necessity to ship high quality software in this day and age.
Previously I’ve blogged about the ladybug system that customers use to submit feedback issues to us. While Ladybug worked ok for us it had some issues with how it interfaced with customers and with our work item database. So for our Orcas release we’ve switched over to a new Connect system to use for reporting customer issues. Connect is a nice system and we hope that it will be easy for customers to use and easy for them to get issues filed against the correct team (this was a big issue with the ladybug system).
Say that you are working with the profiler in Visual Studio Team System for Developers when you think up a suggestion for a new visualization that you’d like to be able to use to view performance data. What are the steps that you would take to get a new suggestion filed in Connect? Note that many of these steps will only need to be done the first time that you open a Connect issue.
1. Navigate to the main Connect site
2. In the left hand navigation menu select Available Connections
3. “Visual Studio and .NET Framework” should be in the available connection list, click on it
4. Agree to the terms of service on the next page
5. Fill out the registration information on the next page
6. After registering, read the introduction on the welcome page and click the “Feedback” link to get to the actual feedback page
Now that you are registered you can just bookmark the feedback page so that you can get to it easily when you have more suggestions to add or issue to report. So how do we go about opening our new suggestion from the feedback page?
1. When submitting a new issue or suggestion the place that you will want to start is the big “Submit Feedback” button at the top of this page.
2. The first step in the submission process will force you to search to see if your issue or suggestion has already been files.
3. Please, before you submit your feedback use the search box with a few different keywords to see if your particular suggestion or issue is already registered with Connect. If it is already registered then please just add a vote for it as opposed to opening up a new feedback issue. That way the developers here at Microsoft can spend less time resolving issues as duplicates and more time actually fixing customer feedback issues ;-)
4. So you’ve done a few searches and you don’t see anything like the issue or suggestion that you want to raise. To proceed just click the “Submit Feedback” button located at the bottom of the search results.
5. Next, pick if you are submitting a bug or a suggestion
6. Now we are finally ready to submit our bug or suggestion. On the submission page please try to fill out as much information as you possibly can (attaching files or screenshots is nice as well). We try hard to work on all the issues that come in, but the better that you explain your suggestion or scenario the easier it is for us to figure out what you want, repro the issue and get back to you with a fix or a workaround. Putting non-specific statements like “Actual Results: It doesn’t work” and “Expected Results: It should work” don’t really help us at all. Try to be specific about exactly what you are seeing, about what is wrong about what you are seeing and about what would be the ideal fix for the issue. Remember that the more specific detailed information you can give us the better the chance of us being able to understand your issue and get it fixed.
7. Also, put some thought into what Category and Subcategory you will place your issue into. Like with giving us more detailed information placing the bug in the right spot will help us to not have to bounce the bug around in our work item tracking software (go TFS!) and we will be able to get back to you quicker.
8. After submitting your bug you will be able to track it from the “My Participation” tab on the main connect website. Not only will Microsoft employees be able to give you feedback and ask you questions but other Connect customers will be able to vote on the issue or suggestion and help you with possible solutions.
So that’s my brief little introduction to using the Connect system. Remember that the more you participate with Connect the better future releases of Visual Studio will be. So if you use Visual Studio you should also be using the Connect system to help get the features and fixes that you want.
In previous blogs we’ve talked about using the Data Collection API to limit the amount of data that you collect in instrumentation mode. I’ve been asked recently if the Data Collection API can work with sampling mode as well as instrumentation mode.
The answer is yes, but in a limited fashion. Normally with the Data Collection API you can specify starting and stopping profiling at global, thread or process levels. But in sampling mode, you can only use global level to control profiling. For managed just add a reference to Microsoft.VisualStudio.Profiler.dll and use the DataCollection namespace. And for native code include VSPerf.h and use the functions defined in there.
A customer recently asked me if the profiler included with Visual Studio Team System would allow you to profile an application while the debugger was attached. While I knew that it would (from talking with other team members) I had never actually tried this, so I set out to give it a go.
To make this scenario easier I chose to just debug and profile notepad.exe as opposed to a program of my own. First off I chose to profile in sampling mode, here are the steps that I used to collect my performance file:
1. Create a new performance session in Visual Studio
2. Right click on the Targets folder and select “Add New Binary” to add notepad.exe as a target for profiling
3. Click the launch button to launch notepad.exe with the profiler attached in sampling mode
4. In the IDE use Debug->Attach to Process to attach the debugger to the notepad.exe process that the profiler launched
5. Now you can profile your debugging session
So what do you do if you don’t want to wait to attach the debugger to your program that you are profiling? Then you can just launch the program with the debugger attached first (outside the performance explorer) and then use the “Attach” button in the performance explorer to attach the sampler to the running process. Instrumentation mode works just the same as sampling mode, except that you don’t have an “Attach” mode for instrumentation profiling.
The web and load testing teams here have also put together an FAQ to address the most common issues seen on their forums. Check it out below: