Search Logger
Posts from: MSCOM

Author Archive

MSCOM OPS March Debug Madness…3rd Session Q & A Diagnosing Memory Leaks in ASP.NET Applications

2:55 pm - March 31, 2006 in Microsoft.com Operations

Wednesday’s topic was Diagnosing Memory Leaks in ASP.NET Applications, presented by Jim Dobbin, a Sr. Systems Engineer on the MSCOM OPS Debugging Team. As this week progressed we continued to be impressed by the level of the questions asked. Here are your questions and Jim’s answers.

 

1.       Asked: Is this built into Visual Studio or where do you enter these commands?
Answered:  This is debugging with the Debugging Tools for Windows and the SOS extension for managed code.  It is completely outside of the Visual Studio environment.

2.       Asked: Is debugdiag not a useful tool in this situation? Or in other words when do you use windbg and not debugdiag?
Answered: Debugdiag is a more automated tool that does some general analysis, which can be very useful. We prefer to use windbg/cdb as we have some debugger extensions that enable us to more quickly isolate the problem. In some cases, we will use automation scripts with windbg/cdb to track down problems somewhat like debugdiag would. I would say that debugdiag knows quite a bit about IIS internals, but not as much about asp.net/clr internals.

3.       Asked: Is there a debugger quick reference guide for the commands?
Answered: There is always the debugger help docs and within SOS you can use the !help command to get the reference guide for that extension.  There is probably a quick reference guide out there somewhere on the internet, but I'm not aware of one we have. We will be posting all our debug logs to our blog after the week is over. We'll try to throw together a quick reference with that post.

4.       Asked: I got here late. Is there a url to listen to this in entirety at a later time?
Answered: Yes. To view this webcast again, go to www.microsoft.com/webcasts. It will be available to view or download in 24 hours.

5.       Asked: What search terms are useful when searching for Microsoft KB articles on Microsoft applications like CRM that have memory leak problems? I have been getting out of non-paged pool problems on my system since installing CRM but I cannot find any reports of the problem when searching the support articles.
Answered: Usually the application name and "memory leak" is sufficient. In the case of non-paged pool, CRM may be consuming user mode objects that have kernel mode resources, such as handles. Poolmon.exe would be a good tool to help track down this leak. Just do a search on support.microsoft.com for poolmon and you should find the appropriate documentation. http://support.microsoft.com/kb/177415  How to Use Memory Pool Monitor (Poolmon.exe) to Troubleshoot Kernel Mode Memory Leaks

6.       Asked: What is the name of the app he is using?
Answered: The windows debugger tools, specifically windbg from that package.

7.       Asked: How do you know what the source code .aspx file is?
Answered: Typically, !gcroot or !clrstack will call out the name of the aspx file if it's involved in the leak. It will be something like ASP.filename_aspx. If all you have is a reference to the assembly, like App_Web_xxxxx.dll, you can do a findstr on the temporary asp.net assemblies directory under c:\windows\microsoft.net\framework\v2.0.50727\, searching for the assembly name in all *.compiled files. This will give you the aspx page that was compiled into that assembly.
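
The findstr search described in this answer can also be scripted. Here is a minimal sketch in Python, assuming the caller passes in the temporary ASP.NET files directory and that a plain, case-insensitive substring match on the assembly name is good enough. The function name is hypothetical and the script does not try to parse the .compiled files.

```python
import os


def find_compiled_files(root_dir, assembly_name):
    """Return paths of *.compiled files under root_dir that mention assembly_name.

    Behaves like a recursive, case-insensitive findstr: a plain substring
    match, with no attempt to interpret the file contents.
    """
    needle = assembly_name.lower()
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            if not name.lower().endswith(".compiled"):
                continue
            path = os.path.join(dirpath, name)
            # Read leniently so one odd byte does not abort the whole search.
            with open(path, "r", encoding="utf-8", errors="replace") as handle:
                if needle in handle.read().lower():
                    matches.append(path)
    return sorted(matches)
```

Each path returned is a .compiled file that references the assembly; open it to see which page it belongs to.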

8.       Asked: Any suggestions on debugging a live site? In some cases, it may not be possible to take a machine out of rotation. In other words, is it possible to get a "dump" I could copy down to a dev box and examine?
Answered: You could use the debuggers and ADPlus or other tools like Debugdiag to get a dump for offline examination.   But this is usually still intrusive while the debugger runs and saves the dump.  For memory investigations like this, we often use this method and investigate offline. 

9.       Asked: I own John Robbins' book on Debugging .NET. Are there other good references about debugging? I'm able to debug fairly well using the Visual Studio IDE. Any material for novices using WinDbg in managed code?
Answered: Take a look at Production Debugging .Net Applications: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnbda/html/DBGrm.asp
It has some great debugging walkthroughs. We are going to try and post a quick reference for sos as well as some other commands to our blog after the week is over.

10.   Asked: I am looking for the debugger. Where can I find it?
Answered: http://www.microsoft.com/whdc/devtools/debugging/default.mspx

11.   Asked: Hi there... what perfmon counters can I add to diagnose an asp.net app memory leak?
Answered: I would go ahead and view the webcast from Monday: Microsoft.com Operations Introduces Real World Debugging: Determining When You Have a Problem and Beginning the Initial Debugging (Level 300). Jeff Johnson did a great job on covering that. Basically, you'll want to pay attention to the .Net CLR Memory counters, specifically Gen0, 1, & 2 heap size counters, as well as Large Object Heap size.

12.   Asked: GREAT Presentation... It's nice to gain some more insight into debugging tools and practices. Do you have, or will an upcoming webcast cover, debugging JavaScript in IE? Your team BLOG address?
Answered: Go ahead and post to our blog and we'll see what info we can provide - http://blogs.technet.com/mscom/default.aspx

13.   Asked: I used to program in assembly back in the old days of the Z80, but since then I have not seen anything in x86. Any resources for a novice x86 assembly reader?
Answered: There are all kinds of resources for x86, and for x64 for that matter. A great book is The Art of Assembly Language by Randall Hyde. You can also get the reference books from Intel and AMD through print or online. There's a minimal charge for those reference books.

14.   Asked: Do identical applications recompiled for 64-bit tend to use more or less memory than a 32-bit compile?
Answered: It really comes down to how many pointers the application uses, such as whether the application uses a lot of string objects or custom objects. Something that's really intensive with string objects could potentially have > 30% size overhead, but typically it's not that big of a deal.
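
To make the pointer argument concrete, here is a rough back-of-the-envelope sketch. It assumes that pointers grow from 4 to 8 bytes on 64-bit and that everything else stays the same size. The object layouts are purely hypothetical and the estimate ignores CLR object headers and alignment.

```python
def estimated_growth(pointer_fields, other_bytes):
    """Fractional size increase when pointers widen from 4 to 8 bytes."""
    size_32 = pointer_fields * 4 + other_bytes
    size_64 = pointer_fields * 8 + other_bytes
    return (size_64 - size_32) / size_32


# Hypothetical object: 3 references plus 28 bytes of non-pointer data.
print(f"{estimated_growth(3, 28):.0%}")   # 12 extra bytes on 40 -> 30%
# Hypothetical object dominated by raw data: 1 reference, 96 bytes of data.
print(f"{estimated_growth(1, 96):.0%}")   # 4 extra bytes on 100 -> 4%
```

The more of an object's size is made up of references, the closer the growth gets to the worst case; data-heavy objects barely change.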

15.   Asked: What is the recommended "one-stop-shop" for debugging information? There is so much out there and hard to mine all this information out. There also seems to be a lack of continuity at MS with tools. For example there was no mention of the psscor extension and just a brief reference to DebugDiag. Many tools to do the same thing I guess but for us in the real world, it's hard to know what the best tools are.
Answered:  Unfortunately, there is no one-stop shop that I am aware of. MSDN has a good number of articles on debugging and blogs.msdn.com has a lot of debugging info being posted.  Regarding PSSCOR, we used to use that extensively when we were running on Framework 1.1 (Everett). There is no PSSCOR for 2.0 (Whidbey) yet, although the SOS extension in 2.0 does incorporate many of its commands.

16.   Asked: Can you give a quick walk through on how to make a dump I can copy off the server...and then how I could use WinDbg to query it?
Answered: The quick run-through: attach a debugger to the process; you should get an initial break-in.  Run the “.dump /ma c:\mydumpfile.dmp” command, then “.detach” once the dump command reports that it has successfully written the file.  This is still intrusive debugging until you .detach. Copy the file and open it in WinDbg using Open Crash Dump on the File menu.  You could also do this with ADPlus, which comes with the debugger package.

Check out the debugging tools - http://www.microsoft.com/whdc/devtools/debugging/default.mspx and check out the debugger help – Debugger Operations section, Crash Dump files section and the extra tools section for ADPlus info.
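
The attach, dump, detach sequence can also be run unattended with cdb, the console debugger from the same package. The sketch below is one possible way to script it, not the team's own tooling. It assumes cdb.exe is on the PATH, the process ID and dump path are placeholders, and it uses qd (quit and detach) in place of a separate .detach step.

```python
import subprocess


def build_dump_command(pid, dump_path):
    """Build a cdb command line that attaches, writes a full dump, and detaches.

    -p attaches to a running process and -c runs the given debugger commands
    after the initial break-in. '.dump /ma' writes a full user-mode dump and
    'qd' quits the debugger while leaving the target process running.
    """
    debugger_commands = f".dump /ma {dump_path}; qd"
    return ["cdb", "-p", str(pid), "-c", debugger_commands]


def take_dump(pid, dump_path):
    """Run the command; the target stays paused until cdb detaches."""
    return subprocess.run(build_dump_command(pid, dump_path), check=True)
```

For example, take_dump(1234, r"c:\mydumpfile.dmp") would dump the process with ID 1234. The process is still frozen while the dump is written, so the intrusiveness caveat above applies.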

17.   Asked: Where can I find tips like the ones that were given here: "EAX generally contains the return of a method" or "CMP [ECX], ECX" is a check for an AV?
Answered: These really depend on the type of calling conventions that are being used. These can be found through x86/x64 assembly reference books. Some of this is just through experience.

18.   Asked: What is the best tool for debugging multithreaded app?
Answered: It really depends on what you're trying to troubleshoot. Certainly windbg/cdb handles multithreaded debugging with ease.

19.   Asked: I thought in ASP.NET memory leaks were a thing of the past?
Answered:  For the most part they are; it’s not easy to leak memory in ASP.Net. I had to work at it for my demos. Objects/memory that fall out of scope will get cleaned up if there are no further references left to them.  So a leak-like condition can still occur if something still holds references to those objects.  The methods demonstrated with SOS are how you find the references that still exist, should you run into this.

20.   Asked: Is it possible to fix the problems, if possible, at the debugging stage?
Answered: It’s not easily done within the debugger. It’s great for finding bugs but not fixing them.  Although you can modify memory within the debugger, it’s not typically recommended, as modifying memory outside of the application's normal program flow can cause unknown behavior. That being said, we have on occasion used the debugger to change certain variables to modify the application during runtime.

21.   Asked: Are you able to do everything that WinDbg does in Visual Studio?
Answered: Generally yes, and I think more than what VS can do with the different extensions available. Windbg has a richer command line experience for using debugger extensions, etc.

22.   Comment: Great series. Keep on with the good work!

23.   Asked: I've used CLR Profiler in order to see how the objects are allocated and the gens disposed, do you recommend it?
Answered: We do recommend the CLR profiler but find it is generally too much of a perf hit to attempt it in production. It is great for pre-production and test.

24.   Asked: Any suggestions for novices (with no knowledge/experience of assembly lang.), as to where to start?
Answered: A great book is The Art of Assembly Language by Randall Hyde. You can also get the reference books from Intel and AMD through print or online.

25.   Asked: Right now the federaldeveloper folks are giving a webcast on VSTE about automating testing with the software testing edition of VSTE. Do you think it's worth trying to figure out how to automate memory leaks with VSTE? Seems to me most memory leaks get figured out by a single tenacious programmer who cares and tracks them down manually? GREAT presentation - thanks again
Answered: I think that’s something the VS team is always looking at improving. Memory leaks are still pretty rare in managed code.  The difficulty I see is how to get coverage for all scenarios. Today I went over just a few that I have run into in the past, and I am sure there are more I haven't seen yet.

26.   Comment: Again, many thanks. This is one of the most useful series I've attended.

 

I Just Want To Get My Code Into Production…Operations Can Be Your Best Friend

9:58 am - April 3, 2006 in Microsoft.com Operations

Bugs? What bugs? My code doesn’t have any stinkin’ bugs!! Of course the testers may have a little different view than the developers. Hundreds of developers are furiously writing code, and they all want to hit their release date. Operations also wants to make sure that the scheduled releases go off without a hitch: on time, on schedule and with no problems. To aid in this endeavour, MSCOM Ops tries to provide information and operational guidance very early in the SDLC process.  We try to get information out in lots of formats: webcasts, TechNet Magazine articles, customer visits, conference calls and conference presentations…oh yeah and this blog!!

 

We also have recently published three white papers in partnership with Microsoft IT: Showcase. They are:

Microsoft IT Showcase: Microsoft.com Moves to x64 Version of Windows

Technical White Paper

Microsoft IT Showcase: Microsoft.com Server Configurations

Note to IT

Microsoft IT Showcase: Monitoring and Troubleshooting Microsoft.com

Technical Case Study

 

It does seem that folks read these (that’s a good thing!) because we got the following feedback from a customer in the UK:

 

“I and my customer were really interested in the Monitoring and Troubleshooting Microsoft.com (http://www.microsoft.com/technet/itsolutions/msit/operations/mscomtroubleshoot.mspx) studies that have just been published. I have some additional questions that I need to follow up on and wondered if you could direct me to the appropriate person?

 

Specifically, the customer would like more information on how new or updated applications are tested in the ms.com environment prior to going live. When there are 600 devs submitting code and content what processes are in place to ensure that resource hogs are eliminated before they get to production, and how is that testing conducted?”

 

What a great question! Microsoft.com Operations in a lot of ways strongly resembles an ISP hosting model. We are primarily responsible for the server infrastructure that the wide variety of applications that we support run on. Each group that provides code for us to install typically goes through the classic Software Development Life Cycle (SDLC). The timing in which MSCOM OPS interacts with that process goes something like this:

 

  1. Envision: The Business Unit comes up with an idea which typically is a business problem/need that will be addressed by code. The Program Managers write the specs, the Dev and Test teams take those specs and cost the project out, which means they estimate how many resources it will take to write and test the code. There typically are discussions surrounding what stays in the project and what gets cut, and the final result of these negotiations is what constitutes the “project”. Very early in this process MSCOM Ops Application Hosting team members and Systems Engineers are available and get involved in answering any questions that the project group may have. Typical questions would be where the new code should live, what the infrastructure looks like, and whether there is existing capacity or new hardware will need to be built out.
  2. Design: This phase is where the specs are signed off and the Dev team is then responsible for writing the code. Ops is available to continue consulting on capacity planning, hardware evaluations and other questions that Dev may have. This is where Ops starts the initial Architectural Review process. We want to start to get specific information to know where the code will live, what it is intended to do, what dependencies it has (web services, SQL back end, specific web permissions etc.) and most importantly what will actually be going on the servers. Typical sorts of questions that we would ask:

·         Are there assemblies that need to be put in the GAC?

·         What will we need to touch in the machine.config file?

·         Is the application managed code?

·         Have they properly instrumented the code so that Ops can provide relevant monitoring?

·         What does the application directory structure look like?

·         What app pool will it live in?

·         What are the anticipated traffic patterns?

·         We would also ensure that the Dev team knows exactly what our platform specifics are: O/S version, IIS version, and all the various components, MDAC versions, Framework version etc.

Toward the end of the Design phase, once there is some sort of code base to work with, Test (again, from the Product Group that is providing the code) will begin to develop the test plans, with input from Ops.

  3. Build: This is when Test starts to really get involved in testing the code to resolve any functional or performance issues (known as bugs). Since there is now a code base to work from, OPS will perform initial code reviews of both the application and the database. We are primarily looking to ensure that any coding standards that we have developed and provided are being followed. These have evolved over time and continue to evolve as we adopt new technologies. We also look to ensure that monitoring requirements have been addressed.
  4. Stabilize: As the code progresses and starts to get stable, i.e. the bug count begins to decrease, OPS does a final formal Architectural Review to identify any changes that may have resulted from the Dev/Test cycle. By this time we also will have any performance test results. If these perf results are not within our expected tolerances we will report that back to the Product team to get these resolved. Toward the end of the Stabilize phase we work with the Release Management team to review the release plan. The release plan typically is the step by step blueprint for getting the final bits into production. The release plan usually includes everything from which SE or DBA will be doing which step to where the “golden” bits (that will have received formal Test sign off) are located, to what the timing should be for release to Pre-production, Staging and finally Production. The end of Stabilize is marked by formal Test sign off and Ops sign off of the release plan.
  5. Deploy: This is where the golden bits get propped to the servers. A typical release involves propping the code to a non-live Pre-production environment. This is a representation of the live Production environment with the exception that it is not internet facing, it does not use the live SQL backend or production Passport, and it typically consists of a single server as opposed to a cluster. After the bits are rolled out to this environment, Test and PM from the Product Group have to sign off on a functional smoke test to make sure that the code functions as planned. This initial release into Pre-prod provides Ops its first chance to actually deploy the bits using the release plan. If changes need to be made we will work with the RM team to ensure that updates are made. After all involved are satisfied that the release to Pre-prod was successful, we then do essentially the same steps for the Staging environment. Staging is a mirror of Production, but again not internet facing. After this is completed Ops has had the experience of going through the release plan twice and is usually very comfortable with the procedures. The goal of all of this prep work is to do everything possible to ensure that the actual release into production goes as smoothly as possible. Ops also has the responsibility during this phase to ensure that the Operations Guide, a doc that explains the application, what it does and where it lives, including a logical and physical architectural diagram, has been completed. We are also on the hook to ensure that a Troubleshooting Guide (TSG) is generated. This is a living document that is used by Tier I and Tier II support folks to perform initial incident resolution or escalation should the need arise.

 

There are different SDLC methodologies followed by the various product development groups hosting their services, tools and applications within the MSCOM Ops managed environments. These include Waterfall and Scrum.

 

Waterfall is the classic serial development methodology where downstream deliverables and tasks such as development or testing are dependent on the completion of upstream deliverables and tasks such as a functional spec, code complete and a complete install and configuration guide for test and release management to utilize when deploying into the various environments.

 

Scrum, an Agile methodology, is basically multiple waterfall iterations in one development lifecycle, with the iterations occurring in the design, build and stabilize phases. Both methodologies contain the same standard SDLC milestones of Envision, Design, Build, Stabilize, Deploy, Production, Adoption etc. To learn more about these methodologies see http://msdn.microsoft.com/vstudio/teamsystem/msf/. Regardless of the methodology being followed, the requirements for hosting an application or service within the MSCOM Ops managed environments remain the same, as do the SDLC mandatory controls which ensure Sarbanes-Oxley (SOX) compliance. The various Release Management teams supporting the product development groups that deploy to the MSCOM Ops managed environments have developed checklists, a.k.a. Responsible, Accountable, Consult and Inform (RACI) matrices, for both the Waterfall and Scrum methodologies. These help drive early agreements in the production dev cycle and hold disciplines and individuals accountable for the deliverable or task that they have been assigned during the envision phase of the project. (Get these checklists from the Download Center; they are two Word docs in a WinZip file.) This governance by Release Management ensures that each project complies with all Ops, Release Management and SDLC mandatory controls, and allows the Program Managers to accurately cost and schedule each deliverable and task into the overall project schedule, vastly reducing 11th hour surprises, churn on Operations and potential blocking issues to release.

 

Does every release that we do follow this process? No, not every one; there are and will continue to be exceptions. Does following this process ensure a smooth release and a stable and performant application every time? Again no, but since we started following this process, the amount of churn on Operations has decreased and has become much more predictable. The number of out-of-band releases, both hotfixes (which are defined as a single bug fix) and what we call service packs (which are defined as multiple bug fixes bundled into a single deployment package), has decreased as well.

 

MSCOM OPS March Debug Madness…4th Session Q & A Tackling Problems In Dynamic Assemblies

8:28 am - April 4, 2006 in Microsoft.com Operations

Thursday’s topic was How to Tackle Problems In Dynamically Generated Assemblies, presented by Khalil Nassar, a Sr. Systems Engineer on the MSCOM OPS Debugging Team. Here are your questions and Khalil’s answers.

 

  1. Asked: Do these techniques require the program to be a debug build or does it work with release builds too?
    Answered: These techniques work with both. Debug builds will sometimes expose a little more information. For example, !clrstack -p will show params passed to methods in a debug build where it may not in retail.

2.       Asked: How to debug either JavaScript or VB Script using Visual Studio.Net IDE?
Answered: This link describes the options for debugging this type of target - http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vsdebug/html/vxtskaspscriptdebugging.asp. For me the best way is to load the project into Visual Studio, go to the Debug menu and select “Attach to Process…”, then find the instance of iexplore.exe that will run your script.  Once done, you can set break points in your script code. Then run the code and debug with the IDE.

Khalil also provided the debug logs that he captured during the webcast. These logs can be accessed via the Microsoft Download Center and can be opened with Notepad. The zip file includes the following:

1.       ASP Client Side.log

2.       ASP Server Side – Node A.log

3.       Backup4 –Connection Forcibly Closed By Remote Host.log

4.       Backup 4 – Multiple Sockets With Same Port.log

5.       Backup4 – TCP Send Rejected.log

6.       Windows Service – Not Listening On Port 3592.log

7.       Windows Service – Truncated Inserts.log

 

MSCOM OPS March Debug Madness…3rd Session Companion Logs for "Diagnosing Memory Leaks In ASP.Net Apps" Webcast

10:49 am - April 4, 2006 in Microsoft.com Operations

Here is the Download Center link to the zip file with the logs that Jim Dobbin captured during his webcast Diagnosing Memory Leaks In ASP.Net Applications. These are great to have locally on hand if you plan to watch the webcast on-demand (see link above for on-demand location).

These are the Console Debugger Logs that capture the action. They are in a zip file that can be opened with any text editor, Notepad for example. Here is what they contain:

WebcastLog1 -- Walks through memory related commands in the debugger and SOS extension

WebcastLog2 -- Looks at a few memory related problems

WebcastLog3 -- Captured debugging System.NullReference Exceptions

 

MSCOM OPS March Debug Madness…5th Session Q & A: Debugging Without the Debugger In IIS and ASP.NET

2:44 pm - April 4, 2006 in Microsoft.com Operations

Friday’s topic, Debugging Without the Debugger In IIS and ASP.NET, presented by Chris St.Amand, a Sr. Systems Engineer on the MSCOM OPS Debugging Team, proved to be a very popular topic and a strong ending to the MSCOM Ops March Debug Madness webcast series. Chris showed how Event Tracing for Windows (ETW) enables you to extract a wealth of information about the state of IIS and the applications running in its worker processes…all without having to attach a debugger. Our debug team had a blast presenting this information. Not only are we glad to be able to share it with you, but we learn a ton of stuff ourselves while putting together these sessions. They are a lot of work…but well worth the effort. Thanks for joining us and don’t forget that you can view these sessions on-demand. Here are the questions and answers generated by this session.

 

  1. Asked: Where can I find IISREQMON?
    Answered: It is available inside of the IISTools.msi package within the Web Hosting best practices zip: http://download.microsoft.com/download/5/a/3/5a318a80-7c2b-460d-afa2-c65635e9de82/WebHosting.zip
  2. Asked: Do all of the ETW commands require Admin logins to run? As a developer I have a user-mode account.
    Answered: Only users with admin privileges, users in the Performance Log Users group, and applications running as LocalSystem, LocalService or NetworkService can control event tracing sessions.
  3. Asked: Why do you have to provide both the GUID and Provider name in the provider file used with logman?
    Answered: It turns out you only need one of them.  Just the provider name enclosed in quotation marks will work (along with your flags and verbosity settings)
    i.e.  “IIS: WWW Server” 0xFFFFFFFF 5
  4. Asked: Is logman.exe shipped with Windows Server 2003, or do I need to download/install something on my production server?
    Answered: Both logman.exe and tracerpt.exe ship with Windows Server 2003.  You can find them in %windir%\system32
  5. Asked: Why do you have so many more providers than I do?  Because you configured them? (with what?)
    Answered: The demos were done on Windows Server 2003 SP1 servers.  There were a significant number of providers/events added into SP1.  If you’re running a down level OS from SP1 you will probably have fewer providers/events.
  6. Asked: Tracing is done to a file.  What is the performance impact of this IO on the real work being done?
    Answered: Tracing does generate quite a bit of disk IO, but I believe it is buffered for performance.  Certainly if your disk is already resource strapped, you would need to be aware of this.  In our environment, the disk overhead of tracing has no noticeable impact.
  7. Asked: Is Logparser getting shipped with Windows Server 2003 or should I need to download and install it separately?
    Answered: It’s available separately on the download center: http://www.microsoft.com/downloads/details.aspx?FamilyID=890cd06b-abf8-4c25-91b2-f8d975cf8c07&DisplayLang=en
  8. Asked: How can an ASP.NET application add data to an ETW log?
    Answered: This is something that will be possible in the IIS7 timeframe using a new trace listener that you can hook your Trace.Write calls into.
  9. Asked: Where can we find more information about these SQL-like statements?
    Answered: www.logparser.com has a repository of some great logparser queries.  You can also check the help file that ships with logparser or run “logparser -h FUNCTIONS”
  10. Asked: Is the collection “IIS Trace” created by default on a Windows Server 2003 machine?  
    Answered: “IIS Trace” was the random name I gave my trace sessions.  The providers and flags for the session you want are all configured in your provider file (supplied to logman using the -pf switch) or directly on your command line using the -p switch.
  11. Asked: I assumed that the workload reports can be generated in XML format for easy parsing?
    Answered: Absolutely.  Tracerpt takes a -f switch that will allow you to output the report in XML, TXT or HTML
  12. Asked: Can we setup a custom event to log in the event viewer?
    Answered: You can certainly log any custom information you want from ASP or ASP.NET.  You can even create your own custom event log to log to.
  13. Asked: Why is it that some gif files have a cache hit percentage of 83% instead of 100% in the workload report?  What does this exactly mean?
    Answered: HTTP.SYS can only cache one content-encoding per response.  On our site we have static compression enabled which means a particular static request could have multiple content-encoding types for its response.  In that situation IIS will favor placing the compressed content into the HTTP.SYS cache, resulting in non-cached responses for requests that don’t use or support the compression encoding.
  14. Asked: Is there a way to review requests that took place before iisreqmon was run without breaking in with a debugger?
    Answered: iisreqmon will show you all requests that are currently “inflight” waiting to complete, even if they arrived on the server prior to running iisreqmon.  If you have requests that are stuck you can run iisreqmon to see what they are without having to worry about having iisreqmon running prior to those requests landing on your server.
  15. Asked: Do ISAPI filters need to explicitly support event tracing in order to get detailed data from the filter when using event tracing?
    Answered: Yes.  You can check it out in the Platform SDK
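
Pulling answers 3, 4, 10 and 11 together, a trace session can be scripted end to end. The sketch below only builds the provider file line and the logman and tracerpt command lines. The helper names, session name and file names are placeholders, and the switches used (-pf, -o and -ets for logman; -o, -report and -f for tracerpt) should be checked against the help output on your own server before use.

```python
def provider_line(provider_name, flags=0xFFFFFFFF, level=5):
    """One provider-file line: quoted provider name, flags, verbosity level."""
    return f'"{provider_name}" 0x{flags:08X} {level}'


def logman_start(session, provider_file, etl_path):
    """Start a trace session from a provider file, logging to etl_path.

    -ets sends the command directly to the event trace session.
    """
    return ["logman", "start", session, "-pf", provider_file,
            "-o", etl_path, "-ets"]


def logman_stop(session):
    """Stop the named trace session."""
    return ["logman", "stop", session, "-ets"]


def tracerpt_report(etl_path, csv_path, report_path, fmt="XML"):
    """Turn the .etl file into an event dump plus a workload report."""
    if fmt not in ("XML", "TXT", "HTML"):
        raise ValueError("format must be XML, TXT or HTML")
    return ["tracerpt", etl_path, "-o", csv_path,
            "-report", report_path, "-f", fmt]


print(provider_line("IIS: WWW Server"))
```

Each list can be handed to subprocess.run on the server. Writing the output of provider_line to a text file gives the provider file that logman_start expects.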
 

MSCOM OPS March Debug Madness…5th Session Companion Demo Files: Debugging Without the Debugger In IIS and ASP.NET

11:23 am - April 5, 2006 in Microsoft.com Operations

As promised, here are the demo batch files from the Friday, March 31 webcast Debugging Without the Debugger In IIS and ASP.NET, presented by Chris St.Amand.

 

Here is the link available to download from the Microsoft Download Center. This download is a winzip file that contains the batch files.

 

All of these are simple notepad files.

 

View From The Top…DO IT, PROVE IT, KNOW IT, SHARE IT

10:41 am - April 11, 2006 in Microsoft.com Operations

This is the second in an ongoing series of blog posts from Todd Weeks, Sr. Director of Operations for the Microsoft.com Operations Team.

 

In my last post, I talked a lot about the “Share It” piece of the work that our team does. How we focus on Innovation and its Impact to your job, and how sharing your accomplishments inside of our team and then external to the company could help broaden its impact even more than we ever could have imagined. It is rewarding to road test your ideas outside the environment they were created. It lets you know how really great they might be, or not, and is an awesome way to learn and grow your knowledge of all the dependencies you may work with.

 

A focus we began a few years ago was around the basic terms that are in this title: Do It, Prove It and Know It. Simply stated, as an engineer on our team you have goals and objectives. It is expected that these get done and that our customers are delighted with our service in managing their sites and applications; that is Do It. But where the rubber really hits the road is in the “Prove It” space. Developers can write awesome code, the platform can be rock solid, sites can be very stable, and there could be very little for the Operations person to do. If you are really providing a lot of value back into the development and the support of the platform, then Prove It: measure it. The way we focused here was to prove your Operational MOF maturity. If your site is highly available and reliable, show the documentation for its disaster recovery scenarios, prove that everything has a known configuration and is compliant, and show that you have worked with Development and Program Management to put great Change Management processes in place. Over the years, we have made sure to be our own worst critic when it comes to making sure we can answer these questions and show Maturity. We have been able to stay well ahead of the curve, and when it was needed we have provided lots of value back into the groups we work with.

 

Lastly, Know It. This represents understanding all of the environmental and business impacts of what you run. A site is more than just availability; it costs money to manage and host. Are you aware of those cost structures? Do you have business plans that include finding efficiencies and lowering the costs of running your sites? If engineers are only looking at sites from a performance perspective and not a fiscal one, are they really making the right tradeoffs? We have tried to have everyone understand all of the metrics we strive for, both operationally and fiscally. This gives a lot more scope and responsibility to even the most junior engineers on your team and allows them to have a more holistic view of what they are a part of.

 

I am willing to take guidance from anybody that is on my team, no matter what level, because that is where the expertise really lives. So empower everyone to KNOW all of the areas of impact of what they run, and challenge them to be the decision makers. You’ll find it is where most of the Innovative and Great ideas are coming from.

 

User Mode or Kernel Mode…Which Caching Method Did We Use?

9:18 am - April 28, 2006 in Microsoft.com Operations

In response to our article Inside Microsoft.com Moving Microsoft Update Downloads to x64 we had a great question posed to us about why we use user-mode caching instead of kernel-mode caching in IIS6.0 for the download infrastructure for Microsoft Update. 

There are a couple of reasons we use user-mode cache versus kernel-mode cache on these servers.  The Microsoft Update client makes byte-range requests when requesting data, so when evaluating which type of caching, if any, to use with IIS6.0 for these download servers, the ability to handle byte-range requests was a prerequisite.  In looking at kernel-mode caching, we determined that it would not work for our purposes because HTTP.sys only sends the whole response and will not attempt to send range responses; thus the kernel-mode cache will not cache these responses, not even the file that the byte range is part of.  This led us to evaluate user-mode caching.  User-mode caching will serve a byte-range request; however, it does not cache just the byte range but caches the entire file.  Thus we can take advantage of user-mode caching but not kernel-mode caching for this application, due to the need to serve byte-range requests.  For more information, see page 745 of the ‘Internet Information Services (IIS) 6.0 Resource Kit’, ISBN 0-7356-1420-2.  It is also freely available online at http://www.microsoft.com/downloads/details.aspx?FamilyID=80A1B6E6-829E-49B7-8C02-333D9C148E69&displaylang=en; the section on cache settings is located under chapter 7, ‘Web Server Scalability’.
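To make the "cache the whole file, serve only the range" behavior concrete, here is a minimal Python sketch. It is not IIS code and makes no claim about how IIS implements its user-mode cache; the `_cache` dictionary, the `parse_range` and `serve_range` functions, and the `load` callback are all hypothetical names used for illustration, and only a single `bytes=start-end` range is handled.

```python
import re

# Hypothetical in-memory cache: path -> full file contents.
_cache = {}


def parse_range(header, size):
    """Turn a single 'bytes=start-end' Range header into inclusive offsets."""
    match = re.fullmatch(r"bytes=(\d*)-(\d*)", header.strip())
    if not match or (match.group(1) == "" and match.group(2) == ""):
        raise ValueError("unsupported Range header: %r" % header)
    first, last = match.group(1), match.group(2)
    if first == "":
        # Suffix form ("bytes=-N"): the last N bytes of the file.
        start, end = max(size - int(last), 0), size - 1
    else:
        start = int(first)
        end = min(int(last), size - 1) if last else size - 1
    if start > end or start >= size:
        raise ValueError("range not satisfiable")
    return start, end


def serve_range(path, header, load):
    """Cache the entire file on first use, then return only the requested slice."""
    if path not in _cache:
        _cache[path] = load(path)  # the whole file is cached, not just the range
    data = _cache[path]
    start, end = parse_range(header, len(data))
    return data[start:end + 1]
```

In this sketch the first range request for a file pays the cost of loading all of it, and every later range request for that file, whatever the offsets, is answered from memory.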

 

Scripting Patch Management of Enterprise Web Clusters on Microsoft.com

3:12 pm - May 1, 2006 in Microsoft.com Operations

One of the most common questions I am asked when meeting with customers is, How does MSCOM patch their Enterprise Web servers? I will cover exactly what our approach is, and give you a little background into some of the challenges we are faced with when patching our production Web servers.  Also, I have included a sample script we currently use on the Microsoft.com Web team to patch our production Web servers. 

 

To give you an overview of our environment, Microsoft.com is comprised of over 120 Web properties hosted on over 1000 Web servers.   Our Web team supports sites that range from our corporate business presence of www.microsoft.com, to Developer/ITPro sites such as MSDN and TechNet, to Download distribution sites such as Windows Update and Download.microsoft.com.  To maintain a high level of availability, each of our sites is comprised of multiple web clusters in multiple datacenters, each running NLB.   As I mentioned, high availability is extremely important to our customers, and is invaluable in showcasing the Microsoft products and services we utilize here on Microsoft.com.  Unfortunately, we are faced with the same challenge as everyone else: applying hotfixes and service packs while making sure there are no service interruptions. This is a tough challenge.  For example, on the Download.microsoft.com site it can take up to six hours to drain all the active connections from each server.   We have architected each of these sites with added server capacity to ensure we can handle peak load and also handle any unplanned or planned outages such as a patch event.

 

Whether we are configuring new Web servers, maintaining configuration control or, in this case, patching our servers, we leverage admin scripting.  Every engineer should have a working knowledge of the basics of a scripting language; this skill set is invaluable in managing enterprise Web servers.  Scripting our deployments gives us the flexibility to perform controlled patching.  The advantages of scripting your patch deployments are zero service interruption and the ability to have coordinated customer deployments.

 

Listed below is a sample vbscript we have recently used to patch our Web servers. Copy the script below and save it as SrvPatch.vbs. You will need to create a text file named ServerList.txt containing all the servers you want patched, one per line.  To execute the script from the command line type “cscript SrvPatch.vbs ServerList.txt” (the script reads the password from standard input, which requires the console host cscript.exe).  You will then be prompted to type in your password.  The basic flow of this script is to drain the live connections, run the appropriate patch with its associated switches, place the server back into rotation, and loop through to the next server in your ServerList file.

 

' Script example: (Save the below script as SrvPatch.vbs)

' Script requires the SysInternals tool PSEXEC.  http://www.sysinternals.com/Utilities/PsExec.html

' Syntax to execute script: cscript SrvPatch.vbs ServerList.txt

Dim oFSO

Set oFSO = CreateObject("Scripting.FileSystemObject")

Dim oFile

Dim sServer

Dim sPass

dim sSystemRoot

Set WSHShell = WScript.CreateObject("WScript.Shell")

 

' Exactly one argument (the server list file) is expected; the password is read from the prompt below
If WScript.Arguments.Count <> 1 Then

                WScript.Echo "Syntax:  SrvPatch.vbs <Server List>"

                WScript.Echo "Example: SrvPatch.vbs ServerList.txt"

                WScript.Quit (1)

End If

 

set oFile = oFSO.OpenTextFile(WScript.arguments(0))

 

'sPass = WScript.arguments(1)

Wscript.Echo "Please enter password:"

sPass = wscript.stdin.readline

 

sSystemRoot = wshShell.ExpandEnvironmentStrings("%systemroot%")

 

Do while oFile.AtEndOfStream =false

                sServer = oFile.ReadLine

                GetInfo sServer, CountConnections

 

              ' Draining traffic from server

                Set BeginDrain = CreateObject("wscript.Shell")

                BeginDrain.Run "psexec \\" & sServer & " wlbs drain all",0,True

 

                Do While CountConnections > 20

                                GetInfo sServer, CountConnections

                                wscript.echo sServer & ": " & CountConnections

                                wscript.echo "Sleeping..Draining.."

                                WScript.Sleep 10000

                Loop

                                Set WLBSSuspend = CreateObject("wscript.Shell")

                                WLBSSuspend.run "psexec \\" & sServer & " wlbs suspend",0,True

 

                                wscript.echo sServer & ": Drained..WLBS Suspend..Patching to Begin"

 

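                                ' Launching your patch or executable file
                                ' (replace \\Server\share\patchfile.exe with your own patch and switches;
                                '  as written, this command runs on the machine executing the script)

                                Set LaunchPatch = CreateObject("wscript.Shell")

                                LaunchPatch.run sSystemRoot & "\system32\cmd.exe /c echo " &  sPass & "| \\Server\share\patchfile.exe", 1,True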

 

                                wscript.echo sServer & ": Patched......."

 

                                WScript.Sleep 10000

 

                               ' Adding server back into rotation

                                Set WLBSResumeIntoRotation = CreateObject("wscript.Shell")

                                WLBSResumeIntoRotation.run "psexec \\" & sServer & " wlbs Resume",0,True

                                wscript.echo sServer & ": Resumed"

                               

                                Set WLBSStartIntoRotation = CreateObject("wscript.Shell")

                                WLBSStartIntoRotation.run "psexec \\" & sServer & " wlbs Start",0,True

                                wscript.echo sServer & ": Started and taking traffic"

 

                                wscript.echo "------------------------------------"

               

Loop

oFile.Close

 

Function GetInfo(Computer, CountConnections)

                REM On Error Resume Next

                strComputer = Computer

                Set objWMIService = GetObject("winmgmts:\\" & strComputer & "\root\cimv2")

                Set colItems = objWMIService.ExecQuery("Select * from Win32_PerfFormattedData_Tcpip_TCPV4",,48)

 

               

                                For Each objItem in colItems

                                                CountConnections = objItem.ConnectionsEstablished

                                               

                                Next

                                                               

rem wscript.echo "Connections at: " & CountConnections

End Function
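One thing the drain loop in the script does not have is an upper bound: if a server never falls to 20 connections or fewer, the script waits forever. The following Python sketch shows one way a bounded drain wait could look. It is an illustration, not part of the script we run; the function name `wait_for_drain` and its parameters are hypothetical, the threshold of 20 and the 10-second poll come from the script above, and the six-hour default timeout is an assumption based on the drain time mentioned earlier for Download.microsoft.com.

```python
import time


def wait_for_drain(get_connections, threshold=20, poll_seconds=10,
                   timeout_seconds=6 * 3600,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll until the connection count is at or below threshold.

    get_connections is any callable returning the current number of
    established connections (hypothetical; in the script above this is
    the WMI query inside GetInfo).
    Returns True if the server drained, False if the timeout expired.
    """
    deadline = clock() + timeout_seconds
    while True:
        if get_connections() <= threshold:
            return True   # drained: safe to suspend and patch
        if clock() >= deadline:
            return False  # give up and let the operator decide what to do
        sleep(poll_seconds)
```

A caller could skip the server, or stop the whole run, when this returns False rather than patching a box that is still serving traffic.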

 

What-Ya-Got-Too-Much-of Stew

3:15 pm - May 8, 2006 in Microsoft.com Operations

Many stories by the popular author Patrick McManus contain references to a mysterious concoction, “Whatchagot Stew”.  The recipe for this stew is summarized as being whatever people have on hand, tossed together and boiled for some period of time.  The name is derived from how you start cooking it: someone asks “What’s for dinner?” and the reply is: “I don’t know, what-cha-got?”.  It is an unfortunate fact that many websites in the world (including Microsoft.com) were and are produced in a similar manner.  Of course, this was never the goal anyone started out with.  No, people always have lofty goals of pristine sites that stay perfectly up to date because of the ease of moving customers on to the cool new stuff.  The reality is that people don’t move, and content with a plan for end-of-life is an impossible dream for most web sites.  While the approach of “letting sites grow by only adding and never removing” simplifies many of the design and implementation requirements, it also tends to produce a system that is nigh impossible to keep running.  These systems usually prove to be difficult to understand, debug, and improve from an “abilities” point of view (reliability, availability, performancability… ok that is a bit of a stretch, but you get the idea).

 

Though system administrators everywhere should be pushing for engineering excellence in the content exposed to customers (i.e. end-of-life plans for all content), we need to realize that sometimes you just have to eat what is in front of you. Since the website stew as a whole is nearly impossible to handle, the real question is how do you break the content up into easily digestible (debug-able) pieces?  One of the rules of thumb to rely on is: “Worst is First”.  That means you need to determine what has the worst impact on your system, fix that, and then find the new worst thing (note, sometimes the same thing is still worst and needs to be worked on again).

  

So, what to do first? 

 

One of the easiest ways to start an investigation on the impact of pages to the server is to use Event Tracing for Windows (ETW).  If you don’t know much about it, then you should go watch the great presentation Chris St. Amand did during the Debug Technet Week.  He goes into much more depth about how to use ETW, as well as giving other uses.  

 

For the use related to the topic of performance analysis, there are just a few easy steps and happily they were covered in Chris’ presentation (click here for the .zip file).  We start by creating a file to contain the definitions for what we want to trace.  Chris named it iistrace.guid and the content of the file is:

 

     {1fbecc45-c060-4e7c-8a0e-0dbd6116181b} 0xFFFFFFFF 5 IIS: SSL Filter

     {3a2a4e84-4c21-4981-ae10-3fda0d9b0f83} 0xFFFFFFFE 5 IIS: WWW Server

     {06b94d9a-b15e-456e-a4ef-37c984a2cb4b} 0xFFFFFFFF 5 IIS: Active Server Pages (ASP)

     {dd5ef90a-6398-47a4-ad34-4dcecdef795f} 0xFFFFFFFF 5 HTTP Service Trace

     {a1c2040e-8840-4c31-ba11-9871031a19ea} 0xFFFFFFFF 5 IIS: WWW ISAPI Extension

     {AFF081FE-0247-4275-9C4E-021F3DC1DA35} 0xFFFFFFFF 5 ASP.NET Events

We then use startlogiis.bat which is a simple wrapper around logman – the system tool that manages tracing as well as automated performance counter collection.  The contents of startlogiis.bat are:

logman start "NT Kernel Logger" -p "Windows Kernel Trace" (process,thread,disk) -ct perf -o krnl.etl -ets

logman start "IIS Trace" -pf iistrace.guid -ct perf -o iis.etl -ets

 

While we are only investigating IIS related pages, in this case we enable the kernel tracing as well so that we can compare system resource usage which is stored in the kernel trace.

 

After letting the server take load for a reasonable amount of time (a few minutes to a few hours – depending on the size of files you want to deal with), we need to stop tracing.  Again Chris gave us a nice package for doing so in stoplogiis.bat:

 

logman stop "IIS Trace" -ets

logman stop "NT Kernel Logger" -ets

 

Now that we have produced the .etl files, we can go ahead and produce a pretty view of the current server performance using the tracerpt command:

 

            tracerpt iis.etl krnl.etl -o output.csv -report report.htm -summary -f html

 

This is the first place where I didn’t just use the good stuff provided by Chris.  In his example, he used the text output format, which I find a little harder to read than the pretty HTML format.  The preference is really up to you.  If you want to see the text format, simply leave off the last parameter and give a different parameter (or no parameter) to -report.  Note, workload.bat does this for you.  If you want to automate this process, then the XML output might be what you are looking for.

 

It should also be noted that we have been working only with tools found in the box.  If you want to make analysis even easier, you can get System Performance Advisor which will turn this into button clicks.

 

Stir the Pot

 

Now that we have an HTML file with our performance data, it is time to stir the pot and get the stew bubbling.  By this I mean we need to see which pages are the most costly.  It is actually pretty simple: just scroll down until you see the section titled “URLs with the Most CPU Usage”.  You should see the full URL and be able to go talk to the owner and ask for an improvement.

 

You also may want to take special note of URLs that come up high in the Most CPU Usage section, but don’t appear too high in the “Most Requested URLs” section.  These are good candidates for some quick performance fixes that will impact your site.  Also of interest would be the Slowest URLs and the Most Bytes Sent, though these fall more into impacting the spice of the Performance Stew (the end user perception of performance), which will be the topic of our next performance blog entry.
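If you copy the two URL lists out of the report, the comparison is easy to automate. Here is a minimal Python sketch; the function name and the sample URLs are hypothetical, and the case-insensitive comparison is an assumption rather than something the report does for you.

```python
def quick_fix_candidates(top_cpu_urls, most_requested_urls):
    """Return URLs from the CPU list that are absent from the most-requested list."""
    # Assumption: URLs are compared case-insensitively.
    requested = {url.lower() for url in most_requested_urls}
    return [url for url in top_cpu_urls if url.lower() not in requested]


if __name__ == "__main__":
    # Hypothetical URLs, typed in by hand from the two report sections.
    top_cpu = ["/search/results.aspx", "/downloads/details.aspx", "/legacy/report.asp"]
    most_requested = ["/default.aspx", "/downloads/details.aspx", "/search/results.aspx"]
    for url in quick_fix_candidates(top_cpu, most_requested):
        print(url)
```

Anything the function returns is burning CPU out of proportion to its traffic, which makes it a good first conversation to have with the page owner.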

  

Check the Ingredients

 

One of the dangers of using ETW analysis on the live site is the fact that some of the pages that impact performance the worst might not be hit during the trace.  Or they might not be used in the way that makes them use the most resources.  One way to make this data as good as it can be is to use ETW tracing in the testing environment while under known load.  That way you can focus on different URLs or applications rather than on the site as a whole.  It does mean setting up stress and test work, but it could be well worth your while if you can squeeze a few percentage points off of your CPU usage number.  As an example, we recently noticed that we had some heavily used pages that were wasting about 6% of our CPU handling something they shouldn’t be handling.  One configuration change later (not even a code change) and we saw the CPU usage average drop by 5-6 points.  Very nice to have that CPU back for the next wave of applications.

 

Also, if you use the approach I outlined above, you only get the top ten offenders for each of the different categories.  For example, you only get the top ten CPU users.  However, if you use SPA, you can get a much larger data set (up to 100 I believe).  This is important, since what you want to do is look for pages that use a lot of CPU but aren’t necessarily on the most hit list.

  

Go ahead, taste some

 

Unlike true Whatchagot Stew, you shouldn’t gag trying to use these tools.  In fact the work should go down nice and smooth.  And with a little practice, you should be able to make little bite-sized improvements in the performance of your site on a regular basis.

 
 
 
 
 
 