Well, today was my turn to take the ALS Ice Bucket Challenge. This morning I was tagged by both Adam Cogan and Scott Guthrie. Tempting as it is though, I’m not doing it twice. I don’t know if it’s typical, but Adam challenged me to complete the task within 24 hours. I spent today thinking about how I would get home early enough, how I would orchestrate it and where I would get the ice.
I decided to do it on the farm (notice the cows behind me) and to have my kids help me. When I first told them that I needed their help, they moaned in exasperation at having to “help dad again”. When I told them, they would be pouring ice cold water on me while filming it, they jumped up and down screaming with excitement.
I gave them instructions ahead of time on what to do. Apparently I wasn’t clear enough. You’ll notice that my daughter poured the water on me very gradually. I was hoping for an instantaneous drenching but instead I got slow-motion frostbite. It seemed like it lasted 10 minutes.
I’ve now named James Philips, David Treadwell and Buck Hodges as my victims, ahem, I mean nominees in the challenge. Good luck to you all – it’s for a good cause. Of course, you could just donate $100 and chicken out. Or you could drench yourself and donate $100 anyway. Up to you.
Yesterday I rolled out the release notes for our sprint 69 deployment. Check out the updates!
P.S. the outage we had last Thursday was *not* caused by the rollout of updates (or, at best only tangentially so). I’m hoping to get the outage retrospective written up tomorrow.
Today we released a CTP of our next major Visual Studio release. You can look in the release notes to see what’s in it. You can also read about the highlights in John’s post. There are no startling new improvements in this CTP, but lots of nice, smaller things.
Just a reminder, we aren’t shipping CTP previews of TFS “14” at this point. Visual Studio Online is still your best way to check out the latest TFS improvements.
Let me start by apologizing for the pretty horrific outage we had last week. I’ve been silent on it because I was on a “last family vacation” in Europe before my oldest son goes off to college. Buck Hodges and others have been working hard on it. I’ve been reading up on everything that happened and everything that’s been done. I need to spend some time talking with the team but I expect to publish a lengthy retrospective in the next few days. Stay tuned for more.
At this point, I think enough has been done that we won’t see a recurrence of the issues any time soon. However, there is some underlying work that will take some time (a small number of months, I expect) to put in place the infrastructure necessary to avoid another incident in the same class.
Again, I’m very sorry for the disruption. We take all incidents very seriously and work hard to ensure they won’t ever happen again.
Today we released the final version of Visual Studio 2013 Update 3 and Team Foundation Server Update 3. You can get the update using the link below. Note that the link includes both the Visual Studio & TFS downloads (among other things) if you expand the Details section on the page.
I’ve blogged about the features before but I’ll reiterate that some of the biggest enhancements in this Update include:
- CodeLens support for Git
- Configurable display of in-progress items on the backlog (a common customer request)
- Application Insights tooling
- Desktop app support in the memory usage tool (including WPF)
- Release management support for PowerShell/DSC and Chef
- Test plan/suite customization, permissions, auditing, etc.
- Cloud load testing integration with Application Insights for app under test telemetry/diagnostics
- and a substantial number of bug fixes (listed in the KB article).
Alongside Update 3, we are also releasing an updated CTP of our Cordova tooling with support for Windows 7. Make sure to look for that too.
Thanks for all of your help with validating early drops of this release and we hope you like it. We’re happy to be delivering it and already turning our attention to Update 4. I’m hoping we’re going to see several very nice improvements to the TFS Agile planning tools in that release plus a lot more. Stay tuned. I suspect we’ll ship the first CTP of Update 4 in a couple of months.
I was out doing chores on my farm yesterday morning and ran across something surprising (to me, at least). By the pig pen, we have an electric fence charger and it is covered by buckets. For some reason I don’t understand (my wife did it), there are two buckets – one nested inside the other. Yesterday, I removed the top bucket and inside it, I found a frog. It took me a minute to recognize it as a frog because it was so white it didn’t look much like a frog.
I know frogs can change colors to match their surroundings but white? I’ve seen documentaries of incredible color changing animals but those are exotic animals in some exotic place – not a frog in my back yard, right?
10 points to anyone who can identify what kind of frog (or maybe toad, for all I know) it is.
It was pretty cool.
Sorry it took me a week and a half to get to this.
We had the most significant VS Online outage we’ve had in a while on Friday July 18th. The entire service was unavailable for about 90 minutes. Fortunately it happened during non-peak hours so the number of affected customers was fewer than it might have been but I know that’s small consolation to those who were affected.
My main goal from any outage that we have is to learn from it. With that learning, I want to make our service better and also share it so, maybe, other people can avoid similar errors.
The root cause was that a single database in SQL Azure became very slow. I actually don’t know why, so I guess it’s not really the root cause but, for my purposes, it’s close enough. I trust the SQL Azure team chased down that part of the root cause – we certainly did loop them in on the incident. Databases will, from time to time, get slow, and SQL Azure has been pretty good about that over the past year or so.
The scenario was that Visual Studio (the IDE) was calling our “Shared Platform Services” (a common service instance managing things like identity, user profiles, licensing, etc.) to establish a connection to get notified about updates to roaming settings. The Shared Platform Services were calling Azure Service Bus and it was calling the ailing SQL Azure database.
The slow Azure database caused calls to the Shared Platform Services (SPS) to pile up until all threads in the SPS thread pool were consumed, at which point all calls to TFS eventually got blocked due to dependencies on SPS. The ultimate result was VS Online being down until we manually disabled our connection to Azure Service Bus and the log jam cleared itself up.
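A back-of-envelope model shows how quickly a pool drains when calls block for far longer than the interval between arrivals. The pool size, arrival rate, and hold time below are hypothetical, purely to illustrate the shape of the failure:

```python
def blocked_threads(arrivals_per_min, hold_min, pool_size, elapsed_min):
    """Threads tied up after elapsed_min of steady traffic, when every
    call blocks its thread for hold_min before completing or timing out."""
    started = arrivals_per_min * elapsed_min
    finished = arrivals_per_min * max(0, elapsed_min - hold_min)
    return min(pool_size, started - finished)

# Hypothetical: a 100-thread pool, 20 calls/minute, each call stuck 9 minutes
print(blocked_threads(20, 9, 100, 2))   # 40 threads gone after 2 minutes
print(blocked_threads(20, 9, 100, 5))   # 100 -- the pool is exhausted
```

Once the hold time dwarfs the arrival interval, exhaustion is a matter of minutes no matter how big the pool is.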
There was a lot to learn from this. Some of it I already knew, some I hadn’t thought about but, regardless of which category it was in, it was a damn interesting/enlightening failure.
**UPDATE** Within the first 10 minutes I've been pinged by a couple of people on my team pointing out that people may interpret this as saying the root cause was Azure DB. Actually, the point of my post is that it doesn't matter what the root cause was. Transient failures will happen in a complex service. The important thing is how you react to them. So regardless of what the trigger was, the "root cause" of the outage was that we did not handle a transient failure in a secondary service properly and allowed it to cascade into a total service outage. I'm also told that I may be wrong about what happened in SB/Azure DB. I try to stay away from saying too much about what happens in other services because it's a dangerous thing to do from afar. I'm not going to take the time to go double check and correct any error because, again, it's not relevant to the discussion. The post isn't about the trigger. The post is about how we reacted to the trigger and what we are going to do to handle such situations better in the future.
Don’t let a ‘nice to have’ feature take down your mission critical ones
I’d say the first and foremost lesson is “Don’t let a ‘nice to have’ feature take down your mission critical ones.” There’s a notion in services that all services should be loosely coupled and failure tolerant. One service going down should not cause a cascading failure that takes other services down with it; rather, only the portion of functionality that absolutely depends on the failing component should be unavailable. Services like Google and Bing are great at this. They are composed of dozens or hundreds of services and any single service might be down and you never even notice because most of the experience looks like it always does.
The crime of this particular case is that the feature experiencing the failure was Visual Studio settings roaming. If we had properly contained the failure, your roaming settings wouldn’t have synchronized for 90 minutes and everything else would have been fine. No big deal. Instead, the whole service went down.
In our case, all of our services were written to handle failures in other services but, because the failure ultimately resulted in thread pool exhaustion in a critical service, it reached the point that no service could make forward progress.
Smaller services are better
Part of the problem here was that a very critical service like our authentication service shared an exhaustible resource (the thread pool) with a very non-critical service (the roaming settings service). Another principle of services is that they should be factored into small atomic units of work if at all possible. Those units should be run with as few common failure points as possible and all interactions should honor “defensive programming” practices. If our authentication service goes down, then our service goes down. But the roaming settings service should never take the service down. We’ve been on a journey for the past 18 months or so of gradually refactoring VS Online into a set of loosely coupled services. In fact, only about a year ago, what is now SPS was factored out of TFS into a separate service. All told, we have about 15 or so independent services today. Clearly, we need more :)
How many times do you have to retry?
Another one of the long-standing rules in services is that transient failures are “normal”. Every service consuming another service has to be tolerant of dropped packets, transient delays, flow control backpressure, etc. The primary technique is to retry when a service you are calling fails. That’s all well and good. The interesting thing we ran into here was a set of cascading retries. Our situation was:
Visual Studio –> SPS –> Service Bus –> Azure DB
When Azure DB failed, Service Bus retried 3 times. When Service Bus failed, SPS retried 2 times. When SPS failed, VS retried 3 times. 3 * 2 * 3 = 18. So every single Visual Studio client launched in that time period caused a total of 18 attempts on the SQL Azure database. Since the problem was that the database was running slow (resulting in a timeout after about 30 seconds), that’s 18 tries * 30 seconds = 9 minutes each.
Calls in this stack of services piled up and up and up until, eventually, the thread pool was full and no further requests could be processed.
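That multiplication is worth making concrete. A quick sketch of the arithmetic above (the per-layer attempt counts and the ~30 second timeout come straight from this incident):

```python
def total_attempts(attempts_per_layer):
    """When each layer in a call chain retries independently, the number
    of attempts reaching the bottom is the product across the layers."""
    total = 1
    for attempts in attempts_per_layer:
        total *= attempts
    return total

# VS (3 tries) -> SPS (2 tries) -> Service Bus (3 tries)
attempts = total_attempts([3, 2, 3])
print(attempts)            # 18 attempts on the database per client launch
print(attempts * 30 / 60)  # 9.0 minutes tied up, at ~30 seconds per timeout
```

The fix is not to forbid retries but to keep the product small: retry at one layer, make the deeper layers fail fast, and the multiplication disappears.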
As it turns out, SQL Azure is actually very good about communicating to its callers whether or not a retry is worth attempting. Service Bus doesn’t pay attention to that and doesn’t communicate it to its own callers. And neither does SPS. So a new rule I learned is that it’s important that any service carefully determine, based on the error, whether or not retries are called for *and* communicate back to its callers whether or not retries are advisable. If this had been done, each connection would have been tied up for only 30 seconds rather than 9 minutes and the situation would likely have been MUCH better.
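One way to follow that rule is to attach a “retryable” hint to every error, honor it, and pass it along when giving up. A minimal sketch, with names of my own invention (this is not the actual SPS or Service Bus API):

```python
class ServiceError(Exception):
    """An error that tells callers whether a retry could help."""
    def __init__(self, message, retryable):
        super().__init__(message)
        self.retryable = retryable

def call_with_retries(operation, max_attempts=3):
    """Retry only while the error says retrying might succeed; on giving
    up, re-raise so the retryable hint propagates to our own caller."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ServiceError as err:
            if not err.retryable or attempt == max_attempts:
                raise

calls = {"count": 0}
def ailing_database():
    calls["count"] += 1
    # A timeout on an overloaded database: retrying right now won't help.
    raise ServiceError("command timeout", retryable=False)

try:
    call_with_retries(ailing_database)
except ServiceError:
    pass
print(calls["count"])   # 1 -- the non-retryable hint short-circuits all retries
```

With the hint honored at every layer, a hard failure in the database costs each caller one 30-second timeout instead of eighteen.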
A traffic cop goes a long way
Imagine that SPS kept count of how many concurrent calls were in progress to Service Bus. Knowing that this was a “low priority” service, that calls were synchronous and that the thread pool was limited, it could have decided that, once the number of concurrent calls exceeded some threshold (let’s say 30, for argument’s sake), it would start rejecting all subsequent calls until the traffic jam drained a bit. Some callers would very quickly get rejected and their settings wouldn’t be roamed, but we’d never have exhausted threads and the higher priority services would have continued to run just fine. Assuming the client is set to attempt a reconnect on some very infrequent interval, the system would eventually self-heal once the underlying database issue was cleared up.
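Sketched as code, that traffic cop is little more than a guarded counter. The limit of 30 and the API shape here are illustrative, not our actual implementation:

```python
import threading

class TrafficCop:
    """Fail fast on low-priority calls once too many are in flight,
    rather than letting them eat the shared thread pool."""
    def __init__(self, limit):
        self.limit = limit
        self.in_flight = 0
        self.lock = threading.Lock()

    def try_enter(self):
        with self.lock:
            if self.in_flight >= self.limit:
                return False          # shed load: reject immediately
            self.in_flight += 1
            return True

    def leave(self):
        with self.lock:
            self.in_flight -= 1

cop = TrafficCop(limit=30)
admitted = sum(cop.try_enter() for _ in range(40))
print(admitted)   # 30 -- the 10 calls past the threshold fail fast
```

The rejected callers lose settings roaming for a little while; everyone else never notices.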
Threads, threads and more threads
I’m sure I won’t get out of this without someone pointing out that one of the root causes here is that the inter-service calls were synchronous. They should have been asynchronous, therefore not consuming a thread and never exhausting the thread pool. It’s a fair point but not my highest priority takeaway here. You are almost always consuming some resource, even on async calls – usually memory. That resource may be large but it too is not inexhaustible. The techniques I’ve listed above are valuable regardless of sync or async and will also prevent other side effects, like pounding an already ailing database into the dirt with excessive retries.
So, it’s a good point, but I don’t think it’s a silver bullet.
So, onto our backlog goes another series of “infrastructure” improvements and practices that will help us provide an ever more reliable service. All software will fail eventually, somehow. The key thing is to examine each and every failure, trace the failure all the way to the root cause, generalize the lessons and build defenses for the future.
I’m sorry for the interruption we caused. I can’t promise it won’t happen again, *but* after a few more weeks (for us to implement some of these defenses), it won’t happen again for these reasons.
Thanks as always for joining us on this journey and being astonishingly understanding as we learn. And hopefully these lessons provide some value to you in your own development efforts.
A month ago I wrote about our newly enabled capability to measure quality of service on a customer by customer basis. In that post I mentioned that we had actually identified a customer experiencing issues before they even contacted us about them and had started working with them to understand the issues. Well, the rest of that story…
We’ve identified the underlying issue. The customer had an unusually large number of Team Projects in their account and some of our code paths were not scaling well, resulting in slower than expected response times. We have debugged it, coded a fix and will be deploying it with our next sprint deployment.
Now that’s cool. We’ve already started working with a few more of the accounts that have the lowest quality of service metrics. Our plan is to make this a regular part of our sprint rhythm where, every sprint, we investigate a top few customer accounts on the list and try to deploy fixes within a sprint or two – improving the service every sprint.
Today we began deployment of our sprint 68 work. There’s a bunch of really good stuff there. I say “began” because deployment is a multi-day event now as we roll it out across instances. Everyone should have the updates by tomorrow (Tue) afternoon. You can read the release notes to get details.
You’ll see that one part of the licensing changes I described a couple of weeks ago is now live – the addition of Test Hub access to the Visual Studio Online Advanced license. The remaining stakeholder licensing changes are still tracking to go live in mid-August. Stay tuned for more.
Azure Active Directory support
The biggest thing in the announcement is the next step in our rollout of Azure Active Directory (AAD) support in VS Online. We started this journey back in April with the very first flicker of AAD support at the Build conference. We added more support at TechEd but I’ve stayed pretty quiet about it because, until this week, there was no way to convert an existing account to AAD. With this deployment we’ve enabled it. Officially it’s in preview and you have to ask to get access to do it, but we’re accepting all requests so it’s nothing more than a speed bump to keep too big a rush from happening all at once. With this last set of changes, you can:
- Associate your OrgID (AAD/AD credentials) with your MSDN subscription, if you have one, and use that to grant your VSO license
- Create a new account connected to an AAD tenant
- Connect an existing account to an AAD tenant
- Disconnect an account from an AAD tenant
- Log in with either a Microsoft Account or an OrgID (AAD only or synchronized from your on-prem Active Directory), giving you single sign-on with your corporate credentials, Office 365, etc.
- I’m probably forgetting something but you get the point
I encourage you to read the docs and more docs for details. One thing I’ve asked to be included in the docs – and I’m still not satisfied with the clarity – is one detail about binding an existing account to AAD. If you have an existing account not connected to AAD then, by definition, you are using Microsoft Accounts. When you connect your VS Online account to AAD, your identities have to be recognized by AAD to authenticate. You have 3 options for each existing user of your account:
- Add the Microsoft Account as an “external identity” in your AAD. All your data and in-progress work carries forward. The drawback is that external Microsoft Accounts won’t fully honor your AAD policies – like two-factor auth, password policies, etc. It’s still a Microsoft Account that’s been associated with your AAD, giving your AAD admin central control over access.
- If you created your Microsoft Account using the same email address as your AD/AAD identity (for instance, for me it’s firstname.lastname@example.org) then, when you connect your VSO account to AAD, your Microsoft Account will be seamlessly rebound to your corporate identity. All your data and in-progress work carries forward and your login gets the full set of AAD governance. This is the “best” of the 3 options but requires that you created your Microsoft Account a certain way.
- If you can’t do #2 and you don’t want to do #1, then you can just add your AAD identity as a “new” VS Online user and remove your old Microsoft Account identity from the VS Online account. To VS Online this is just like adding a new user and deleting an old user. VS Online has no idea they are the same person. This has the advantage of getting full AAD administration but the downside that in-progress work (checkouts, work items assigned to you, …) and other places where your old MS Account identity was associated need to either be deleted or reassigned to your new identity. Work items can be reassigned. Workspaces, shelvesets and stuff like that can be deleted. History will always be associated with your “old” Microsoft Account identity.
So that’s a good segue to what’s left for us to do to really complete AAD support…
- Add the ability to migrate one identity to any other identity, thereby having all references in VSO changed to the new user (to get around the issue in #3). This is on the backlog but is going to take a while.
- Add support for using AAD groups (to assign permissions, query work items, etc.) in VS Online. Today you can use AAD users, but you can’t yet use AAD groups. This feature is coming fairly soon (within the next few sprints).
I’m sure I’m missing something else we haven’t done yet but I don’t think anything big. AAD support is ready for prime time for most user scenarios.
And I have to say something about account deletion. Until this week, VS Online account deletions could only be done by contacting support – and we had to do a delicate dance to ensure that the person requesting a deletion had the rights to do so. For the past few months, account deletion has been the #1 support request, with dozens of requests a month. There are all kinds of reasons –
- Merging multiple accounts into one
- Moving from VS Online back to on-premises TFS
- Wanting to just wipe everything out and start over (for instance after an evaluation)
With this week’s deployment, account deletion is self service (assuming you are an account administrator). However, it’s important to understand that all account deletes are “soft” deletes only. Meaning the account is “marked for deletion” and no one can access it any more but it is *not* actually deleted. It will be physically deleted, I believe, 90 days after you delete it in the UI. This gives you a window to have your “Oh sh%t!” moment. If you realize that you deleted something you did not intend to, you can contact support and they can “undelete” your account. This is indicative of a general direction we are headed where all deletes are “soft” and you always have a time window to go back and recover it. It will take us quite a while to get there on everything that can be deleted but we’ll make progress every chance we get. Of course, if there’s some reason you *REALLY* need a VS Online account permanently deleted immediately, you can contact support to help you.
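The soft-delete pattern itself is straightforward to sketch. The 90-day retention matches what I described above; the rest of the shape is a hypothetical illustration, not our actual code:

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=90)   # window before physical deletion

class Account:
    def __init__(self, name):
        self.name = name
        self.deleted_at = None   # None means the account is live

    def delete(self, now):
        """Soft delete: mark the account; nothing is physically removed."""
        self.deleted_at = now

    def undelete(self, now):
        """Recoverable only inside the retention window."""
        if self.deleted_at is not None and now - self.deleted_at < RETENTION:
            self.deleted_at = None
            return True
        return False             # window expired: the data is really gone

    def accessible(self):
        return self.deleted_at is None

acct = Account("contoso")
acct.delete(datetime(2014, 8, 1))
print(acct.accessible())                     # False -- marked for deletion
print(acct.undelete(datetime(2014, 8, 31)))  # True -- inside the 90 days
```

The nice property is that “delete” becomes a cheap, reversible state change, and the scary irreversible part happens later, on a schedule.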
Oh, and lest I manage to avoid mentioning any feature in this deployment, check out the new trend reports. They are very cool and make the VS Online charting experience even more useful. And, because I know several people will ask, yes, these charting enhancements will be added to Team Foundation Server (our on-premises product). If everything goes according to plan, they will be in TFS 2013.4 (Update 4) later this fall.
It’s a bunch of stuff. Maybe you have to be a bit of a geek to appreciate all of it. We’ve been working on some of this for a good while and I’m really happy to see it all available. Check it out and let us know what you think.
Through the fall and spring, we transitioned VS Online from Preview to General Availability. That process included changes to branding, the SLA, the announcement of pricing, the end of the early adopter program and more. We’ve been working closely with customers to understand where the friction is and what we can do to make adopting VS Online as easy as possible. This is a continuing process and includes discussions about product functionality, compliance and privacy, pricing and licensing, etc. This is a journey and we’ll keep taking feedback and adjusting.
Today I want to talk about one set of adjustments that we want to make to licensing.
As we ended the early adopter period, we got a lot of questions from customers about how to apply the licensing to their situation. We also watched as people assigned licenses to their users: What kind of licenses did they choose? How many people did they choose to remove from their account? Etc.
From all of this learning, we’ve decided to roll out 2 licensing changes in the next couple of months:
A common question we saw was “What do I do with all of the stakeholders in my organization?” While the early adopter program was in effect and all users were free, customers were liberal with adding people to their account. People who just wanted to track progress or occasionally file a bug or a suggestion were included. As the early adopter period ended, customers had to decide – Is this really worth $20/user/month (minus appropriate Azure discounts)? The result was that many of these “stakeholders” were removed from the VS Online accounts in the transition, just adding more friction for the development teams.
As a result of all this feedback we proposed a new “Stakeholder” license for VS Online. Based on the scenarios we wanted to address, we designed a set of features that matched the needs most customers have. These include:
- Full read/write/create on all work items
- Create, run and save (to “My Queries”) work item queries
- View project and team home pages
- Access to the backlog, including add and update (but no ability to reprioritize the work)
- Ability to receive work item alerts
Some of the explicitly excluded items are:
- No access to Code, Build or Test hubs.
- No access to Team Rooms
- No access to any administrative functionality (Team membership, license administration, permissions, area/iterations configuration, sprint configuration, home page configuration, creation of shared queries, etc.)
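Conceptually, the license reduces to a per-hub allow list. A toy sketch with hypothetical hub names mapped from the lists above (this is not VS Online’s actual authorization model):

```python
# Hypothetical hub names, mapped from the included/excluded lists above.
STAKEHOLDER_ALLOWED = {
    "work_items",     # full read/write/create
    "my_queries",     # create, run and save personal queries
    "home",           # project and team home pages
    "backlog",        # add and update (no reprioritizing)
    "alerts",         # work item alerts
}
STAKEHOLDER_DENIED = {"code", "build", "test", "team_rooms", "admin", "shared_queries"}

def stakeholder_can_access(hub):
    """Return whether a Stakeholder-licensed user may open the given hub."""
    if hub in STAKEHOLDER_ALLOWED:
        return True
    if hub in STAKEHOLDER_DENIED:
        return False
    raise KeyError("unknown hub: " + hub)

print(stakeholder_can_access("work_items"))   # True
print(stakeholder_can_access("code"))         # False
```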
We then surveyed our “Top Customers” and tuned the list of features (to arrive at what I listed above). One of the conversations we had with them was about the price/value of this feature set. We tested 3 different price points – $5/user/month, $2/user/month and free. Many thought it was worth $5. Every single one thought it was worth $2. However, one of the questions we asked was “How many stakeholders would you add to your account at each of these price points?” The result was 3X more stakeholders if it’s free than if it’s $2. That told us that any amount of money, even if it is perceived as “worth it”, is too much friction. Our goal is to enable everyone who has a stake to participate in the development process (and, of course, to run a business in the process). Ultimately, in balancing the goals of enabling everyone to participate and running a business, we concluded that “free” is the right answer.
As a result, any VS Online account will be able to have an unlimited number of “Stakeholder” users with access to the functionality listed above, at no charge.
Access to the Test Hub
Another point of friction that emerged in the transition was access to the Test hub. During the Preview, all users had access to the Test hub but, at the end of the early adopter program, the only way to get access to the Test hub was by purchasing Visual Studio Test Professional with MSDN (or one of the other products that include it, like VS Premium or VS Ultimate).
We got ample feedback that there was a class of users who really only need access to the web-based Test functionality and don’t need all that’s in VS Test Professional.
Because of this, we’ve decided to include access to all of the Test hub functionality in the Visual Studio Online Advanced plan.
I’m letting you know now so that, if you are currently planning your future, you know what is coming. I’m always loath to get too specific about dates in the future because, as we all know, stuff happens. However, we are working hard to implement these licensing changes now and my expectation is that we’ve got about 2 sprints of work to do to get it all finished. That would put the effective date somewhere in the neighborhood of mid-August. I’ll update you with more certainty as the date gets a little closer.
What about Team Foundation Server?
In general, our goal is to keep the licensing for VS Online and Team Foundation Server as “parallel” as we can – to limit how confusing it could be. As a result, we will be evolving the current “Work Item Web Access” TFS CAL exemption (currently known as “Limited” users in TFS) to match the “Stakeholder” capabilities. That will result in significantly more functionality available to TFS users without CALs. My hope is to get that change made for Team Foundation Server 2013 Update 4. It’s too early yet to be sure that’s going to be possible but I’m hopeful. We do not, currently, plan to provide an alternate license for the Test Hub functionality in TFS, though it’s certainly something we’re looking at and may have a solution in a future TFS version.
As I said, it’s a journey and we’ll keep listening. It was interesting to me to watch the phenomenon of the transition from Preview to GA. Despite announcing the planned pricing many months in advance, the feedback didn’t get really intense until, literally, the week before the end of the early adopter period when everyone had to finish choosing licenses.
One of the things that I’m proud of is that we were able to absorb that feedback, create a plan, review it with enough people, create an engineering plan and (assuming our timelines hold), deliver it in about 3 months. In years past that kind of change would take a year or two.
Hopefully you’ll find this change valuable. We’ll keep listening to feedback and tuning our offering to create the best, most friction-free solution that we can.
I’m not going to make too big a deal about this because there’s going to be tons of them between now and when VS “14” ships. But we shipped another CTP today and you can learn more about it here: http://blogs.msdn.com/b/visualstudio/archive/2014/07/08/visual-studio-14-ctp-2-available.aspx
We’re continuing the practice of making Azure VM templates available to make it really easy to try out the CTPs.
We are starting to show some nice new features that are worth learning about. I think the lightbulb feature is promising, for instance.
For reasons I explained in my last post on the subject, we are not releasing TFS “14” CTPs at this time and, quite honestly, won’t for a while. We will start releasing CTPs of TFS well before the release, but the cost/benefit just isn’t good enough right now. You can see the majority of the work we are doing on VS Online as we do it.
Years ago, I used to do monthly updates on TFS adoption at Microsoft. Eventually, the numbers got so astronomical that it just seemed silly so I stopped doing them. It’s been long enough and there’s some changes happening that I figured it was worth updating you all on where we are.
First of all, adoption has continued to grow steadily year over year. We’ve continued to onboard more teams and to deepen the feature set teams are using. Any major change in the ALM solution of an organization of our size and complexity is a journey.
Let’s start with some stats:
As of today, we have 68 TFS “instances”. Instance sizes vary from modest hardware up to very large scaled out hardware for the larger teams. We have over 60K monthly active users and that number is still growing rapidly. Growth varies month to month and the growth below seems unusually high (over 10%). I grabbed the latest data I could get my hands on – and that happened to be from April. The numbers are really staggeringly large.
|Current||30 day growth|
In addition we’ve started to make progress recently with Windows and Office – two of the Microsoft teams with the oldest and most entrenched engineering systems. They’ve both used TFS in the past for work planning but recently Windows has also adopted TFS for all work management (including bugs) and Office is planning a move. We’re also working with them on plans to move their source code over.
In the first couple of years of adoption of TFS at Microsoft, I remember a lot of fire drills. Bringing on so many people and so much data with such mission critical needs really pushed the system and we spent a lot of time chasing down performance (and occasionally availability) problems. These days things run pretty smoothly. The system is scaled out enough and the code, and our dev processes have been tuned enough, that for the most part, the system just works. We upgrade it pretty regularly (a couple of times a year for the breadth of the service, as often as every 3 weeks for our own instances).
As we close in on completing the first leg of our journey – getting all teams at Microsoft onto TFS – we are now beginning the second. A few months ago, the TFS team and a few engineering systems teams working closely with them moved all of their assets into VS Online – code, work items, builds, etc. This is a big step and, I think, foreshadows the future for the entire company. At this point it’s only a few hundred people accessing it but it’s already the largest and most active account on VS Online and it will continue to grow.
It was a big decision for us – and we went through a lot of the same anxieties I hear from anyone wanting to adopt a cloud solution for a mission critical need. Will my intellectual property be safe? What happens when the service goes down? Will I lose any data? Will performance be good? Etc. Etc. At the same time, it was important to us to live the life that we are suggesting our customers live – taking the same risks and working to ensure that all of those risks are mitigated.
The benefits of moving are already visible. I’ve had countless people remark to me how much they’ve enjoyed having access to their work – work items, build status, code reviews, etc. – from any device, anywhere. No messing with remote desktop or any other connectivity technology. As part of this, we also bound the account to the Microsoft Active Directory tenant so we can log in using the same corporate credentials as we do for everything else. Combining this with a move to Office 365/SharePoint Online for our other collaboration workflows has created for us a fantastic mobile, cloud experience.
I’ll see about starting to post some statistics on our move to the cloud. As I say, at this point it’s a few hundred people and mostly just the TFS codebase – which is pretty large at this point. Over time that will grow but I expect it will be slow – getting larger year over year into a distant future when all of Microsoft has moved to the cloud for our engineering system tools.
I know I have to say this because people will ask. No, we are not abandoning on-prem TFS. The vast majority of our customers still use it, and the overwhelming majority of our internal teams still use it (the few hundred people using VS Online is still a rounding error on the more than 60K people using TFS on premises). We continue to share a codebase between VS Online and TFS and the vast majority of the work we do accrues to both scenarios – and that will continue to be the case. TFS is here to stay and we’ll keep using it ourselves for a very long time. At the same time VS Online is here to stay too and our use of it will grow rapidly in the coming years. It will be a big milestone when the first big product engineering team not associated with building VS Online/TFS moves over to VSO for all of their core engineering system needs – I’ll be sure to let you know when that happens.
I’ve talked about this before but something happened today that got me thinking about it again. One of the necessary components of the rapid release cycle we are on for VS and TFS is an easy and seamless upgrade process. All through the 2012 Update cycle (.1, .2, .3) we worked on making the TFS Upgrade process easier and more reliable – mostly focusing on the upgrade leaving your server configuration exactly how it was.
What caught my eye today was a mail from a customer titled: RE: [ TFS2013 ] Feedback - Upgrade to 2013.2 RC does not keep IIS settings required for Kerberos
It was a response to an email from February and unfortunately, it was too late at that time to make a fix for Update 2. The body of the mail was:
“I just upgraded to TFS 2013.3 RC and the IIS settings were preserved, Thank you!”
I tend not to mention fixes/changes like this in my posts about releases because there are so many that the post would be crazy long. Sometimes I try to look for themes in the fixes and describe the intent/benefit of a set of work but I don’t always have time. We’re trying to get better about listing all the significant bug fixes in the KB article but I don’t know that we’re all the way there yet.
Regardless, every update, whether we describe all the fixes or not, generally includes dozens of little improvements like this that are just designed to make life better. Improvements that make it easier to digest more improvements are particularly valuable.
Today we are releasing Visual Studio 2013 and Team Foundation Server 2013 Update 3 RC.
- Download Visual Studio 2013 Update 3 RC
- Visual Studio 2013 Update 3 KB article
- Visual Studio 2013 Update 3 RC release notes
You’ll find a complete list of download links in the release notes.
This is a “go-live” release and is expected to be the last preview before Update 3 is released. It will be supported in production, can be used to upgrade production environments and will support upgrades to the final version when it is available. We expect the final release to be within the next month or two.
There is quite a lot of stuff in VS 2013 Update 3. You can read about a lot of the detail on the Visual Studio blog or in the release notes I referenced above. I’ll call out a couple of the things that I’m particularly passionate about:
CodeLens support for Git – In Update 3, we’ve added CodeLens support for projects using Git, in addition to those using Team Foundation Version Control. CodeLens for Git works against the local Git repo and, as such, works whether you are using an on-prem TFS server, VS Online, another Git service (like GitHub) or are offline completely. Next up on the CodeLens slate (not in Update 3 though) is support for CodeLens with TFVC on VS Online. Read more here.
Mixed Case Menus – I know I’m going to get some feedback on this one :) Changing the “ALL CAPS” menus has been a long-standing request from a vocal portion of the VS user base since VS 2012. In VS 2013 Update 3, we have added a Tools –> Options setting to control whether you see ALL CAPS or Mixed Case. The default is still ALL CAPS but, if you change it, it will persist across upgrades and will roam across your IDE instances using the VS Online roaming settings feature (if you log into VS so it knows who you are). For grins, I’ve copied the DECLINED User Voice suggestion from 2 years ago. I guess we can “undecline” it now :)
In Team Foundation Server 2013 Update 3, we’ve also included some nice enhancements.
Test Plan and Suite customization – We’ve modified test plans and suites to be backed by work items so you can use the same customization techniques – adding fields, designing forms, etc. It also means you can query them like work items and you have versioning/history of them.
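Because plans and suites are now backed by work items, you can query them with the same Work Item Query Language (WIQL) you’d use for any other work item type. As a rough sketch (the exact work item type name depends on your process template), a query listing all test plans might look like:

```sql
SELECT [System.Id], [System.Title]
FROM WorkItems
WHERE [System.WorkItemType] = 'Test Plan'
ORDER BY [System.Id]
```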
Release Management support for PowerShell DSC and Chef – The TFS release management solution now supports writing deployment scripts in PowerShell DSC (Desired State Configuration) or in Chef. Among other things, this means we now have first class support for non-Windows platforms.
Some nice usability fixes – We’ve tried to slip in some nice customer driven TFS usability fixes into Update 3. A couple of the top ones are:
- We brought back the ability to drill into group membership from within the web-based permissions UI. This was a capability we had in TFS 2012 but lost in 2013, and we’ve gotten a lot of feedback about it.
- We added a “filter setting” to the backlog to control whether “current sprint” user stories show up. They used to be included and we got a lot of feedback that people didn’t want them there so we took them away. Then we got a lot of feedback from people who do want them there. So now we’ve made it configurable.
Looking forward to your feedback. We’ve already started making progress on Update 4 so expect to hear more from me in the next month or so on the kinds of additional enhancements you can expect.
This week we are deploying our sprint 67 updates. We've started on a set of small changes to the Agile Project Management experience and you'll see the first of them in this sprint's release notes: http://www.visualstudio.com/en-us/news/2014-07-01-vso.
Of course, we continue moving forward in other areas as well but I hope these will improve your experience.
Busy week for me catching up from my week at the beach last week so I'm going to keep this post short and sweet :)
I sometimes use an analogy to a bag of sand. I use it to refer to treating something in the aggregate. Each grain of sand can be inspected for mineral content, density, porosity, size, color, etc. but usually we just talk about how much of it there is. In software, people sometimes use a similar technique to deal with customers, resources (oh, I mean people), schedules, etc. Treating things in the aggregate is sometimes necessary but never forget what you are losing in doing so.
Today I am excited. Maybe sometimes I’m excited by small things, but, nonetheless, I am excited. Some months ago, I wrote a post on measuring the quality of a service where I tried to articulate a customer focused view of quality. For the last couple of years I’ve had a “north star” of building a great way of measuring the quality of the experience our customers have on a customer by customer basis – rather than treating the customers as a “bag of sand” and only looking at the health of the overall service. We’ve been gradually making progress on that journey and today is a milestone.
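To make the “bag of sand” point concrete, here’s a minimal sketch (my own illustration, with made-up account names and numbers – not the team’s actual telemetry pipeline) of the difference between service-wide and per-account quality: the overall success rate can look perfectly healthy while an individual account is having a bad week.

```python
from collections import defaultdict

# Hypothetical request telemetry: (account, succeeded) pairs.
requests = (
    [("fabrikam", True)] * 980 + [("fabrikam", False)] * 20 +  # 98% success
    [("contoso", True)] * 40 + [("contoso", False)] * 10       # 80% success
)

def success_rates(records):
    """Compute success rate per account instead of one service-wide number."""
    totals = defaultdict(lambda: [0, 0])  # account -> [successes, total]
    for account, ok in records:
        totals[account][0] += int(ok)
        totals[account][1] += 1
    return {a: ok / total for a, (ok, total) in totals.items()}

# Service-wide ("bag of sand") view: a single healthy-looking number.
overall = sum(ok for _, ok in requests) / len(requests)

# Per-account view: flag accounts below a quality threshold.
per_account = success_rates(requests)
unhappy = [a for a, rate in per_account.items() if rate < 0.95]

print(round(overall, 3))  # the aggregate looks fine
print(unhappy)            # but one account is below the bar
```

The aggregate rate here is over 97%, yet one of the two accounts is failing a fifth of its requests – exactly the kind of situation a service-wide dashboard hides.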
Now for a brief aside before I continue…
9 months or so ago, we created something we call the “Top Customer” program for Visual Studio Online. We measure activity in the service and identify the top N customers as defined by # of active users, depth of usage, etc. We then offer this set of customers participation in the Top Customer program – some accept and some don’t respond. It’s all good. The benefits of the program for the customer include a direct contact in the product group to help with any issues they experience, early access to new features (think kind of like traditional Beta programs), etc. The benefit for us is that we get feedback on important decisions and early features before we roll them out broadly.
Now, back to the story on quality of service…
We recently (like within the last week) produced a report that actually lists quality of service by account. And then in the last couple of days, we composed that with our Top Customer program to see if any of them were experiencing issues. Sure enough, it seemed so – about 4 of the top 30 or so seemed to be getting experiences of lower quality than we expect to be able to deliver. So we contacted them to ask if they were seeing issues themselves and today, we got the first response back. The response included this:
“I had put this down to general connectivity/service issues, but if you’re saying that you’re seeing this your side then yes something must be amiss. The reports I’ve had from devs are slow pull requests / more than normal commit fails and general authentication issues. Nothing so severe that we’re unable to operate, but is requiring an extra retry or request.”
As my 16-year-old daughter would say, BAM! We identified an issue a customer was having before they were frustrated enough to contact us to complain about it. I’m incredibly happy about that. I suspect we’ll need to continue doing this kind of cross checking for a little while but, ultimately, we don’t want to even bother the customer with it, of course – we just want to fix it for them proactively.
I also ultimately want to evolve our service health dashboard on http://www.visualstudio.com/support/support-overview-vs to provide a personalized view so you can see how your individual account is doing rather than the entirety of the service.
It’s a journey and we’re not nearly done yet but I’m really excited about this milestone.
There was a question on one of our internal aliases today about version to version improvements in our test capabilities. One of the program managers in that area wrote a response that really caught my eye. Primarily because it seems like an awesome demonstration of the effectiveness of our “Updates” model. It’s a pretty cool, constant stream of value. Below is an excerpt from his mail that includes links to information on each of the new capabilities that have been released.
VS 2012 (Microsoft Test Manager) - RTM:
- Compatibility of Microsoft Test Manager with Visual Studio 2010
- Manual Testing Windows Store Apps
- Enhanced Action Logs for Windows Store Apps
- Exploratory Testing window
- Manual Test Steps Can Include Multiple Lines
- Manual Tests Includes Rich Text
- Microsoft Test Manager Test Plan Results
- Cloning Test Suites into Other Plans for New Iterations
- MTM Performance improvements
VS 2012 VS Update 1:
- Pause and Resume in MTM
- Populate test suites using hierarchical queries in MTM
- Edit test cases during execution in MTM
- Automatic updates in MTM
- Other enhancements in MTM
VS 2012 Update 2:
- Test Plan/Test Suite Cloning in MTM
- Light-weight Browser based Test management & Execution
- Customization of test result fields and marking test results as NA in MTM/Web
VS 2013 RTM:
- View step attachments inline during execution, Add attachments and Pause/Resume tests in Web
- Inline editing for tests during execution in Web
- Test Plan Creation, Test Suite Mgmt, Parameter editing from Web
- Transition to MTM from Web
- Launching Test Runner from Web and pasting images into work item forms
- Bulk entry/edit of test cases using the grid view in Web
VS 2013 Update 2:
- Column options for the test case grid view in Web
- Exporting test artifacts in Web
- Shared Parameters for Test Cases in Web
VS 2013 Update 3 CTP2:
We’ve certified TFS 2013 (and its subsequent updates) to work with SQL 2014. However, there’s a hitch. Because TFS 2013 shipped before SQL 2014, the license that grants the right to use it with SQL Server only allows SQL Server 2012. Starting July 1st, we will be adding SQL Server 2014 to the list of license grants for TFS 2013. That won’t apply to earlier versions of TFS though – we haven’t even tested those with SQL 2014.
Anyway, hopefully this will make your life a little easier if you want to use SQL 2014. Of course, you can keep using SQL 2012 if you like.
It’s time for our sprint 66 deployment already! Over the next couple of days the update will be rolling out across accounts. The big news in this update is Pull Requests for Git repos. Pull requests are a workflow often used with Git whereby a developer makes some changes in a private branch. They then submit a “pull request”, which is essentially a request for the changes checked into that branch to be merged into another branch by the “owner” (or a committer, in Git speak) of the target branch. That workflow enables a code review experience with back and forth discussions of the changes, refinements if the changes need updates, etc. Ultimately the pull request is either accepted and merged into the target branch or rejected.
With Team Foundation Version Control we have a code review experience in Visual Studio. With Git, we chose to do an analogous experience in the web so that it’s available on all platforms and regardless of the IDE you use. We expect to enable the TFVC experience on the web too – and likely also have an “optimized” VS integrated experience. Never an end of work to do :)
Because the pull request workflow is so fundamental to the Git collaboration experience, we’ve chosen to make it available to everyone with a VS Online Basic license or higher (that includes the 5 free licenses, the advanced license, the pro license and a qualified MSDN subscription). We expect we’ll ship this feature in on-premises TFS in the future and will have to sort out that licensing when we do so.
You’ll find a new “Pull Requests” tab under “Code” for any project that has Git repos. If you hit New Pull Request, you’ll get an experience that looks like this to select the source and target branch, review the changes, etc.
If you hit “more options” in the blue area, it expands to enable a description and a list of reviewers (adding your team by default).
A notification will be sent to your team room and a new pull request will show up in your list.
You can open the pull request and review it, comment on it, etc. You’ll also notice that VS Online has already gone ahead and done a “test merge” to see if there will be any merge conflicts when I decide to merge it “for real”. In this case, there are some.
One more very nice step on our journey to bring Git up to parity with the capabilities we have in TFVC. As with pretty much everything we do in the cloud, it will continue to be a “work in progress” for several more sprints as we take feedback and refine it.
We hope you like it and, as always, value your feedback.
Today we released the second preview (CTP) of VS/TFS 2013 Update 3. You can download it here.
As far as I know, there’s not much in the way of new features over and above what I listed in CTP1. The changes are mostly bug fixes and feature refinements.
I’ll remind you that CTPs are not “go live” and are only provided for “tire kicking”. I believe our next CTP is going to be “go live”, I’ll let you know for sure when it releases.