Maybe we need to call it the collaboration net or some other fancy word instead of the cloud, but I think more often than not we're starting to miss some of the interesting and tractable use cases for cloud-enabled IT. From my view, this is largely because we're distracted by the potentially enormous complexity surrounding orchestration layers and the virtualized data center. While there is significant innovation going on in that space, it is often just that - pure innovation, if not speculation, and not necessarily real products or real-world solutions. Meanwhile, the end user, staring at out-of-control and often unusable data in today's economy, needs some new solutions for day-to-day business.
Unfortunately, within the cloudy cloud, the real solutioning in my opinion is happening at the fringes, rather than at the core, because the core is often circling around what to do with a pretty complex stack made up of infrastructure, applications, orchestration, and more. But at the core, there are some interesting products that are changing the way IT is being done. One example that crossed my radar today is Zoho, and Zoho in my opinion is giving us storage solutioneers a bit to think about.
To be clear, I don't have any particular connection with Zoho and I haven't even used their product myself. Zoho is a web-based office suite, similar to Google Apps, and it recently came out with a SharePoint connector. Users of Zoho can now put their Zoho office data into SharePoint in the form of office files, and still open and use their data with Zoho applications. The interesting part of this story is how it all revolves around unstructured data, and managing and leveraging unstructured data in a collaborative manner across the web.
I have long thought that SharePoint is going to be the killer cloud application - with the virus-like spread of SharePoint across businesses, all it takes is a good tiered storage and collaboration connector for Microsoft to rule the cloud. Why? SharePoint is an easy to understand enhancer of unstructured data, and it is also rapidly becoming a master gateway into all the data a business thinks is important. I think there is plenty of room for other vendors to add to this ecosystem, and potentially steal the limelight before slow-moving Microsoft spoils the party. Zoho's stepping up, and hinting at how they think it should be done.
The point is, I think the Zoho stuff is 1.0 of what will be a tremendously useful and accessible cloud solution. Talk about enabling the Netbook, Zoho and SharePoint would be one way to do it. While the stuff on the back-end will be what it takes to power the cloud, this space is about delivering solutions that make business 2.0 more capable, and for the storage industry, those solutions more often than not will revolve around making unstructured data more collaborative. With SharePoint in mind, can you leverage your approach to unstructured data to make SharePoint more collaborative? Or could your unstructured data storage solution replace the capabilities of Microsoft SharePoint while providing more capabilities via the cloud? Footnote, Tarmin thinks they can, and there are a bunch of pretty cool components at play in their approach to unstructured data, the cloud, and collaboration.
Don't get me wrong, I like the speculation, conceptualization, and pioneering innovation around the cloud just as much as anybody else - it's a bit like conceptual Lincoln Logs for big boys and girls. But if I were doing it myself today, I wouldn't be investing my energy there. I'd be looking at how to get these real-world, marketable solutions on the street. From my view, there seem to be hundreds or thousands of those solutions, but more often than not they are a marked change from the complex, big solutions that traditional vendors like to bring to market. It will be interesting to see how this change in the marketplace impacts the IT industry over the next decade.
Tuesday, June 23, 2009
Don't get lost in the stack - keeping an eye on the ball, despite the cloudiness
Labels:
cloud computing,
collaboration,
networked storage,
services
Friday, May 29, 2009
Virtualization Management Communities 2.0
It’s going to be a good year for virtualization administrators. They’ve spent the last few years consolidating servers, ramping up utilization, and slashing the time it takes to deliver new environments to users. Often, they’ve done it without much corporate support as both pioneers and evangelists. Much of their success has been due to the strong community they’ve built at places like the VMware Technology Network, where any number of peers is standing by to help resolve a complex problem or share a favorite script, just when you need it.
In the early years of the virtualization technology wave, there was a lot of talk about automation, and quite a few platforms were developed, then quickly snapped up by the systems management heavyweights. Few of these platforms did much of anything out of the box – they were policy-based, but where were the policies? They were containers without content, and content always wins.
Over the past month, I’ve been excited to see the second generation of virtualization communities coming to life. Each of them is building on the VMTN model in its own way, leveraging the collective experience of the field to create intelligence organically. A few worth checking out:
vKernel’s SearchMyVM is a free download utility – delivered as a virtual appliance – that quickly indexes an entire VMware environment and fronts it with a Google-style search portal, complete with pre-built searches as well as a query builder. Think of it as a mini Splunk for virtual machines;
Vizioncore’s Virtualization EcoShell Initiative is a community portal – accessed through a freeware desktop app – for those who use Windows PowerShell to manage virtualization. Members can improve their PowerShell skills, or share and debug their own scripts;
TripWire’s vWire community leverages the vendor’s configuration management and security expertise, offering free Windows downloads to verify whether vMotion is working properly, for example, and whether your virtual machines pass VMware’s Security Hardening Guidelines.
The market will decide which communities will thrive, but I like the aggressive approach we’re seeing in this space: race to market with a free utility that solves a real problem, cuts a few key strokes, eliminates a manual job, or teaches something new. The challenge will be to invest enough energy and resources to keep the content coming and build strong ties between community members.
Labels:
Bartoletti,
communities,
management,
Virtualization
Wednesday, May 27, 2009
The changing data center landscape
Recently, Taneja Group published what has become an annual report reviewing the state of InfiniBand in mainstream IT. Once again, the landscape has evolved in interesting ways this year, with virtual infrastructure and cloud computing being a couple of the forces driving InfiniBand adoption in the enterprise data center. Long story short, InfiniBand has proved itself a capable platform for continued evolution, and vendors with products in this space long ago figured out how to make the fabric into a platform. While we bleeding-edge technologists speculate about what infrastructure as a service is going to look like, the most common names in InfiniBand have already turned the infrastructure fabric into a service-enabled platform. These are slightly different twists, but you need a service-enabled platform behind your infrastructure as a service, and with a service-enabled platform you can turn your own infrastructure into a well-managed service, with granular and comprehensive management. I'll illustrate this in more depth, but first a link.
The Taneja Group InfiniBand publication is available on the InfiniBand Trade Association website - www.infinibandta.org - and will be up for download via the Taneja Group website soon.
Now back to the story - don't stop at this report and think that you have all of the story. How is InfiniBand service enabled? It is a bigger picture than just the mechanics of InfiniBand switching and how data is transferred to host processes. InfiniBand has been engineered for extensibility and can in turn be a platform for innovation. Take for example Voltaire. I've recently been given a tour of the Voltaire Unified Fabric Manager (UFM) solution. UFM builds on the architecture of Voltaire switches in order to extend their capabilities with an even more intelligent management layer. That management layer can provide more intelligent routing in a layer above the fabric, while integrating with and leveraging core fabric routing and management technologies. More importantly, UFM can dive deep into the fabric to give real insight into total infrastructure activities and performance. So far, I haven't seen any other solutions claiming to be a "fabric manager" offer the sophisticated insight, resource management, performance trending, and core fabric function extension that UFM can. UFM is just one example, but it fully illustrates what a well architected fabric should be capable of. The fabric shouldn't be an invisible lower layer of connectivity, managed within a separate operational domain. The fabric should be integrated with all aspects of your infrastructure.
My take: all the hubbub about new emerging fabrics is earning the ear of the enterprise customer. Whether those fabrics can deliver as a platform for extending enterprise computing capabilities will be judged in its own time. Meanwhile, these conversations are opening doors for new opportunities, and InfiniBand is poised to deliver. With the door open, the differences between fabrics are starting to make themselves apparent. Where the challenge is big enough to warrant a new approach, that's where we are finding users bringing InfiniBand into the mainstream enterprise.
Friday, May 15, 2009
Before Taking Off For the Cloud, Check Your Virtual Engines
In March, McKinsey published a discussion document, Clearing the Air on Cloud Computing, and I’ve had several colleagues and clients mention its controversial nature. In my view, the only finding that can be considered controversial is the claim that current cloud services offerings aren’t cost competitive for larger enterprises – in other words, large data center total cost of server ownership is actually less than most EC2 pricing options, for example. I can’t argue the numbers, but the cart might be getting in front of the horse.
The more salient topic covered, one I find decidedly non-controversial, is that TCO discussions are mostly premature for any but the smallest of IT shops. The first question should be: “How virtualized are you, and how’s that going?” If the enterprise has only seen modest gains in utilization, or is having trouble sharing servers among business units, or is running into tricky new performance problems with virtual servers, it doesn’t help much to let them know they do, in fact, already have an ‘internal cloud’. This type of retroactive rebranding is all the rage, but I’d encourage vendors to step back a bit.
There’s a large and growing demand in virtualized enterprises to leverage, optimize, and control the virtual estate. Virtualization's capital cost savings - consolidation and utilization - are by now well-proven. The operating cost savings? We’ve only scratched the surface. Until enterprise operations teams have greater confidence in the run-time performance of a fully virtualized environment, they won’t be ready for even a partial lift-out. Smart vendors will focus on the tricky contention and performance issues keeping virtualization teams up at night. And those are the vendors that will be trusted to help deploy into private and public clouds when the time comes.
Monday, April 6, 2009
More Thoughts on Storage Tiering
Historically, there have been two storage tiers: a primary tier on disk, and a secondary tier on tape. I've blogged before about how the requirements of storage tiering are changing, driven largely by economic considerations. Explosive data growth, combined with escalating retention and e-discovery requirements, is exposing the weaknesses of a tape-based tier. These days, storage decisions are being driven largely by a flight to efficiency, even as other considerations (performance, reliability, availability, etc.) retain their importance.
Discussions with end users and vendors alike have given me a lot of food for thought. To the traditional three types of "functionally defined" tiers - primary, backup (defined to include DR), and archive - it may be valuable to consider a fourth "value driven" storage tier that can accommodate certain classes of both primary and secondary data. But what about the cost and complexity another tier may introduce? Well, let's run through this as a thought exercise first.
You really do have at least 4 kinds of data: the high performance primary tier, a lower performance primary tier, a backup tier, and an archive tier. Today, the high performance and lower performance primary tier data is all sitting in your highest performance array, taking up space for which you're paying the high performance premium. It's there because you don't want to move any data to your traditional other tier (tape) unless you know there's a very high probability that you won't need to access it. If you've implemented an "active archiving" tier that uses SATA disk as the storage medium (along with other technologies like scale out architecture, storage capacity optimization, "fancy" RAID 6+, replication, etc.) and migrated some of the data off your primary storage to it, does thinking about this as an "archive" tier limit its value? Forget about what we call it, let's think a little more about this SATA based tier...
End users point out that, while there is a need for high performance primary storage, a large percentage of the data they house there really doesn't need that level of performance, but it still has to be online and transparently accessible to them. Our data indicates this number is generally higher than 60%, and in some cases can approach 90%. Some vendors have catered to this by allowing end users to mix higher cost, higher performance FC disks and lower cost, lower performance SATA disks in the same arrays. Good idea, but not the most efficient approach, since you're incurring the cost of the high performance infrastructure in any array that might potentially house high performance disks (even if most of the disks in it are not).
On the other hand, the SATA based tier we've defined offers the performance, reliability, availability, and cost profiles that we want for both lower performance primary data and archive data. If on that platform you can define different namespaces that offer different features - some designed for lower performance primary data that do not require archive features (immutability, retention policies, data disposal, etc.) and some for data that does require archive features - then you may effectively implement this fourth tier. Because the SATA tier leverages a lower performance infrastructure that results in a lower overall average cost/GB, this is a more efficient place to put the lower performance primary data that comprises this fourth tier. Leave only the data that demands the highest performance on the high performance infrastructure, and move the rest to the lower performance SATA based tier that is housed in the lower cost infrastructure (i.e. the scale out secondary storage).
The point I'm making here is that if you are going to implement a SATA based secondary storage platform, you want to put as much of your data on it as you can while still meeting your performance requirements. Don't think about it as an "archive" tier, think about it as a "value" tier that can be used for some primary storage while at the same time supporting archive storage. Thinking about it in this way will help you to move as much data as possible off your high performance primary storage tier, not just your "archive" data. The more data you move off the high performance primary storage tier, the lower your overall $/GB gets.
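To put rough numbers on the $/GB argument, here is a minimal sketch of the blended-cost arithmetic. The per-GB prices and the 100 TB capacity are purely illustrative assumptions, not vendor figures; only the 60% and 90% fractions echo the range mentioned above.

```python
def blended_cost_per_gb(total_gb, fraction_moved, primary_cost, value_cost):
    """Average $/GB when a fraction of the data moves to the value tier."""
    if total_gb <= 0:
        raise ValueError("total_gb must be positive")
    if not 0.0 <= fraction_moved <= 1.0:
        raise ValueError("fraction_moved must be between 0 and 1")
    moved_gb = total_gb * fraction_moved   # data relocated to the SATA value tier
    kept_gb = total_gb - moved_gb          # data left on high performance storage
    total_cost = kept_gb * primary_cost + moved_gb * value_cost
    return total_cost / total_gb


# Hypothetical inputs, chosen only to show the shape of the curve.
PRIMARY_COST = 10.0   # assumed $/GB on the high performance tier
VALUE_COST = 1.0      # assumed $/GB on the SATA-based value tier
TOTAL_GB = 100_000    # assumed 100 TB of data

for fraction in (0.0, 0.6, 0.9):
    cost = blended_cost_per_gb(TOTAL_GB, fraction, PRIMARY_COST, VALUE_COST)
    print(f"{fraction:.0%} moved -> ${cost:.2f}/GB")
```

With these made-up prices, the average falls roughly in proportion to the fraction moved, which is why it pays to treat the SATA platform as a "value" tier rather than reserving it for archive data alone.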
Now back to the question on cost and complexity. There are scale out secondary storage vendors that let you define "value" and "archive" shares on the same physical platform. So you don't need another platform, you just define another tier (the value tier) and configure its functionality appropriately. Granted, it takes more work to define an additional tier even if it is in the same physical platform, but the payoff from moving more data off your primary storage is potentially large (and the additional work required quite small). And I think that there may be scale out secondary storage vendors that are underselling the value of what they can offer you (who'da thought that would ever happen?). Even if they don't figure it out, you can.
A quick word on the backup tier: minimizing your high performance primary storage tier minimizes time-consuming backup requirements (and hence backup infrastructure requirements). The value/archive tier still needs to be protected, but given the scale requirements (hundreds of terabytes to petabytes for most companies over time), replication is the way to do this, not traditional backup. When you take the benefits of SATA technology and storage capacity optimization into account, you're looking at a cost profile for the relevant data of well under $1/GB without a replicated platform, and slightly over $1/GB with one. If e-discovery savings against your "archive" tier are used to cost justify this tier against a tape-based archive tier (which is feasible if you are regularly handling one or more lawsuits a year), you could effectively get to add the "value" tier for free. Not bad for a thought exercise...
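As a rough illustration of how replication changes that cost profile, the sketch below assumes one full replica doubles the capacity you pay for. The raw price and the 3:1 capacity optimization ratio are hypothetical values picked only to land in the range described above.

```python
def protected_cost_per_gb(raw_cost_per_gb, optimization_ratio, replicas=0):
    """Effective $/GB of logical data, with optional full replicas.

    raw_cost_per_gb    -- assumed platform cost per physical GB
    optimization_ratio -- assumed logical-to-physical ratio (e.g. 3.0 for 3:1)
    replicas           -- number of additional full copies kept elsewhere
    """
    if optimization_ratio <= 0:
        raise ValueError("optimization_ratio must be positive")
    if replicas < 0:
        raise ValueError("replicas cannot be negative")
    effective = raw_cost_per_gb / optimization_ratio
    return effective * (1 + replicas)


# Hypothetical figures, not taken from any vendor price list.
single = protected_cost_per_gb(1.80, 3.0)               # no replica
replicated = protected_cost_per_gb(1.80, 3.0, replicas=1)  # one full replica
print(f"unreplicated: ${single:.2f}/GB, replicated: ${replicated:.2f}/GB")
```

Under these assumptions the unreplicated figure sits well under $1/GB and the replicated one slightly over it; real numbers will depend on the platform and on how well your data responds to capacity optimization.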
Saturday, March 14, 2009
Cisco's Virtual Awakening
On Monday the 16th, Cisco is expected to announce its entry into the server market. This competitive assault by the king of networking is a game-changer and will shatter the comfortable territorial boundaries we've fortified in the IT market over the last twenty years. The headlines will read, "Network Giant Makes Aggressive Leap Into Server Territory," which is true. However, I think a more accurate headline would read, "Virtualization Forces Cisco To Redefine What A 'Network' Vendor Is."
The key driver for this bold move, in my view, is not to grab a larger portion of device marketshare, but to grab a significant chunk of IT buyer mindshare. Virtualization hasn't only shredded traditional IT architecture and deployment processes, it has also upset the balance of power in the data center. We're all comfortable with the silos of server, storage and network control and expertise, and vendors have relied on those silos to nurture and protect relationships. The rise of the virtual environment and the "virtualization administrator" changes the game.
Virtualization, at its core, is about abstraction and mobility. By redefining the links between applications and all types of devices they require, workloads are freed from server, array, or switch constraints. This freedom in turn drives the need for new IT management tools that operate at the virtual infrastructure level and ease the burden of juggling alerts and resource contention along three dimensions at once.
Cisco is doing more than adding servers to its product line Monday; it is repositioning itself as a virtual infrastructure management vendor. In my experience, the best way to compete in the management arena against well-established incumbents is to focus on the gaps. Cisco should deliver targeted point solutions quickly that solve the most critical server-network management challenges faced by virtualization-savvy customers, to accelerate Unified Computing from concept to reality.
Friday, February 27, 2009
VMworld Highlight: Virtual Infrastructure Optimization
The Solutions Exchange at VMworld Europe confirms that many vendors are tapping into a key customer concern for 2009: optimizing existing, growing virtual server estates and integrating storage performance data into the administrator's dashboard. While many customers I've spoken to have made progress toward managing virtual machine sprawl, they struggle to identify, correlate, and diagnose performance problems for I/O-intensive production applications, problems that often span server to storage.
From the application through the I/O adapters and switches to the arrays themselves, there's a lack of visibility into the I/O path for root cause analysis -- in real time and at production scale. This is in addition to the problem of right-sizing: resource optimization at provisioning and setup time. Both activities require deeper insight into the impact of virtualization on storage performance.
Virtual Instruments' VirtualWisdom, announced here this week, aims to provide this end-to-end runtime visibility and cut through the finger-pointing between server and storage vendors when performance issues arise in production. Their solution is worth a look, to augment in-place or other vendor solutions (there are plenty of excellent ones here to explore) that may provide insight only during the initial capacity planning and provisioning phases. It's clear that even the best-planned virtual environment often behaves differently at production scale; VirtualWisdom can tell you why.
Labels:
Bartoletti,
virtual infrastructures,
Virtualization
Wednesday, February 25, 2009
Backup and Archive: Two Different Animals
Are you using older backups as your archives? Are your archives sitting on tape? For years, this has been the norm because on the surface this approach looks cheap and easy. But like some other things that are cheap and easy, you may be in for a few unwanted surprises if you continue in your errant ways.
Historically, backup and archiving were mostly about making and retaining a copy of the production data. Many backup products in the '90s seemed to be all about backup, regardless of what that did to the recovery process. Archives, if people even had them, were about keeping data around as cheaply as possible, mostly to meet regulatory requirements (which meant that archiving was largely confined to regulated industries). Both used tape, so it was natural for a "backup" to become an "archive" and get shipped to some remote site after some period of time. But things have changed considerably:
1) With increasingly stringent requirements for RPO and RTO, the focus of data protection has clearly shifted to recovery
2) Operations have moved to a 7x24 clock, driving concerns about the implications of backup on production application environments
3) The focus of archiving has expanded to include accessibility, primarily to meet the demands of an increasingly litigious corporate environment
Pushing the envelope on tape technologies to try to address the first two items above led to an unintended consequence: people became very aware of the recovery reliability issues with tape media when used to meet backup requirements. Tape is a sequential access medium, but backups and restores really need a random access medium. Tape is also primarily an offline medium, which means it does not lend itself well to the types of discovery operations that have to be performed against archives to find responsive materials in a lawsuit. A study we did last year indicated that discovery operations against tape cost 10x as much as the same operations performed against disk, where computerized search can be leveraged. With the average cost of a lawsuit in the range of half a million dollars for large enterprises, disk-based e-discovery could save hundreds of thousands of dollars a year for a company handling at least several lawsuits annually. Plus, imagine the judge's reaction when media reliability issues keep you from producing responsive materials that you clearly should be able to produce. Disk was the obvious answer, if its cost could be brought down significantly.
Today, backup is about recovery, while archiving is about cost-effective retention and searchability. The two business objectives drive different requirements, but there is a single medium that is well matched to the foundation requirements of both: disk. Different software functionality is required for each, which again raises the question of whether your backups should just age into becoming your archives.
We recommend that backup and archive be managed separately. Since most restore requests come from the most recent backups, the "backup" problem has more of a short-term focus. Disaster recovery has less of a short-term focus, mostly because of operational limitations on getting that data to a remote site, but also because it must support multiple comprehensive recovery points. Archiving clearly has a long-term focus but should NOT just be a process that occurs at the end of the backup data life cycle. To optimize your existing storage infrastructure for performance, cost, and protection, data should be archived well before it is no longer needed for backup and/or DR purposes. This has very positive implications for managing primary storage and the costs associated with it (see my post from February 24, 2009).
In dealing with end users on this issue, two conclusions are evident:
* Backups and archives should be managed separately, and you should seriously consider using disk-based options for both if you're not already
* Archiving to tape is NOT cost effective from an overall TCO point of view if you're dealing with multiple concurrent lawsuits on a regular basis
Labels:
active archive,
archiving,
backup,
Burgener,
disk based backup,
tape archive
Tuesday, February 24, 2009
Pulling One Out Of The Hat
Inertia is a fact of corporate life. If you're responsible for figuring out what you're going to spend your IT budget on in the next 12 to 18 months, it's likely that a very high percentage of your spend will go to projects that you were spending money on last year. When trying to find places to cut, many people think first about new projects and initiatives. But another great place to look is at the budgets that take a large share of your overall spend. And if you're like most enterprises in this era of exploding data growth and increasing retention requirements, your primary storage budget probably fits that definition.
If the corporate stars have aligned and everyone already agrees that you're going to spend "x" on primary storage this year, here's something to consider. The business objective is to meet the enterprise's requirements for primary storage capacity, along with the infrastructure requirements that go along with that (data protection, DR, security, etc.). As long as you meet that business requirement within the budget, you may have some flexibility to spend the money on technology that doesn't meet the strict definition of "primary storage". It's a fact that for most enterprises, at least 70% of the data sitting in primary storage infrastructure today is rarely if ever accessed. But it's there on the off-chance that you might need it, and you aren't yet so sure you'll never need it that you've migrated it to some sort of tape archive. All that data is driving a lot of management overhead for performance optimization, redundancy, backup, etc.
More and more vendors have figured this out, and are offering what we at Taneja Group call "active archival storage". Basically, these are very scalable, disk-based platforms, generally accessible through industry standard interfaces such as NFS and CIFS, that leverage technologies like SATA and storage capacity optimization (file level single instancing, data de-duplication, compression, etc.) to store lots of data very cheaply. Permabit might have been one of the first to enter this game, but they've been joined by other vendors, small and large, and you can now deploy this capability either as a product or as a service (Iron Mountain recently announced a cloud-based archiving solution). Here's the thinking: you've already decided you need x TB of new primary storage this year (fill in your requirement) and that's going to cost y dollars (fill in your cost). Instead of buying more primary storage, take that y dollars and buy an active archiving platform (or start up one of the cloud-based services) and move the 70% or more of your stale "primary" data into it. That has several impacts:
1) now you need a lot less primary storage (and probably won't need to buy any more this year after you've freed up 70% of your existing primary storage capacity)
2) now you're backing up a lot less primary storage, so backups take a lot less time
3) now you're spending a lot less to store your data, since the new average $/GB is a blend between the $20/GB or more you're paying for primary and the $1/GB or less you'll be paying for this active archiving platform (think how much more active archive storage that $20/GB will buy)
4) the data is still online so end users can transparently access it, and because it's online it's now searchable for e-discovery purposes, a fact which we've seen save hundreds of thousands of dollars (relative to tape-based discovery) in just one year for large enterprises that deal with multiple lawsuits concurrently (which unfortunately is most of the Fortune 2000)
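To make the blended-cost point in item 3 concrete, here is a rough back-of-the-envelope sketch in Python. It uses the illustrative figures from this post ($20/GB for primary, $1/GB for active archive, 70% of data moved); the function names are my own, and the calculation assumes a simple capacity-weighted average that ignores data reduction, management overhead, and migration costs.

```python
def blended_cost_per_gb(primary_cost, archive_cost, archived_fraction):
    """Capacity-weighted average $/GB after moving a fraction of the data
    from primary storage to the active archive tier (simplifying assumption)."""
    if not 0.0 <= archived_fraction <= 1.0:
        raise ValueError("archived_fraction must be between 0 and 1")
    return (1.0 - archived_fraction) * primary_cost + archived_fraction * archive_cost


def archive_gb_per_primary_gb(primary_cost, archive_cost):
    """How many GB of archive capacity the price of one primary GB buys."""
    return primary_cost / archive_cost


# Illustrative figures taken from the post, not measured data.
blended = blended_cost_per_gb(primary_cost=20.0, archive_cost=1.0, archived_fraction=0.70)
multiple = archive_gb_per_primary_gb(primary_cost=20.0, archive_cost=1.0)

print(f"Blended cost: ${blended:.2f}/GB")
print(f"One primary GB buys {multiple:.0f} GB of active archive")
```

With those inputs the blended figure works out to roughly a third of the all-primary cost per GB. Plug in your own prices and stale-data percentage to see where you land.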
It's not all peaches and cream, though, since you'll have to manage another platform, which may or may not mean introducing a new vendor into your shop. But you can do this without asking for any additional budget, and you'll be easing the backup burden while decreasing e-discovery costs in a big way, not to mention making discovery faster and easier. Data migration doesn't need to be done up front; you can just let the platform manage that over time according to policies you establish. If you're going to buy one of these platforms, though, you'll need sufficient scale, say around 80-100TB of primary storage with your data growing at a good clip, to cost-justify it using the above example. Smaller companies may consider cloud-based offerings, which will let you in for under 1TB.
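As a rough illustration of what a policy-driven migration candidate list might look like, the Python sketch below finds files that have not been accessed within a chosen number of days. This is not how any particular vendor's platform works; the function name, the 180-day threshold in the usage comment, and the reliance on file access times are all assumptions for illustration, and access times can be unreliable on file systems mounted with access-time updates disabled.

```python
import os
import time

SECONDS_PER_DAY = 86400


def find_stale_files(root, max_idle_days, now=None):
    """Return a sorted list of files under root whose last access time is
    older than max_idle_days. Hypothetical policy check; nothing is moved."""
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * SECONDS_PER_DAY
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                last_access = os.stat(path).st_atime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if last_access < cutoff:
                stale.append(path)
    return sorted(stale)


# Example usage (the path and threshold are placeholders):
# candidates = find_stale_files("/path/to/share", max_idle_days=180)
```

A real platform would apply richer policies (file type, owner, retention class), but even a simple report like this can help you check the "70% rarely accessed" estimate against your own data.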
Long term, most medium to large enterprises will be using an online secondary storage tier. Tape just can't meet evolving archive requirements, especially where e-discovery is a concern. With the economy the way it is these days, this is something to at least think about this year.
VMworld Europe 2009: Keynote Live Blogging
VMware CEO Paul Maritz kicked off VMworld Europe today and fired a shot directly across Citrix's bow. You may recall Citrix's January 21 announcement of a bare-metal desktop hypervisor to be developed in partnership with Intel. Now, just a little over a month later, VMware is signaling that it's not ready to give up the desktop by making virtually the same announcement.
VMware's Client Virtualization Technology, leveraging Intel's vPro, is a direct match for Citrix's Project Independence. Maritz was clear that the desktop is key to VMware's 2009 strategy, adding that the View (VDI) suite announced in December will be fully rolled out by the end of 2009.
Also, Maritz premiered VMware vSphere, a blanket rebranding of the VI suite which seems to include (replace?) the Virtual Data Center Operating System (VDC-OS) branding of 2008. At first glance, the new brand aims to break down any distinction between internal, external and "private" clouds: they are all one extended virtual fabric infrastructure. This led to a sweeping vision of VMware's vCenter management strategy, but I'll save that for another post.
Labels:
Bartoletti,
virtual infrastructures,
Virtualization,
vmworld