Developing a Data-storage Strategy
Cloud-based storage requires evaluation, just as any other storage strategy does, according to James T. Whitfill, MD, CIO of Scottsdale Medical Imaging Ltd (SMIL) in Arizona. “You have to understand how the cloud provider replicates data to different locations, so you trust that there really are multiple copies of those data; that they are kept secure, from a physical perspective; that there are environmental controls, and that your data are not being accessed,” he observes. “Under HIPAA, you must protect that access.” He asks, “How does that cloud provider guarantee that when you delete those data, if you want to, they are really gone? You think that those data have been destroyed, but can that cloud provider guarantee that they were destroyed?”

So far, Whitfill says, SMIL has not turned to commercial cloud storage, although it remains an option. It can be a very strong option for smaller radiology practices, Whitfill says, and even large practices like SMIL are keeping an eye on the market and the latest storage devices, looking for opportunities to shave expenses and maintain reliability. SMIL installed its PACS in 2003, and its built-in archive has only a year or two left before it reaches capacity; after that, the practice will have to make more decisions about what images and data it needs to store, for how long, and using what infrastructure.

SMIL is a 44-person radiology practice that reads for three Scottsdale Healthcare hospitals and operates 14 imaging centers, including newly opened facilities in Phoenix and Gilbert, Whitfill says. The radiologists read about 300,000 outpatient exams in SMIL’s own network per year; another 350,000 studies are read at the Scottsdale Healthcare hospitals, but SMIL plays no part in the storage of hospital images or data, Whitfill says. “The hospital PACS and the SMIL PACS are completely separate systems, with different storage archives and different storage databases,” he notes.
“The integration is at the network level, where if a radiologist is on an SMIL PACS, he or she can open the hospital PACS on the Web browser, or at a hospital PACS station in an SMIL reading room. In the hospital reading rooms, we have access to the SMIL PACS. If a patient is going to the hospital, and we know in advance, we can export from our PACS to the hospital PACS, and vice versa.”

Whitfill continues, “We don’t have a unified worklist across the two sites today. The solution we have today is not perfect. The added overhead cost and management around a single worklist have stopped us, for now. As CIO of SMIL, my liability is the SMIL PACS. I have no control over the hospital PACS. We do work cooperatively to exchange best practices.” That’s possible because both entities have similar versions of PACS, from the same vendor, he adds.

SMIL’s Storage Needs

SMIL’s image archive is divided into short-term and long-term components, Whitfill says, even though all images remain immediately available online, for now. The long-term archive has about 40 terabytes of capacity. It acts as a backup for the short-term archive, which has 30 terabytes of capacity. Off-site storage provides an additional 40 terabytes of capacity for replicated long-term data, Whitfill adds, providing an additional layer of redundancy. “We have over 100 terabytes of storage dedicated to imaging, and another 10 terabytes for database servers, billing, and those sorts of things,” Whitfill says.

The imaging archive contains all the CR, MRI, CT, mammography, nuclear-medicine, ultrasound, PET, and dual-energy x-ray absorptiometry studies done at the SMIL imaging centers, Whitfill says. The only thing missing is interventional radiology, since such procedures are only done at the hospitals.

Both the long-term and short-term archives kept in-house use RAID network-attached storage, Whitfill says. He adds that this technology significantly speeds retrieval of images.
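As a rough check on the capacity figures quoted above, the tiers can be tallied in a few lines of Python. This is only an illustrative sketch: the tier labels and the dictionary layout are assumptions made for the example, and only the capacities themselves come from Whitfill's description.

```python
# Capacities in terabytes, as quoted in the article.
# The tier labels are hypothetical names chosen for this sketch.
IMAGING_TIERS_TB = {
    "short_term_onsite": 30,
    "long_term_onsite": 40,
    "long_term_offsite": 40,
}
NON_IMAGING_TB = 10  # database servers, billing, and similar systems


def total_imaging_tb(tiers: dict[str, int]) -> int:
    """Return the raw capacity dedicated to imaging across all tiers."""
    return sum(tiers.values())


if __name__ == "__main__":
    imaging = total_imaging_tb(IMAGING_TIERS_TB)
    print(f"Imaging storage: {imaging} TB")
    print(f"All storage: {imaging + NON_IMAGING_TB} TB")
```

The 110-terabyte total is consistent with the "over 100 terabytes" that Whitfill cites. Because the long-term and off-site tiers hold copies of the same studies, the amount of unique image data is smaller than the raw sum.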
Using network-attached storage was a step that SMIL took after the price of RAID spinning-disk storage decreased. Initially, the practice used DVD jukeboxes and digital tape for data. Whitfill says, “At that point, years ago, they were much more economical than hard-disk storage, but now, the hard disk is inexpensive enough that we can use it for all the storage.”

RAID storage has another big advantage, Whitfill adds. The RAID boxes can be configured so that even if one or two disks fail, the entire pool of imaging data can be retrieved from the remaining disks. If, however, a pipe leaks water on a server or there’s a mishap and the whole server is lost, that means trouble, Whitfill says. That’s why SMIL has a second RAID server as its long-term archive.

What if a disaster destroys both RAID boxes? To meet that contingency, SMIL has its off-site archive. The off-site archive is more than 10 miles from the SMIL archiving site, and it uses a different storage medium. This is deliberate: If there is a flaw in the spinning-disk archives, SMIL wants its off-site storage to be unaffected. For off-site storage, SMIL has contracted with a commercial vendor to store its data on ultradensity optical-disc cartridges.

Coming Changes

Whitfill says that SMIL’s storage strategy has been to emphasize low cost, rapid retrieval, and redundancy, all of which are provided by its current system. That system, however, won’t meet the practice’s storage needs much longer. “Looking at our current environment, we have another 11 terabytes of short-term storage, and that’s at least two years,” he says. “Once we fill that, we may not use the short-term storage for all prior imaging any more. We may decide that having two permanent long-term copies is good enough. The short-term archive would then have only the most recent studies, maybe from the most recent eight years.”

Whitfill says that it’s easy and cheap to add storage capacity now just by adding another network-attached storage server.
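The two-year estimate for the short-term archive is simple arithmetic, and a short Python sketch makes the assumption behind it explicit. The function names are hypothetical, and a constant growth rate is an assumption of the sketch; the article gives only the 11 terabytes of free space and the "at least two years" horizon.

```python
def implied_max_growth_tb_per_year(free_tb: float, min_years: float) -> float:
    """Largest constant yearly growth that still leaves min_years of headroom."""
    if min_years <= 0:
        raise ValueError("min_years must be positive")
    return free_tb / min_years


def years_until_full(free_tb: float, growth_tb_per_year: float) -> float:
    """Years of headroom left at a constant growth rate."""
    if growth_tb_per_year <= 0:
        raise ValueError("growth_tb_per_year must be positive")
    return free_tb / growth_tb_per_year


if __name__ == "__main__":
    # 11 TB free and a two-year horizon are the figures quoted in the article.
    rate = implied_max_growth_tb_per_year(free_tb=11, min_years=2)
    print(f"Implied growth: at most {rate} TB per year")
    print(f"Headroom at that rate: {years_until_full(11, rate)} years")
```

A linear estimate like this is only a starting point; if the yearly volume of new image data rises, the real horizon would be shorter than the constant-rate figure suggests.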
At some point, though, space and cost become concerns. “Across the field, issues of power, heat, and space are becoming bigger and bigger concerns,” he says. Each added server represents an increased need for electric power.

Another driver of storage demand is the sheer size of imaging studies these days (particularly for CT exams). Whitfill says, “Eight years ago, a CT study was 50 to 200 slices; now, it’s 1,000, 2,000, or even 3,000 slices.”

Inevitably, SMIL will have to rethink its storage strategy to make room for more and more images. This could mean that it will turn to commercial cloud-storage providers, or it could mean that it will use newer storage media such as solid-state drives, which provide even faster retrieval than RAID storage can, but currently are two or three times more expensive, Whitfill says.

Compression and Purging

At some point, SMIL will probably have to consider file compression or the ultimate step of purging files altogether. This will have to be done with legal requirements for image and data retention in mind, Whitfill notes. The legalities implicit in storage are another reason that every radiology practice needs a storage strategy.

“Each locality has different legal requirements,” Whitfill says. “We had to store film for seven years, with pediatric cases held until the patient was older than 18 and mammograms held for the life of the patient.” Many PACS archives haven’t much exceeded the seven-year legal storage minimum from the film days, so not much attention has been given to purging digital files, Whitfill adds. It’s been too easy just to add storage, but that’s changing. “PACS has been around long enough that purging is a purchasable reality now,” Whitfill says. “It’s something you can implement in a PACS today.
The vendors have designed ways to purge.” Purged data can’t be retrieved; absolute erasure is such a big step to take that many major PACS vendors have made it impossible to do by mistake, but it can be done deliberately.

When SMIL’s short-term PACS archive is full, “We will have two decisions to make in two years,” Whitfill says. “Do we still keep a short-term copy, another long-term copy, and another off-site copy, or are long-term and off-site copies enough? Do we start to purge exams done more than seven years ago? For now, the storage space for those we did more than seven years ago is so small that we’re not sure it makes sense to purge.”

An alternative to purging is compression, some of which SMIL already does. There are two basic types of compression; with lossy compression, most data are purged from files, but enough data are kept so that “you still have something as a compromise,” Whitfill says. Lossy compression might reduce a 100-megabyte file down to 5 megabytes, Whitfill says; obviously, some image detail will be removed. SMIL doesn’t use lossy compression. It uses lossless compression, in which file sizes might be reduced by half or (at most) two-thirds, but detail is retained, Whitfill says. For mammograms, he notes, the study, by law, must be interpreted in the same format in which it is stored, and lossy compression cannot be used. For other modalities, “If you store lossy, you have to read lossy,” he says.

Migration and Text Files

Another concern, in storing image data, is what happens when you change storage media or have to replace worn-out hard disks. At these times, the data must be copied from the original file to the new one. “We have done some migrations,” Whitfill says. “With our long-term archive, we were originally storing on digital tape, and when we put in DVD, we had to migrate all the data. Then, when we went to network-attached storage, we had to move all the data from DVD to the spinning disks.
We haven’t done a migration like that in four years, but at some point, the spinning-disk servers will reach the end of their lives, and we’ll have to migrate again. We believe migrating from one spinning disk to another will be a lot easier than migrating from tape to disk. That took almost a year and a half to do, running every day.”

If all these storage concerns aren’t enough for the radiology-practice IT department, there is yet another: how to handle the storage of text information, meaning all the patient demographics, billing entries, and radiology reports. Whitfill refers to this text material as the database, and storing it is different from storing image sets. Imaging exams can be archived automatically because once they are created, the exams never change, but this is not the case for the information in the database. As bills are paid or changes take place in a patient’s status, the database changes. Every day, a new version of the database (with its own referencing to the images, via DICOM headers) is created, Whitfill says.

How does a practice go about storing the material in the database? Does it store every version, day after day, or is it enough to store only the most recent versions in-house, with perhaps year-end versions placed off-site? Whitfill says that the database archive at SMIL has a 10-terabyte capacity, but of that, only about 150 gigabytes have been used. The 10-terabyte archive also uses RAID technology, but it’s “a separate enclosure, with a separate strategy” from the imaging archives, he adds.

For off-site backup, the database is stored on both disks and tape. Tape versions are slower to access in emergencies, but they have the advantage of being more portable. They can be placed in a safe-deposit box at the end of the year, Whitfill says. “What’s more important? Is it more important to store the database 50 miles away? That’s secure, but it takes a while to get to, so should it be two miles away, on spinning disks?
We try to make those judgments, always refining our strategy,” Whitfill says. “Three years ago, we would have backed up every server to tape and moved it off-site once a week, but now, we feel it’s more advantageous to back up some components to disk and to have the last 20 versions on-site.” SMIL no longer takes the extra step of moving each daily or weekly version of the database to off-site storage, in part because data can be physically lost or stolen when tapes or other removable media are moved off-site. Whitfill says, “We’ve already got multiple backups, and it’s overkill to put it off-site. We have more than enough layers of redundancy and less management overhead.”

If there is one lesson to be learned about archiving, it’s that the landscape is always changing. As imaging and database archives grow, the radiology-practice CIO will always be refining his or her storage strategy, with certain overarching goals in mind: “Our storage strategy is based on the idea that we want a rapid-access, cost-effective, and redundant approach,” Whitfill says.

George Wiley is a contributing writer for