From IEEE Spectrum:
Yes, the print and electronic copies of the same book contain the same words, but it’s obvious to most people (and, increasingly, to researchers) that the two reading experiences are quite different.
We need to understand such differences because the world is going to see a lot more digital data in the near future. This includes born-digital data, which is originally created in an electronic format, as well as born-analog data, which starts life as a physical object and then is reborn digital. A great example of this digitization came earlier this year when the New York Public Library announced that it was making more than 180,000 digitized items available to anyone with an Internet connection, no questions asked.
That librarians would turn themselves into digital curators is no surprise, since as analog curators for the past few centuries they have been constantly bumping into the physical constraints of storage space and material decay. One approach is to get rid of stuff, and librarians and archivists employ a pleasing variety of terms related to the removal of unwanted or duplicate material from their collections: Weeding and culling generally refer to the removal of individual items, while purging, screening, and stripping are most often used for the removal of multiple related items. But the main problem with physical materials is that they possess what archivists call, poetically, inherent vice: the tendency for something to deteriorate over time because of some fault in the material itself (for example, the presence of lignin in cheap paper, which causes the paper to yellow) or the way the material reacts with its surroundings (for instance, the fact that bugs eat some books because they’re attracted to the mold that grows in damp paper).
. . . .
Having digitized some data, the archivist now faces a new problem: the eventual obsolescence of the data structures or media used to store the data, necessitating a format migration (or a media migration) to something newer. Copying the data without changing the format or media type is called refreshing.
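To make the distinction concrete, here is a minimal sketch of a refresh, assuming the archived item is a single file on disk. It copies the bytes to a new location without touching the format, then compares SHA-256 checksums to confirm that nothing changed in transit. The function names `sha256_of` and `refresh` are hypothetical, chosen for illustration, and do not come from the article or from any archival tool.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh(source: Path, destination: Path) -> str:
    """Copy a file byte for byte (hypothetical helper, not a standard tool).

    The format is left untouched; only the storage location changes.
    Raises OSError if the copy does not match the original.
    """
    shutil.copyfile(source, destination)
    original = sha256_of(source)
    copied = sha256_of(destination)
    if original != copied:
        raise OSError(f"checksum mismatch after refreshing {source}")
    return copied
```

A format migration, by contrast, would rewrite the content into a different encoding, so a byte-level checksum comparison like this one would no longer be a meaningful check.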
Link to the rest at IEEE Spectrum