Archivismi: Cassandra through the ages

Warning: this post was created 5 months ago.

This is a text automatically translated from Italian.

The articles of Cassandra Crossing are licensed under CC BY-SA 4.0 | Cassandra Crossing is a column created in 2005 by Marco Calamari under the "nom de plume" of Cassandra.

We told you it wouldn't end like this… Cassandra wants to go further!

This article was written on January 10, 2024 by Cassandra

Cassandra Crossing 568/ Archivismi: Cassandra through the ages

Cassandra is not satisfied: she wants to go further and to survive not for decades but for centuries or millennia. Can she do it?

In the previous 10 episodes of Archivismi we described the first archiving campaign, that of the Cassandra Crossing column on the Internet Archive. It was a long journey: we started by studying the structure of the Internet Archive, went on to prepare the data, wrote a few dozen lines of script to automate everything, performed the actual uploads, and finally cleaned up the data and corrected errors in the metadata.

Today we will introduce the third Cassandra Crossing archiving campaign.

"Hold on," some of the better informed among the 24 readers will say, "the third campaign? When did you tell us about the second one?"

Quite right: I didn't tell you about the second one because it was too easy and quick.

The second campaign consisted of archiving the 106 videos of "A chat with Cassandra" on the Internet Archive. Cassandra decided not to talk about it because it was over almost at once: it took only 20 minutes to prepare the bulk upload spreadsheet and about an hour to upload. It is true that we had gained valuable experience beforehand, that the metadata entered were elementary and that the starting data were already well structured, but something so simple and quick did not deserve even a brief account from Cassandra. So I'll just mention it in passing, you can go and see the result, and let's move on to the third archiving campaign, which, I can tell you, will be much more exciting.

However, as Cassandra has by now accustomed you to expect, we must first tell a bit of history. Actually much more than a bit, since it is not a matter of starting from the dawn of the Internet, or even from the dawn of computers, but from the dawn of writing itself, which means rewinding the tape by, roughly speaking, a good 5 millennia. It is from that remote era that the first archive of homogeneous information has come down to us, written in cuneiform characters on approximately 4,000 clay tablets. If we consider the clay tablet as an information medium, we can say that the Uruk tablets have proved very durable, putting all modern information media to shame.

It is true that countless other clay tablets did not survive the long journey to us as their 4,000 more famous colleagues did, but the durability of the medium remains remarkable.

Parchment scrolls proved to be only slightly less durable; the oldest are just over two thousand years old, and the "average" lifespan of parchment preserved in ideal conditions is estimated at around a thousand years.

Some papyri have come down to us from ancient Egypt and have therefore also lasted for millennia, but only in extremely particular conditions (sealed tombs in the desert). In European climates, and under ideal conservation conditions, their estimated lifespan is around 300 years. It is worth noting that the disappearance of parchment as an information medium is due precisely to the advent of papyrus, which is cheaper, easier to write on and more readable, but less durable.

The advent of paper made things even worse. Although some volumes of the past have survived for many centuries, all modern paper production has a limited lifespan of a few decades, with extreme cases such as certain paperbacks from the 90s, or newsprint, which only had to be left in the sun to literally crumble. The fault lies with chemical additives and whiteners, used to improve the paper's appearance, and with inefficient washing processes.

To summarize, there has been continuous progress from one medium to the next, which has brought lower costs, better performance and shorter lifespans. On the other hand, replacing inorganic and incombustible media with organic and combustible ones could only shorten the life of the information recorded on them.

In the IT field there is no such long historical experience. It goes back only to the 1950s, with punched cards (and, incidentally, I have a pack of them in perfect condition in a drawer, punched for my thesis in 1980).

Computer media, whether magnetic or optical, have performed much less brilliantly. Quite apart from the intrinsic technological obsolescence of read/write peripherals, which become unobtainable or stop working and so make even well-preserved media unreadable, magnetic tapes and CD-ROMs, which boasted a shelf life of 30 years, have actually turned out to be much more delicate than expected. A data transfer campaign that I carried out personally, from CD-Rs less than twenty years old and stored in ideal conditions, found almost 10% of the media with more or less serious reading problems.

The sad truth is that the development of modern information technology has always favored reducing the unit cost of media, increasing the density of the information recorded on them and speeding up access to that information, without paying equivalent attention to the durability of the media themselves.

And this may be enough to explain why the lifespan of media, starting from the 20-30 years typical of the 1960s, has not improved but has, if anything, worsened. Note that we are not talking here about systems equipped with redundancy and correction algorithms; such systems must be dynamic, they consume energy, and they are still subject to cybersecurity problems and poor resilience to disasters.

What is needed are media that store information reliably thanks to their intrinsic stability and durability, and in a completely passive manner, without consuming energy either directly, like an array of RAID disks that must be powered and running to remain stable, or indirectly, through expensive production processes and/or the need for active conservation systems such as air conditioning or heating for temperature stabilization.

And we also need media on which the representation of the data is not so "far" from what users can perceive. Most digital read/write drives produce media on which the data cannot be perceived by normal means and can only be detected with a particular type of hardware drive.

Both of these characteristics are present in the solution that currently guarantees the longest preservation times among the products available on the market. And, curiously, but perhaps not coincidentally, this is a fairly old technology to which some improvements have been made. We are talking about "normal" photographic film, i.e. silver halide film, and in particular the one used by Piql, a Norwegian company, together with the machinery for recording digital information on it.

The film, which is a commercial product, comes in the normal 35 mm format; its base is a type of polyester, and the gelatin and emulsion obviously have particular characteristics. The life of this film, properly exposed and developed, is estimated to be up to 500 years when stored at room temperature and in optimal conditions.

Writing data onto the film, which in the end is still normal photographic film, can take place in various ways, both visual and encoded.

“Analog” data such as images and microfilm can be recorded directly. Pure digital data is instead encoded in frames similar to QR codes, each of which contains a block of data.

The fact that the coding is "visual" makes it possible to carry out the decoding, once the coding method is known, even without the original equipment, using a device that performs high resolution scans and a computer, equipped with appropriate software, that reassembles the scans into the original digital files.
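To make the idea of "visual" frames a little more concrete, here is a minimal sketch in Python. It is not Piql's actual format, which the article does not describe: the frame layout (a 4-byte frame index, a 2-byte length, a fixed-size payload and a CRC32 checksum), the `FRAME_PAYLOAD` size and all function names are assumptions made purely for illustration. Each frame is a grid of dark and clear cells that a scanner could read back, and the decoder reassembles the original bytes from the frames alone.

```python
import zlib

FRAME_PAYLOAD = 32  # data bytes per frame (hypothetical value)


def bytes_to_bits(raw: bytes) -> list[int]:
    """Turn bytes into cells: 1 = dark cell, 0 = clear cell."""
    return [(byte >> shift) & 1 for byte in raw for shift in range(7, -1, -1)]


def bits_to_bytes(bits: list[int]) -> bytes:
    """Inverse of bytes_to_bits: pack groups of 8 cells back into bytes."""
    return bytes(
        sum(bit << (7 - i) for i, bit in enumerate(bits[pos:pos + 8]))
        for pos in range(0, len(bits), 8)
    )


def encode_frames(data: bytes, payload: int = FRAME_PAYLOAD) -> list[list[int]]:
    """Split data into blocks and wrap each one in a self-describing frame."""
    frames = []
    for index, start in enumerate(range(0, len(data), payload)):
        block = data[start:start + payload]
        # Header: frame number and real block length, then the padded block.
        header = index.to_bytes(4, "big") + len(block).to_bytes(2, "big")
        body = header + block.ljust(payload, b"\0")
        # The checksum lets a reader detect a damaged or misread frame.
        raw = body + zlib.crc32(body).to_bytes(4, "big")
        frames.append(bytes_to_bits(raw))
    return frames


def decode_frames(frames: list[list[int]]) -> bytes:
    """Rebuild the original bytes from scanned frames, in any order."""
    blocks = {}
    for bits in frames:
        raw = bits_to_bytes(bits)
        body, crc = raw[:-4], raw[-4:]
        if zlib.crc32(body).to_bytes(4, "big") != crc:
            raise ValueError("frame failed its checksum")
        index = int.from_bytes(body[:4], "big")
        length = int.from_bytes(body[4:6], "big")
        blocks[index] = body[6:6 + length]
    return b"".join(blocks[i] for i in sorted(blocks))


def render_frame(bits: list[int], width: int = 24) -> str:
    """Draw a frame as text, one character per cell, for visual inspection."""
    rows = [bits[i:i + width] for i in range(0, len(bits), width)]
    return "\n".join("".join("#" if b else "." for b in row) for row in rows)


if __name__ == "__main__":
    frames = encode_frames(b"Cassandra wants to survive for centuries.")
    print(render_frame(frames[0]))
    print(decode_frames(frames))
```

Because every frame carries its own index and checksum, the frames can be scanned in any order and a damaged one is detected rather than silently decoded. A real system would also add error correction so that damaged frames can be repaired, which this sketch does not attempt.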

Finally, about a kilometer of film is placed in a container specially designed for long-term storage.

The storage period is further extended by decreasing the storage temperature…

... but for today we have already gone on a little long, so we adjourn until the next episode of Archivismi.

Marco Calamari

Write to Cassandra — Twitter — Mastodon
Video column “A chat with Cassandra”
Cassandra's Slog (Static Blog).
Cassandra's archive: school, training and thought
