Archivismi: the organization of documents in the Internet Archive

This post was last updated 6 months ago.

This is a text automatically translated from Italian. If you appreciate our work and if you like reading it in your language, consider a donation to allow us to continue doing it and improving it.

The Cassandra Crossing articles are licensed under CC BY-SA 4.0 | Cassandra Crossing is a column created in 2005 by Marco Calamari under the "nom de plume" Cassandra.

Fifth episode of Archivismi. Since several more are still to come, for this period we will publish a couple of Cassandra columns a week so as not to fall too far behind!

This article was written on December 28, 2023 by Cassandra

Cassandra Crossing 562/ Archivismi: the organization of documents in the Internet Archive

Let's complete the description of how the Internet Archive organizes documents, and how the site allows them to be used

In the last episode we managed to archive documents, even large ones in heterogeneous formats, converting them along the way so as to have them available in multiple digital formats, reusable for the most diverse purposes.

But before we can say that a document has really been archived, it must also have been placed in a larger body of documents, which in turn is equipped with various kinds of indexes and methods for organizing and searching the documents and the information they contain.

It is therefore easy to see why it matters to know first how an existing digital library lets you organize your data, so that you can adapt to useful and well-studied common standards.

The architecture of the Internet Archive is as simple as it is powerful.

The first level of the architecture is the object, which can be created and later modified in various ways; an object is typically a single document. If the object is created by a registered, logged-in user, that user becomes the object's administrator and can then modify it, enrich it with additional data files and new metadata, and so on. If the object is created anonymously by a user who is not registered or not logged in, for example through the Wayback Machine, it can no longer be modified by the person who created it, but only by Internet Archive administrators, upon a specific request sent by email and formatted according to specific templates.

The second (and last!) level of the architecture is the collection. A collection is an object of a particular type, made up only of references to other objects. Like all objects, it has its own metadata, but it can only be created by Internet Archive administrators, at the specific request of a registered user who must meet certain requirements listed in the collection creation policy. A collection can contain other collections as sub-collections. The user for whom the collection was created and to whom it was assigned can administer it by adding objects they created, for example those they have uploaded.
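The two levels described above can be pictured with a small data model. This is only an illustrative sketch, not how the Internet Archive actually stores things: the class names `Item` and `Collection`, the `registry` dictionary and the sample identifiers are all hypothetical, chosen to show that a collection is just an object holding references, and that sub-collections nest.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Item:
    """One archived object: an identifier plus free-form metadata."""
    identifier: str
    metadata: dict = field(default_factory=dict)
    owner: Optional[str] = None  # None models an anonymous upload


@dataclass
class Collection(Item):
    """A collection is itself an object, holding only references to others."""
    members: list = field(default_factory=list)  # identifiers, not copies

    def add(self, item: Item) -> None:
        self.members.append(item.identifier)


def all_members(collection: Collection, registry: dict) -> set:
    """Return every non-collection identifier reachable from `collection`.

    Assumes collections never contain themselves, directly or indirectly.
    """
    found = set()
    for identifier in collection.members:
        member = registry[identifier]
        if isinstance(member, Collection):
            found |= all_members(member, registry)  # descend into sub-collection
        else:
            found.add(identifier)
    return found


# Hypothetical example: one document inside a sub-collection of a collection.
doc = Item("my-document", owner="alice")
sub = Collection("alice-reports", owner="alice")
top = Collection("alice-library", owner="alice")
registry = {obj.identifier: obj for obj in (doc, sub, top)}
sub.add(doc)
top.add(sub)
print(all_members(top, registry))  # {'my-document'}
```

Storing identifiers rather than the objects themselves mirrors the idea that a collection is "made up only of references": the same document can appear in several collections without being duplicated.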

When an object is created, it is assigned to a collection by default; if the object is created anonymously or directly by a user via upload, it is automatically assigned to a collection that we could define as "system".

For example, the documents we created in the previous episodes, as can be seen by examining the metadata in the object window or via the metadata editor, have been assigned by default to the collection "opensource". You will remember that we deliberately marked the item we used as an ephemeral object, intended to be deleted after 30 days. Looking at its metadata, you can see that it has also been assigned to the collection test_collection. An automatic process evidently sweeps through all the objects assigned to this collection and permanently removes those older than 30 days.
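The collection membership described above can also be checked from a program. As far as I know, the site exposes each object's metadata as JSON at `https://archive.org/metadata/<identifier>`, with a `metadata` section whose `collection` field is either a single string or a list. The sketch below works on such a record offline. The sample record is hand-written for illustration, not real data. The `is_expired` function is only a guess at what the 30-day sweep might check, and the `addeddate` field and its format are assumptions.

```python
from datetime import datetime, timedelta


def collections_of(record: dict) -> list:
    """Return the collections an item belongs to, always as a list."""
    value = record.get("metadata", {}).get("collection", [])
    return [value] if isinstance(value, str) else list(value)


def is_expired(record: dict, now: datetime,
               collection: str = "test_collection",
               max_age_days: int = 30) -> bool:
    """Hypothetical check: is the item in `collection` and older than the limit?"""
    if collection not in collections_of(record):
        return False
    # Assumption: the upload time is stored as "YYYY-MM-DD HH:MM:SS".
    added = datetime.strptime(record["metadata"]["addeddate"],
                              "%Y-%m-%d %H:%M:%S")
    return now - added > timedelta(days=max_age_days)


# Hand-written sample record, shaped like a metadata response (not real data).
sample = {
    "metadata": {
        "identifier": "example-item",
        "collection": ["test_collection", "opensource"],
        "addeddate": "2023-11-01 09:00:00",
    }
}

print(collections_of(sample))                    # ['test_collection', 'opensource']
print(is_expired(sample, datetime(2023, 12, 28)))  # True
```

Normalizing the `collection` field to a list first keeps the rest of the code free of special cases for items that belong to a single collection.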

There is a pseudo "third level" of organization which is only "presentational", and is built by the creators of the site by assigning objects to particular collections and then using them to generate specific pages on the Internet Archive site, to facilitate rapid and extemporaneous access to certain categories of information. These are, for example, the icons found on the home page and on the menu bar of the site.

The Internet Archive site has a somewhat "cluttered" and retro feel. Once you gain a little familiarity with it, however, it turns out to be a quite useful and powerful mechanism for finding documents of interest or for discovering new things, which usually turn out to be very popular collections.

In practice, though, since this is a library, the information you want is found through the indexing and search functions, which the site makes available in several ways. For example, when viewing your uploads, the left side of the screen gives access to a series of relevant filter categories, similar to those on Amazon.

When necessary, you can directly access the search function via the "Search" box at the top right of the site. You can reach the complete search function by clicking inside the box itself and selecting "advanced search".
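The advanced search can also be driven by a URL. The sketch below only builds such a URL and does not contact the site. It assumes the `advancedsearch.php` endpoint with its `q`, `fl[]`, `rows` and `output` parameters, as I understand them. The helper name `advanced_search_url` and the example query are mine, not the article's.

```python
from urllib.parse import urlencode


def advanced_search_url(query: str,
                        fields=("identifier", "title"),
                        rows: int = 10) -> str:
    """Build an advanced-search URL that asks for JSON results."""
    params = [("q", query)]
    params += [("fl[]", name) for name in fields]  # one fl[] per returned field
    params += [("rows", rows), ("output", "json")]
    return "https://archive.org/advancedsearch.php?" + urlencode(params)


# Example: text items in the "opensource" collection mentioned earlier.
url = advanced_search_url("collection:opensource AND mediatype:texts", rows=5)
print(url)
```

Opening the printed URL in a browser should return the matching items as JSON. The same query string can be typed into the advanced search page itself.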

And that's all for today too. Stay tuned for the next episode of "Archivismi".

Marco Calamari

Write to Cassandra — Twitter — Mastodon
Video column “A chat with Cassandra”
Cassandra's Slog (Static Blog).
Cassandra's archive: school, training and thought