Archivismi: uploading and elementary operations

This post was last updated 6 months ago

This is a text automatically translated from Italian. If you appreciate our work and if you like reading it in your language, consider a donation to allow us to continue doing it and improving it.

The Cassandra Crossing articles are licensed under CC BY-SA 4.0 | Cassandra Crossing is a column created in 2005 by Marco Calamari under the "nom de plume" of Cassandra.

We really appreciate these articles about archive.org; for those interested, there is also archive.is, which we often use for archived links.

This article was written on December 26, 2023 by Cassandra

Cassandra Crossing 560/ Archivismi: uploading and elementary operations

Archiving does not mean saving. Archiving does not mean copying.

Archiving, in the digital world and in the broadest sense of the term, means storing digital information in a meaningful form, in the most appropriate formats, accompanied by the most suitable metadata and a well-chosen set of search keywords. And it means doing this by following the established procedures and methods that generations of librarians, digital today but analogue before, have already worked out for us.

In this third episode of Archivismi we will discover that archiving on Archive.org is not as simple and immediate as copying an ebook or an mp4 video to Dropbox, Google Drive or a Nextcloud server.

Some operations are, fortunately, almost completely automatic; as we saw in the last episode, archiving a single page on the Wayback Machine is indeed an elementary operation, even if a little slow. It is slow because it relies on a complex infrastructure: the page is archived with a mechanism designed to allow much more sophisticated operations as well.

Let's see what this is about. In the Internet Archive's database, information is stored in objects. Each object, upon creation, is associated with a unique identifier. An object can, to all intents and purposes, be pictured as a directory containing at least one data file and at least two metadata files.
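To make the "object as a directory" picture a little more concrete, here is a minimal Python sketch that builds the public addresses under which an object can usually be reached, starting from its identifier. The `/details/`, `/download/` and `/metadata/` paths are the ones archive.org exposes publicly; the identifier and file name are invented for the example, and the helper names are hypothetical.

```python
from urllib.parse import quote

BASE = "https://archive.org"


def item_urls(identifier: str) -> dict:
    """Return the main public addresses of an object, given its identifier."""
    return {
        "details": f"{BASE}/details/{identifier}",    # human-readable page
        "download": f"{BASE}/download/{identifier}",  # the object's "directory"
        "metadata": f"{BASE}/metadata/{identifier}",  # JSON description
    }


def file_url(identifier: str, filename: str) -> str:
    """Address of a single file inside the object's 'directory'."""
    # quote() percent-encodes spaces and other unsafe characters in the name.
    return f"{BASE}/download/{identifier}/{quote(filename)}"


if __name__ == "__main__":
    urls = item_urls("my-test-object")  # hypothetical identifier
    print(urls["details"])
    print(file_url("my-test-object", "my document.pdf"))
```

The point is only that everything about an object hangs off its identifier, which is why choosing it well matters.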

Let's try to create an object by doing a simple upload, like those used to upload a file to a cloud.

To continue, you need an Internet Archive account; if you haven't created one yet, do it now and then log in.

First, take a look at the little heart 💓 in the center of the menu bar; by clicking on it you can make a small donation with whatever means of payment you have available. It is obviously not mandatory: the Internet Archive's services are free, as they should be in any universal library, but running the place costs money, so, as usual, TANSTAAFL.

If for now the Internet Archive's service doesn't seem worth your pennies, carry on anyway; you will probably change your mind soon.

Look at the UPLOAD link at the top right; note in passing, and we will see this many times, that the Internet Archive hides its most important links in the least visible places, but that must be a dark art common among digital librarians…

If you click on it, a window obviously opens in which you can drag & drop a file or open a more practical file selection window. To follow this example, select a .pdf file, or whatever you want.

Once you have made your selection, the most important window of all will open, that of archiving.

First of all, don't litter the Internet Archive with your tests; although it is possible to delete an object, in practice it is not normally removed from the database, and continues to exist for a variety of reasons. It can be removed later, but this requires intervention by the system administrators. We'll come back to this later.

For convenience, a specific parameter is therefore provided, which marks an object as ephemeral so that it is automatically and completely deleted after 30 days. You have a duty to behave, so use this parameter for all test uploads, so that only the final ones are stored permanently.

So select it right away from the appropriate drop-down menu. Notice also that reasonable values, derived from the file name, are proposed for fields such as the object's title and identifier (URL).
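How the upload page actually derives these suggestions is not documented here, so the following Python sketch is only a guess at the general idea: take the file name, drop the extension, and produce a readable title plus a URL-friendly identifier. The rule that identifiers should stick to letters, digits, dots, dashes and underscores is an assumption of this sketch, and the function name is hypothetical.

```python
import re
from pathlib import Path


def suggest_title_and_identifier(filename: str) -> tuple:
    """Guess a title and an identifier from a file name (illustrative only)."""
    stem = Path(filename).stem  # file name without its extension

    # Title: turn underscores and dashes into spaces for readability.
    title = re.sub(r"[_\-]+", " ", stem).strip()

    # Identifier: replace every run of characters outside the assumed
    # safe set with a single dash, then trim leftover punctuation.
    identifier = re.sub(r"[^A-Za-z0-9._-]+", "-", stem).strip("-._")
    return title, identifier


if __name__ == "__main__":
    print(suggest_title_and_identifier("My Test_File (v2).pdf"))
```

Whatever the page proposes, the identifier becomes part of the object's URL, so it is worth a second look before confirming.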

Let's correct them, if we deem it necessary, and add a description, keywords, and anything else useful.

All fields are optional, but using them carefully and thoughtfully is the essential ingredient of a good archiving campaign. "Designing" one is also the hardest part, so for now let's make do. We'll come back to it.
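For readers who prefer a script to a web form, the same fields can be expressed as a small metadata dictionary. The sketch below assumes the third-party `internetarchive` Python package (installed with `pip install internetarchive` and configured with your credentials via `ia configure`) and assumes that the "ephemeral" option of the upload page corresponds to placing the object in `test_collection`. The identifier, file name and field values are invented; treat this as a sketch, not as the way the upload page works internally.

```python
def build_metadata(title, description, keywords, test=True):
    """Assemble the descriptive fields for a new object."""
    metadata = {
        "title": title,
        "description": description,
        "subject": list(keywords),  # search keywords
        "mediatype": "texts",       # suitable for a PDF
    }
    if test:
        # Assumption: test objects go in this collection and are
        # removed automatically after about 30 days.
        metadata["collection"] = "test_collection"
    return metadata


def upload_test_item(identifier, path, metadata):
    """Upload one file; requires the internetarchive package and credentials."""
    from internetarchive import upload  # imported lazily: third-party package
    return upload(identifier, files=[path], metadata=metadata)


if __name__ == "__main__":
    md = build_metadata(
        "My test document",
        "A throwaway upload used to learn how archiving works.",
        ["test", "archiving"],
    )
    print(md)
    # Uncomment only when you really want to create an object:
    # upload_test_item("my-test-object", "my-test-document.pdf", md)
```

Keeping `test=True` as the default mirrors the advice above: an upload is ephemeral unless you deliberately decide otherwise.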

You can now click on the blue button below.

A normal upload window opens which, after what seems like far too long (minutes), will return a view of the object you just created. This window includes a browser for the newly created object, but for now you will probably see only a white box, because the archiving process is actually still in progress.

Examining it carefully you will notice a whole series of clickable links, but first one important thing.

Depending on which browser and operating system you use, when you navigate back with the arrow on the left (which you can safely do), it may happen that, in addition to the previous page being displayed, the file download window opens; in that case you can simply cancel or close it and continue. When Cassandra finds a way to avoid this annoyance, she will certainly let you know.

Your test object has not yet been fully created; it exists as an identifier and as basic information, and can therefore already be used, but many operations in the Internet Archive backend still need to be performed, and will be in the next few minutes, hours or days. So, once again, patience.

But which operations are these? It depends on the type of object you created and on the “collection” you placed it in. Let's leave aside for now the very important matter of collections, and focus on the automatic operations that have been scheduled and that are being, or will be, performed on the newly created object. You can examine them using the history link in the microscopic menu at the top left of the object window.

There is a running task: this is the archiving of the object, still in progress. Meanwhile, in the lower part of the window, the history of the operations performed automatically on the object appears and keeps growing, as you will see if you look again after half an hour.

Many more things will keep happening to our object in the backend, and we'll talk more about that; in the meantime, let's go back to the object window and click on manage in the microscopic menu at the top left.

Two large icons will appear; the one on the left (very important) allows you to edit the metadata, but for now it is blocked by the ongoing object creation process, and if you click on it it will explain why.

The one on the right instead allows you to edit the files contained in the object, and if you click it it will open a view of the folder and its contents. Depending on the time passed and the file you have archived, you will find different contents, and many more files than you would expect.

You'll notice the original .pdf file used in this example, plus two .xml files and one .sqlite file, which contain system information (as with many other things, more on that later). There is also a new .torrent file, which can already be downloaded and used to provide a torrent link, useful if the uploaded file is very large and many people need to download it.
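The system files usually follow a naming pattern based on the object's identifier. The Python sketch below lists the names one would typically expect to find, together with the address of the torrent file. The `_meta.xml`, `_files.xml`, `_meta.sqlite` and `_archive.torrent` suffixes reflect what is commonly seen in archive.org objects, but take them as an assumption to check against your own object; the identifier is invented.

```python
def system_files(identifier: str) -> dict:
    """File names typically generated for an object (assumed naming pattern)."""
    return {
        "metadata_xml": f"{identifier}_meta.xml",   # descriptive metadata
        "files_xml": f"{identifier}_files.xml",     # list of the object's files
        "sqlite": f"{identifier}_meta.sqlite",      # system information
        "torrent": f"{identifier}_archive.torrent", # torrent for the whole object
    }


def torrent_url(identifier: str) -> str:
    """Address from which the object's torrent file can be downloaded."""
    name = system_files(identifier)["torrent"]
    return f"https://archive.org/download/{identifier}/{name}"


if __name__ == "__main__":
    for kind, name in system_files("my-test-object").items():
        print(f"{kind}: {name}")
    print(torrent_url("my-test-object"))
```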

Finally, there are several files, some still shown in gray and inaccessible, which bear witness to the ongoing operations that the Internet Archive is performing for you, and which depend on the type of file you have archived.

For example, a text-only file will automatically be created from our .pdf file, containing all the text present in the PDF. Still in the case of a PDF, an index of the pages will be created. If it had been a video file instead, a directory containing 255 thumbnails would have been created, among other things; these are extracted at uniform intervals over the entire length of the video and can be used to display it as a video object (for example in a timeline). More files will be created, but we are nearing the end of this intense episode.
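One way to tell what you uploaded apart from what was generated for you is to look at the file list in the object's JSON description, where each entry normally carries a `source` field (values such as `original`, `derivative` and `metadata`). The Python sketch below groups file names by that field. The sample data is invented purely for illustration and is not the output of a real object.

```python
from collections import defaultdict


def group_files_by_source(item_metadata: dict) -> dict:
    """Group the file names of an object by their 'source' field."""
    groups = defaultdict(list)
    for entry in item_metadata.get("files", []):
        # Entries without a 'source' field are filed under 'unknown'.
        groups[entry.get("source", "unknown")].append(entry["name"])
    return dict(groups)


if __name__ == "__main__":
    # Invented sample, shaped like the 'files' list of an object's metadata.
    sample = {
        "files": [
            {"name": "example.pdf", "source": "original"},
            {"name": "example_djvu.txt", "source": "derivative"},
            {"name": "my-test-object_meta.xml", "source": "metadata"},
            {"name": "my-test-object_files.xml", "source": "metadata"},
        ]
    }
    for source, names in group_files_by_source(sample).items():
        print(source, names)
```

Running the same grouping at different times on a fresh object would show the list of derivatives growing as the backend works through its tasks.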

Why …that's another story.

But one last thing. Again from the peculiar micro-menu at the top left of the object window, you can follow the link that opens the item manager window, where you can manage many aspects of the created object.

Some of the 24 indomitable readers, those most interested and best endowed with time and initiative, can start from here, or from the other windows we have seen, on a solo exploration that can last a very long time and take them very far.

Cassandra recommends that these bold souls equip themselves with a bit of Python and some familiarity with the API; to this end she suggests using the very well organized help page, and gives them this precious link to the Internet Archive developer documentation.

As an example, if you wanted to explore the topic of files created automatically during an upload, you could read this help article.

The others will instead wait for slow-footed Cassandra to carry out this exploration for them, or together with them.

Stay tuned for the next episode of “Archivismi”.
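As a possible first step in that exploration, here is a minimal Python sketch that reads an object's JSON description using only the standard library. It assumes the public `https://archive.org/metadata/<identifier>` address; the identifier is invented, and the `opener` parameter is a convenience of this sketch (it lets you try the function offline), not part of any official API.

```python
import json
from urllib.request import urlopen


def fetch_metadata(identifier: str, opener=urlopen) -> dict:
    """Download and decode the JSON description of an object."""
    url = f"https://archive.org/metadata/{identifier}"
    # 'opener' must return a file-like object usable as a context manager.
    with opener(url) as response:
        return json.load(response)


if __name__ == "__main__":
    info = fetch_metadata("my-test-object")  # hypothetical identifier
    # Print the descriptive fields, if the object has any.
    for key, value in info.get("metadata", {}).items():
        print(f"{key}: {value}")
```

From here it is a short step to scripts that check on an object periodically while the backend finishes its work.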

Marco Calamari

Write to Cassandra — Twitter — Mastodon
Video column “A chat with Cassandra”
Cassandra's Slog (Static Blog).
Cassandra's archive: school, training and thought
