Alternatives to Archive.org, decentralized and otherwise

This post was last updated 1 week ago

This is a text automatically translated from Italian. If you appreciate our work and if you like reading it in your language, consider a donation to allow us to continue doing it and improving it.

Today's article is a delicate and unusual one for several reasons. Delicate because this time we are not talking about alternatives to a Big Tech project but to a great project, Archive.org, whose value we do not doubt. This article about alternatives to Archive.org was also born from a need of our own and, for the first time, from a discussion that initially took place on our new forum.

Our need was to have an additional archive for the sources of our articles besides Archive.org. Not because we want to replace Archive.org, obviously, but to have an alternative at hand just in case.

And why look for alternatives to Archive.org? First of all, because it is an important and fundamental project, and it would be good to have more than one so as not to leave the entire workload on its shoulders. Another reason is that Archive.org is neither infallible nor invincible, yet it is a fundamental part of the internet. The Internet Archive, in addition to costing around 25 million euros a year [1], has faced, and will probably face in the future, serious accusations from parties that essentially want to shut it down [2] [3] [4].

We should therefore all pitch in to help the Internet Archive survive, including through donations (as you may have noticed, there are no banners or advertisements on the site).

In the meantime, it is useful to discuss how to help the Internet Archive "not be alone", and how to help posterity, which for now must rely on the sole hope that nothing ever happens to the Internet Archive and its servers (over 200 petabytes of data [5]). Decentralization is a hot topic in this space right now. From what we know, the Internet Archive itself is potentially interested in this path [6] [7].

Being very curious, we went looking for alternatives to Archive.org and found a few.

Centralized alternatives

These are the alternatives that work "like" Archive.org, that is, with centralized servers that hold everything. It is certainly the most classic approach as well as the simplest.

  • archive.today is perhaps the best known and most used Internet Archive alternative of all. However, there are some problems: it is not clear who is behind the project, and it is not a non-profit like the Internet Archive but, apparently, the project of a single person. Archive.today makes requests to external sites such as Google and mail.ru, and in order to archive content you have to pass a CAPTCHA (from Google), with no way of logging in or doing anything else to avoid it. It has several mirror domains (i.e. domains that always point to the same content), but some are blocked by various firewalls and blocklists for illegal and/or dangerous content [8]. In short, it may be the most famous alternative, and we too have used it often, but we would like to move away from it a little.
  • ArchiveBox [9] is probably the one whose idea we like the most. It is a sort of Internet Archive that is very easy to self-host. It is therefore certainly useful for creating personal archives, and it could also become interesting if a list of servers offering it to everyone were to emerge (a bit like what happens with Invidious, so to speak). Reliability would probably be very low, though, because it is not designed to replicate across multiple hosts: once a server dies or disappears, the content dies with it. Interesting, then, but it needs expert hands, and something good could come of it especially if automated tools are built around it.
  • Conifer by Rhizome [10] is another very interesting alternative. It is completely open source and can also be self-hosted. 500MB is offered for free, and then you can upgrade to a rather expensive paid plan (€20 per month for 40GB!). It works much like Archive.org and lets you archive web pages very easily. Unfortunately the project seems somewhat abandoned: the repository has not been updated since 2021, and in the issues we read now and then that no one responds anymore and that the site sometimes goes down for days without it being possible to contact anyone.
  • Perma.cc is the first alternative on this list that is purely commercial. Perma.cc is fast and well made but requires a fee. It is not particularly cheap, although it is free for academic use and for courts (we don't know whether only American ones or courts all over the world). We have tried it, and the current cost is $10 per month for 10 links per month, $25 for 100 links and $100 per month for 500 links. Alternatively, you can buy packs of individual links: 10 extra links cost $15, 100 extra links $30 and 500 links $125. It is a commercial and not so cheap alternative that can be useful on some important occasions but cannot be used to store data on a daily or almost daily basis.
  • There is also Megalodon, but it is entirely in Japanese and definitely not aimed at a European audience. It still does its job: https://megalodon.jp/2024-0404-2218-00/https://www.lealternative.net:443/.
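To put the Perma.cc prices quoted above into perspective, the cost per archived link can be worked out with a few lines of Python. This is only a sketch based on the figures in this article (which may have changed since); the names `SUBSCRIPTIONS`, `LINK_PACKS`, `cost_per_link` and `price_table` are ours, for illustration.

```python
# Prices in USD as quoted in this article; they may be out of date.
SUBSCRIPTIONS = {10: 10, 100: 25, 500: 100}  # links per month -> monthly price
LINK_PACKS = {10: 15, 100: 30, 500: 125}     # extra links -> one-off price


def cost_per_link(links: int, price: float) -> float:
    """Return the price of a single archived link, rounded to cents."""
    return round(price / links, 2)


def price_table(plans: dict) -> dict:
    """Map each plan size to its cost per link."""
    return {links: cost_per_link(links, price) for links, price in plans.items()}


if __name__ == "__main__":
    for name, plans in (("subscription", SUBSCRIPTIONS), ("link pack", LINK_PACKS)):
        for links, per_link in price_table(plans).items():
            print(f"{name:12s} {links:4d} links: ${per_link:.2f} per link")
```

With these figures a link costs between $0.20 and $1.50 depending on the plan, which is why we consider it unsuitable for daily archiving.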

Decentralized alternatives to Archive.org and other methods for archiving a page

After reviewing the centralized alternatives we come to the decentralized ones. Decentralization is undoubtedly an interesting option when it comes to archiving files or websites, because an archive becomes harder to delete and the burden is spread over a global network rather than resting on individual cloud servers.

  • Archive the Web [11] is definitely an interesting alternative, and the one we would have liked to use were it not for its high cost and the difficulty of paying for it. It is based on the Arweave cryptocurrency, which lets you archive files in a decentralized way through the blockchain. Basically, you select the URL to archive and, depending on the size of the page, you pay in AR cryptocurrency, on average €0.19 per page. The price is not fixed and changes with the market value of the cryptocurrency. At the moment the price is certainly too high unless you only need it as a one-off, and even then there is a big obstacle: the only accepted payment method is the Arweave cryptocurrency, so if you are not familiar with wallets, cryptocurrencies and the like, it is best left alone. Unlike other projects that are needlessly based on a blockchain (who knows if ears are ringing over at HANDLE (Archive | Arweave)), in this case it might actually make some sense.
  • IPFS is another alternative that is certainly interesting but still too technical and not suitable for the general public. It is also integrated into the Brave browser and lets you decentralize web pages. In practice, pages are saved and distributed through the IPFS protocol in a way similar to what BitTorrent does. With the Brave browser you can become a node of this network, or you can use external gateways, which unfortunately are often part of the Cloudflare network. Also, from what we understand, files must be pinned to be kept online, and there are sites that can do it for us, such as Pinata, but at exorbitant cost. As mentioned, it is certainly a fascinating option, but it still comes with technical hurdles for beginners, and it is not always clear how to make a page "visible to everyone on IPFS". For example, apart from large projects such as English Wikipedia on IPFS via DNSLink (https://en.wikipedia-on-ipfs.org/wiki/), all the links to IPFS pages that we found in articles from a few years ago are gone. And that's not very promising.
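To make the idea of IPFS "external gateways" a little more concrete: content on IPFS is identified by a content ID (CID), and a path-style gateway serves it at `https://<gateway>/ipfs/<CID>`. The sketch below only builds such URLs. The gateway hosts `ipfs.io` and `dweb.link` are public gateways we picked as examples, and the CID in the usage part is a placeholder, not a real one.

```python
from urllib.parse import quote

# Example public gateways (our choice; any path-style gateway should work).
PUBLIC_GATEWAYS = ("ipfs.io", "dweb.link")


def ipfs_gateway_urls(cid: str, path: str = "", gateways=PUBLIC_GATEWAYS) -> list:
    """Build one path-style gateway URL per gateway for the given CID."""
    suffix = "/" + quote(path.lstrip("/")) if path else ""
    return [f"https://{host}/ipfs/{cid}{suffix}" for host in gateways]


if __name__ == "__main__":
    # "EXAMPLE_CID" is a hypothetical placeholder, not real content.
    for url in ipfs_gateway_urls("EXAMPLE_CID", "index.html"):
        print(url)
```

A URL built this way only resolves while at least one node still holds (pins) the content, which is exactly the weak point described above.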

Save pages offline (and upload them to external decentralized clouds)

The last solution, which is the one we are trying to follow ourselves, is to save the pages offline and then upload them to a decentralized cloud. Why decentralized? For the reason given above: we are looking for a way to keep a file alive even after server problems and even after the death of a service (or our own 🤘). So for this particular archiving need we are looking for a decentralized method that makes files easy to distribute and hard to destroy.

So let's see how you can save an internet page locally.

  • Save the web page with the open source extension SingleFileZ, which creates a sort of .html + .zip that self-extracts when opened. We tried it on several browsers and it seems to work well. In practice it is as if the entire site were in a single .html file. The only problem is that it is not always possible to get a preview of the site this way; sometimes you have to open the HTML file with a browser. It depends a bit on where you plan to host it. Here is an example we made: https://jswqquqd2yxc4fxtaqwqdx32fwqehsxbiqs5w3lb7lgap6xlikta.arweave.net/TK0IUgPWLi4W8wQtAd96LaBDyuFEJdttYfrMB_rrQqY. Not bad, right?
  • On Chromium-based browsers you can save pages as .mhtml. This format, however, is not supported by Firefox, nor have we found extensions that let Firefox read such files. For this reason too, we excluded it from our choices.
  • On Webrecorder you can find a series of useful tools for saving web pages and a community that can help you in case of problems.
  • Download the page with programs like Httrack [12] or Wget. In this case the page is saved in full with all its files and is therefore a little awkward to "carry around".
  • Save the page as a PDF or as a screenshot, for example with ShareX. This method is very practical for a simple static page with text, when there is no need to save dynamic content such as videos or animated images. There are many ways to save a page as a PDF; one of them is "Print - Save to PDF" in any browser. The advantage of this format over Httrack or Wget is that you get a single file and not hundreds. The drawback is that the text in the PDF may be hard to search and the result does not always look good. The drawback of a screenshot is that you can't do anything with it, and it can also be hard to read. Storing text in an image is normally not a good idea.
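As an illustration of the Wget route mentioned in the list above, here is a minimal Python sketch that assembles a Wget command for saving a single page together with the files it needs. It assumes `wget` is installed; the function name `build_wget_command` and the `archive` folder are our own choices, while the options are standard GNU Wget flags.

```python
import shlex


def build_wget_command(url: str, dest_dir: str = "archive") -> list:
    """Return a wget command that saves one page plus its images, CSS and scripts."""
    return [
        "wget",
        "--page-requisites",   # also fetch images, stylesheets and scripts
        "--convert-links",     # rewrite links so the copy works offline
        "--adjust-extension",  # add .html/.css extensions where missing
        "--no-parent",         # never climb above the starting directory
        f"--directory-prefix={dest_dir}",
        url,
    ]


if __name__ == "__main__":
    cmd = build_wget_command("https://www.lealternative.net/")
    # Print the command so it can be reviewed before running it.
    print(shlex.join(cmd))
    # To actually download: import subprocess; subprocess.run(cmd, check=True)
```

The result is a folder with many files, which is the "hard to carry around" trade-off described above; compressing the folder into a single archive before uploading is one way around it.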

And what do I do with the file after saving it?

Pretty much whatever you want. Once you have chosen how to save the page locally, be it a .zip, .png, .zip.html or .mhtml file or the entire content of the site, we thought it might be useful to put it in a decentralized archive.

Later we will write an article on decentralized alternatives to Google Drive such as Storj or Filecoin; for this article in particular we have chosen to talk about Arweave.

Arweave is in fact the one we are trying to use these days. Files are shared through the blockchain which, as mentioned, unlike in other projects could actually make sense here. It is tied to the Arweave cryptocurrency and, this being the most interesting part, lets you save a file permanently.

Obviously we are not naive: we know that "permanent" is a big word and that nothing in computing can be permanent without physically existing somewhere. For now Arweave seems well structured and the project well underway, with hundreds of nodes and currently more than 110TB of data already uploaded [13]. The files are tied to the blockchain and can always be recovered, at least as long as a few people keep working on this coin. According to their estimates, even if 90% of the nodes were to disappear, at least 15 copies of our data would still exist [14]. This makes us think it could be an interesting project for long-term data storage, which is what interests us in this context.

Obviously, since anything can die from one day to the next, we wouldn't use it as a primary resource, but right now we are toying with the idea of using it as an alternative and supporting resource.

The file on the blockchain

An example to make this clearer. We uploaded this HTML file to the blockchain: https://jswqquqd2yxc4fxtaqwqdx32fwqehsxbiqs5w3lb7lgap6xlikta.arweave.net/TK0IUgPWLi4W8wQtAd96LaBDyuFEJdttYfrMB_rrQqY. If the arweave.net domain were to die tomorrow, our file would still be safe, because the file is not physically on arweave.net but in the decentralized network; arweave.net only acts as a gateway, i.e. an "access route". So if arweave.net were to die, you just change the gateway (which anyone can set up) and the file remains reachable simply by changing the domain, for example: https://jswqquqd2yxc4fxtaqwqdx32fwqehsxbiqs5w3lb7lgap6xlikta.ar-io.dev/TK0IUgPWLi4W8wQtAd96LaBDyuFEJdttYfrMB_rrQqY or https://jswqquqd2yxc4fxtaqwqdx32fwqehsxbiqs5w3lb7lgap6xlikta.permagate.io/TK0IUgPWLi4W8wQtAd96LaBDyuFEJdttYfrMB_rrQqY and so on.
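The gateway swap described above can be sketched in Python. One observation of ours, not something stated by the providers in this article: in the example links the long subdomain appears to be the base32 encoding of the transaction ID (the last part of the URL), so the whole address can be rebuilt from the ID alone. The function names below are hypothetical helpers, and the gateway hosts are the ones mentioned in the text.

```python
import base64
from urllib.parse import urlsplit


def sandbox_label(tx_id: str) -> str:
    """Derive the subdomain label from a base64url transaction ID.

    Assumption: the label is the lowercase, unpadded base32 form of the
    same bytes, which matches the example links in this article.
    """
    padded = tx_id + "=" * (-len(tx_id) % 4)  # restore base64 padding
    raw = base64.urlsafe_b64decode(padded)
    return base64.b32encode(raw).decode("ascii").lower().rstrip("=")


def gateway_url(tx_id: str, gateway: str) -> str:
    """Build the full URL of a transaction on the given gateway host."""
    return f"https://{sandbox_label(tx_id)}.{gateway}/{tx_id}"


def swap_gateway(url: str, new_gateway: str) -> str:
    """Point an existing gateway link at a different gateway host."""
    tx_id = urlsplit(url).path.strip("/")
    return gateway_url(tx_id, new_gateway)


if __name__ == "__main__":
    original = (
        "https://jswqquqd2yxc4fxtaqwqdx32fwqehsxbiqs5w3lb7lgap6xlikta"
        ".arweave.net/TK0IUgPWLi4W8wQtAd96LaBDyuFEJdttYfrMB_rrQqY"
    )
    for host in ("ar-io.dev", "permagate.io"):
        print(swap_gateway(original, host))
```

In practice this means that the only thing worth writing down is the transaction ID: as long as some gateway exists, the link can be rebuilt from it.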

And how do I upload the files?

There are several providers that let you do this. Obviously they are paid services, because the whole idea is to pay someone to decentralize our data and keep it "forever". There is, for example, ArDrive, or the one we have chosen, Akord, which also gives you 100MB free to try it out.

Once the file has been uploaded to permanent storage, it can be shared through their site (which is user-friendly but less safe, because if they disappeared all the links would return 404 and be hard to recover) or, better, you can share the blockchain link complete with the transaction ID: https://yoejsggd72pocx24w5wxeq5jmrzczo5d6prqel7yfdeyr4fgwm2a.arweave.net/w4iZGMP-nuFfXLdtckOpZHIsu6Pz4wIv-CjJiPCmszQ. In this case, as we have seen, even if arweave.net were ever to disappear, it would be enough to point the links at any other gateway.

How do I delete files?

Here comes the fun part: once you upload a file it cannot be deleted. You cannot physically delete it, and even if you were to unsubscribe or delete your account the file would not disappear because, as mentioned, it is not on a site or in a cloud but everywhere, and Arweave was specifically designed so that nothing can be deleted.

However, a single node can still delete the file. Let's say that China forces all the miners of this cryptocurrency to remove that transaction. That would be possible, but the file would still be around as long as it is also replicated in other parts of the world. The only way to get the entire network to delete content is to have more than 50% of the nodes agree to delete it: this is designed to prevent illegal and ethically unacceptable content from being uploaded anywhere in the world [15].

Alternatives to Archive.org, decentralized and otherwise: conclusions

In this long article we have looked at some alternatives to the Internet Archive that do more or less the same job but on different servers. We then looked at decentralized alternatives, and finally tried saving the content of a page locally, or any content for that matter, and redistributing it around the world in an attempt to keep it alive forever.

Lately we have been trying to archive our external sources both on the Internet Archive and on the Arweave network. Let's see how it goes!

  1. Internet Archive | Archive | PDF
  2. Internet Archive Files Final Reply Brief in Lawsuit Defending Controlled Digital Lending | Archive | PDF
  3. If there is a book on Internet Archive your interested in, GO DOWNLOAD IT NOW. Also PLEASE stop using the IA as the sole host for preservation projects. | Archive | Arweave
  4. Let Readers Read | Archive
  5. Petabox | Archive | PDF
  6. Arweave + Internet Archive: Building a verifiable record of history | Archive | PDF
  7. What Information Should we be Preserving in Filecoin? | Archive | PDF
  8. Report on Mastodon | Archive | PDF
  9. source code ArchiveBox
  10. source code Conifer
  11. source code Archive the Web
  12. source code Httrack
  13. How much data can you store on a blockchain? | Archive | Arweave
  14. What makes Arweave immune to these problems? | Archive | Arweave
  15. What happens if illegal or malicious content is uploaded? | Archive | Arweave

By skariko

Author and administrator of the web project Le Alternative