internet-archiving

Here are 27 public repositories matching this topic...

ArchiveBox / ArchiveBox

🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...

Updated Dec 4, 2024
Python

akamhy / waybackpy

Wayback Machine API interface & a command-line tool

osint internet-archive web-archiving wayback-machine webarchiving cdx-api internet-archiving savepagenow archive-webpage archive-webpages wayback-machine-api wayback-machine-python

Updated Feb 26, 2024
Python

pirate / wikipedia-mirror

🌐 Guide and tools to run a full offline mirror of Wikipedia.org with three different approaches: Nginx caching proxy, Kiwix + ZIM dump, and MediaWiki/XOWA + XML dump

html docker nginx wiki docker-compose mediawiki wikipedia archiving datascience kiwix zim wikipedia-dump wikipedia-mirror openzim xowa internet-archiving mwdumper kiwix-offline-wikipedia

Updated Apr 7, 2021
Shell

ArchiveBox / good-karma-kit

😇 A Docker Compose bundle to run on servers with spare CPU, RAM, disk, and bandwidth to help the world. Includes Tor, ArchiveWarrior, BOINC, and more...

docker docker-compose ipfs distributed-computing tor distributed-storage sia boinc kiwix i2p foldingathome storj pywb internet-archiving archivebox good-karma archivewarrior zimfarm

Updated May 11, 2024

ArchiveBox / archivebox-browser-extension

Official ArchiveBox browser extension: automatically/manually preserve your browsing history using ArchiveBox.

chrome-extension archiving svelte firefox-extension browser-extension web-archiving digital-preservation digipres internet-archiving archivebox

Updated Nov 27, 2024
TypeScript

ArchiveBox / electron-archivebox

Desktop Electron app for ArchiveBox internet archiver. (ALPHA: not ready for general use)

electron windows macos linux docker gui desktop web-archiving digipres internet-archiving archivebox desktop-electron

Updated Feb 28, 2023
JavaScript

vegetableman / vandal

Navigator for Web Archive

chrome-extension firefox-addon wayback-machine webarchive internet-archiving

Updated Nov 23, 2023
JavaScript

mikwielgus / forum-dl

Scrape posts and threads from forums, news aggregators, and mail archives, and export them to JSONL, mailbox, or WARC

python scraper forum discourse phpbb warc data-fetching simplemachines internet-archiving

Updated Jun 27, 2024
Python

pirate / internet-archiving-talk

🎭 An introduction to the Internet Archiving ecosystem, tooling, and some of the ethical dilemmas that the community faces.

slideshow wget talks warc censorship web-archiving ethics internet-archiving archivebox

Updated Aug 15, 2024
JavaScript

ArchiveBox / docker-archivebox

Home of the official Docker image for ArchiveBox

docker kubernetes image docker-compose docker-image container oci digipres podman internet-archiving archivebox

Updated Oct 16, 2024

Own-Data-Privateer / hoardy-web

Passively capture, archive, and hoard your web browsing history, including the contents of the pages you visit, for later offline viewing, mirroring, and/or indexing. Your own personal private Wayback Machine that can also archive HTTP POST requests and responses, as well as most other HTTP-level data.

cli backups internet archiving snapshot self-hosted archive browser-extension archiver web-archiving wayback-machine web-browsing web-archive website-archive auto-save offline-reading internet-archiving

Updated Nov 30, 2024
Python

ArchiveBox / readability-extractor

JavaScript/Node.js wrapper around Mozilla's Readability library so that ArchiveBox can call it as a one-shot CLI command to extract each page's article text.

wrapper node readability internet-archiving archivebox

Updated Sep 16, 2024
JavaScript

ArchiveBox / abx-dl

⬇️ A simple all-in-one CLI tool to download EVERYTHING from a URL (like youtube-dl/yt-dlp, forum-dl, gallery-dl, simpler ArchiveBox). 🎭 Uses headless Chrome to get HTML, JS, CSS, images/video/audio/subtitles, PDFs, screenshots, article text, git repos, and more...

cli chrome downloader curl headless scraping crawling http-client youtube-dl wget cli-tool puppeteer internet-archiving playwright archivebox yt-dlp gallery-dl ai-scraping

Updated Nov 25, 2024
Python

ArchiveBox / homebrew-archivebox

Homebrew formula for the ArchiveBox self-hosted internet archiving solution.

macos homebrew package linuxbrew web-archiving digipres brew-tap internet-archiving archivebox

Updated Oct 5, 2024
Ruby

ArchiveBox / archivebox-proxy

Official ArchiveBox MITM proxy: saves URLs of all requests passing through to an ArchiveBox server for archival.

proxy https-proxy web-archiving web-proxy digital-preservation mitmproxy digipres internet-archiving archivebox

Updated Jul 12, 2024
Python

ArchiveBox / debian-archivebox

Home of the official apt/deb package for Ubuntu/Debian-based systems.

package debian apt ubuntu web-archiving aptitude digipres internet-archiving archivebox stdeb

Updated Oct 5, 2024
Python

ArchiveBox / DigestBox

DigestBox takes any webpage URL (news article, video link, comment thread, etc.) and gives you just the raw content. It's powered by ArchiveBox.io under the hood.

backups warc web-archiving digipres headless-browser internet-archiving archivebox