Add ArchiveBox community app #4569

Draft
pirate wants to merge 20 commits into truenas:master from pirate:add-archivebox

pirate commented Mar 11, 2026

Closes #231

Description

ArchiveBox is a powerful, self-hosted internet archiving solution to collect, save, and view websites offline. It saves pages as HTML, PDF, screenshots, WARC, and more from URLs, bookmarks, browser history, RSS feeds, or any other source.

App Information

  • Train: community
  • App Version: 0.7.3
  • TrueNAS App Version: 1.0.0
  • Containers:
    • archivebox - Main web application (archivebox/archivebox:0.7.3)
    • sonic - Optional Sonic full-text search backend (archivebox/sonic:latest)

Features

  • Web UI for browsing and managing archived content
  • Admin user creation on first run (configurable username/password)
  • Optional Sonic search backend for fast full-text search
  • Configurable public/private access controls
  • Additional environment variables passthrough for advanced configuration
  • TrueNAS-managed dataset storage (ixVolume) or host path support

Testing

  • basic-values.yaml test file created with all configuration options
  • App structure follows community app conventions (based on linkwarden reference)
  • Health check configured (curl on web port)
  • Storage configured for data persistence
  • Portal configured for TrueNAS UI access

Icons and Screenshots

Icon: Please use the ArchiveBox logo from https://archivebox.io - a square orange icon with a box/archive symbol. I can provide specific assets if needed.

Special Notes

  • ArchiveBox runs as root inside the container (required for Chromium-based archiving)
  • The ADMIN_USERNAME and ADMIN_PASSWORD environment variables only take effect on first run to create the initial admin user
  • When Sonic search is enabled, an internal network connects the archivebox and sonic containers
  • Users can pass any ArchiveBox configuration option via the "Additional Environment Variables" section (see https://github.com/ArchiveBox/ArchiveBox/wiki/Configuration)

Checklist

  • App works with test files
  • app.yaml metadata is complete and accurate
  • questions.yaml has clear labels and descriptions
  • All test files are present
  • README.md is written
  • Only files under /ix-dev/ are modified
  • No auto-generated files are included

pirate and others added 4 commits March 11, 2026 02:36

ArchiveBox is a self-hosted internet archiving solution that saves
websites as HTML, PDF, screenshots, WARC, and more. This adds it
to the TrueNAS community app catalog with optional Sonic full-text
search backend support.

Co-Authored-By: Claude Opus 4.6 <[email protected]>

Expose CSRF_TRUSTED_ORIGINS explicitly (defaults to localhost:{port})
so login/API works when accessed via TrueNAS hostname. Also add
TIMEOUT, CHECK_SSL_VALIDITY, and SAVE_ARCHIVE_DOT_ORG as first-class
config options in the UI.

Co-Authored-By: Claude Opus 4.6 <[email protected]>

Expose these as explicit config options in the TrueNAS UI so users
can customize browser identity, pass authentication cookies, and
use persistent Chrome profiles for archiving authenticated content.

Co-Authored-By: Claude Opus 4.6 <[email protected]>

@stavros-k
Contributor

Why is sonic added but not scheduler? What's the reasoning?
Why were all those optional envs added? And one of them is also hardcoded to *

@pirate
Author

pirate commented Mar 11, 2026

Good call, scheduler should definitely be added, thanks for catching that. In the next release/dev it's already included in the main container, so I forgot that v0.7.x still needs it separate.

As for the optional envs, they are the most common options that most ArchiveBox users will need to tweak. I just pushed a commit to cut it down to a smaller set though; I don't want to overwhelm users, and they can always use env vars to set more.

I've also updated ALLOWED_HOSTS to share the value set in CSRF_TRUSTED_ORIGINS for now, which removes one more var. They should almost always be the same in practice, so I now populate both from a single public_url question variable.
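
For illustration only, here is a minimal Python sketch of how both values could be derived from a single URL. The helper name `derive_host_settings` is hypothetical and is not part of the PR or of ArchiveBox. The output formats are assumptions as well: a bare hostname for ALLOWED_HOSTS and `scheme://host[:port]` for CSRF_TRUSTED_ORIGINS. The format that ArchiveBox v0.7.3 actually expects should be checked against its configuration docs.

```python
from urllib.parse import urlsplit


def derive_host_settings(public_url: str) -> dict[str, str]:
    """Derive both settings from one URL (hypothetical helper, not from the PR)."""
    parts = urlsplit(public_url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"public_url must be an absolute http(s) URL: {public_url!r}")

    # Keep an explicit port if the URL has one, e.g. ":30080".
    # IPv6 literals are not handled in this sketch.
    port_suffix = f":{parts.port}" if parts.port else ""
    origin = f"{parts.scheme}://{parts.hostname}{port_suffix}"

    return {
        # Assumption: ALLOWED_HOSTS takes the bare hostname (no scheme, no port).
        "ALLOWED_HOSTS": parts.hostname,
        # Assumption: CSRF_TRUSTED_ORIGINS takes scheme://host[:port].
        "CSRF_TRUSTED_ORIGINS": origin,
    }
```

Calling `derive_host_settings("http://truenas.local:30080/")` (an example hostname and port) yields one value per setting, so a single question in the UI would be enough to fill both.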

Should I commit and push the autogenerated lib files in ix-dev/community/archivebox/templates/library/? Seems like a lot of diff noise to add to the PR so I just want to double-check.

@stavros-k
Contributor

stavros-k commented Mar 12, 2026

Scheduler seems to fail to start currently.

❯ docker logs 95fd1ac99cf3c34b863024faa5eb8113-scheduler-1 -f
[i] [2026-03-12 17:39:16] ArchiveBox v0.7.3: archivebox schedule --foreground --update --every=day
    > /data

Traceback (most recent call last):
  File "/usr/local/bin/archivebox", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/app/archivebox/cli/__init__.py", line 140, in main
    run_subcommand(
  File "/app/archivebox/cli/__init__.py", line 80, in run_subcommand
    module.main(args=subcommand_args, stdin=stdin, pwd=pwd)    # type: ignore
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/archivebox/cli/archivebox_schedule.py", line 98, in main
    schedule(
  File "/app/archivebox/util.py", line 116, in typechecked_function
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/app/archivebox/main.py", line 1183, in schedule
    cron = CronTab(user=True)
           ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/crontab.py", line 246, in __init__
    self.read(tabfile)
  File "/usr/local/lib/python3.11/site-packages/crontab.py", line 314, in read
    raise IOError(f"Read crontab {self.user}: {process.stderr}")
OSError: Read crontab archivebox: crontabs/archivebox/: fopen: Permission denied

@pirate Any ideas?

Comment on lines +16 to +24
  - variable: TZ
    group: ArchiveBox Configuration
    label: Timezone
    schema:
      type: string
      default: Etc/UTC
      required: true
      $ref:
        - definitions/timezone

Author

pirate commented Mar 13, 2026

Should probably remove this, as ArchiveBox does not support any timezone other than UTC. It uses the browser's timezone to translate UTC to local times in the frontend, but the backend is always UTC.

@pirate
Author

pirate commented Mar 13, 2026

thanks for helping with this!

/data has to start with at least rw 775 permissions, or ideally be owned by uid and gid 911, and the Docker volume has to be mounted with :z so that both the archivebox container and the scheduler container can share it. I'm not sure how TrueNAS sets the default ixVolume ownership. The scheduler container might be trying to read the crontab for user 568, which doesn't exist in the container. If it has to be 568, then we have to set the PUID and PGID env vars so the container knows to use that value instead of the default 911.

  - variable: run_as
    label: ""
    group: User and Group Configuration
    schema:
      type: dict
      attrs:
        - variable: user
          label: User ID
          description: The user id that ArchiveBox files will be owned by.
          schema:
            type: int
            min: 568
-           default: 568
+           default: 911
            required: true
        - variable: group
          label: Group ID
          description: The group id that ArchiveBox files will be owned by.
          schema:
            type: int
            min: 568
-           default: 568
+           default: 911
            required: true

Let me know if you want me to push this change ^ and also update the test values / ix values to 911.
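
As a rough way to check the /data requirement described above from the host, here is a small Python sketch. The helper `data_dir_problems` is hypothetical and is not part of ArchiveBox or TrueNAS. It only mirrors the condition as stated in this comment (owned by the expected uid and gid, or mode at least 775). It does not confirm that this condition is sufficient for the scheduler container.

```python
import os
import stat


def data_dir_problems(path: str, uid: int = 911, gid: int = 911) -> list[str]:
    """Return reasons why `path` may be unusable for uid/gid (hypothetical helper)."""
    st = os.stat(path)
    problems = []

    if not stat.S_ISDIR(st.st_mode):
        problems.append(f"{path} is not a directory")

    mode = stat.S_IMODE(st.st_mode)
    owned = st.st_uid == uid and st.st_gid == gid
    # 0o775 = rwx for owner and group, r-x for everyone else.
    has_775 = (mode & 0o775) == 0o775

    # Condition as stated in the comment: owned by uid:gid, or mode at least 775.
    if not owned and not has_775:
        problems.append(
            f"{path} is owned by {st.st_uid}:{st.st_gid} with mode {mode:o}; "
            f"expected owner {uid}:{gid} or mode at least 775"
        )
    return problems
```

For example, `data_dir_problems("/path/to/ixvolume/data")` (a placeholder path) returns an empty list when the directory meets the condition, and a list of human-readable findings otherwise. Passing `uid=568, gid=568` checks the TrueNAS default instead.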

Also, we definitely need to add public_url or ALLOWED_HOSTS+CSRF_TRUSTED_ORIGINS back to the default questions, because the app is not usable at all without those values set correctly: all logins will result in a CSRF error, and * is not allowed for those.

@stavros-k
Contributor

stavros-k commented Mar 16, 2026

Volumes are shared in multiple apps and I have not needed to add :z so far, at least in TrueNAS.

I've tried with 911 and it's still the same on the scheduler. The main container doesn't have this issue. But I thought that PUID/PGID can be set to any uid (that's usually the concept of running the container as root and using PUID/PGID).

@pirate Any ideas?

stavros-k marked this pull request as draft March 16, 2026 17:42

@stavros-k
Contributor

@pirate ping.

@pirate
Author

pirate commented Apr 7, 2026

Hey, sorry, I'm focused on getting the next ArchiveBox release (v0.9.0) out, which simplifies a lot of this stuff. Is it ok to leave this as a draft for now? I'll update it with the new config once 0.9.0 is ready.

@stavros-k
Contributor

Sure yea

Development

Successfully merging this pull request may close these issues.

Community Chart Suggestion: ArchiveBox

2 participants