15 changes: 15 additions & 0 deletions .markdownlint.jsonc
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
{
    "default": true,

    // Hard tabs
    "MD010": { "spaces_per_tab": 4 },

    // Line length
    "MD013": false,

    // Multiple top-level headings in the same document
    "MD025": { "front_matter_title": "" },

    // Inline HTML
    "MD033": { "allowed_elements": ["a"] }
}
4 changes: 3 additions & 1 deletion docs/pre-consumption/README.md
@@ -31,6 +31,8 @@ paperless-ngx/

The pre-consumption "wrapper" just contains some helper variables and calls to your actual pre-consumption scripts.

<a name="pre-consumption-wrapper"></a>

```bash title="pre-consumption-wrapper.sh"
#!/usr/bin/env bash

@@ -51,4 +53,4 @@ python ${SCRIPT_DIR}/custom-script-02/actual-pre-consumption-task-02.py

## Examples

No examples yet :disappointed_relieved: [Want to add one?](../about/contributing.md)
Have a look at the examples via the navigation on the left side.
19 changes: 19 additions & 0 deletions docs/pre-consumption/decrypt-pdf.md
@@ -0,0 +1,19 @@
---
title: Decrypt PDFs
---

# Decrypt PDFs before consumption

Encrypted PDFs are one way to rain on your parade with paperless-ngx, even if you know the password: you get no thumbnail and no OCR, and you are nagged for the password every time you want to view the file. So let's get rid of the encryption!

## Setup

_decrypt-pdf_ consists of

1. the script itself
2. the password files (insecure.pwd.txt and personal.pwd.txt)
   * **personal.pwd.txt** is meant for your personal passwords; it is also listed in the local _.gitignore_ file
   * **insecure.pwd.txt** contains some of the most common passwords I could find
   * you can create further files in the same manner to try passwords from dumps, "most common passwords in XXX" lists etc.
Comment on lines +15 to +17
Owner

Currently, it renders like this, enumerating the details on the password files:

[screenshot]

But isn't it meant to render like this, with the details further indented? If so, please add a space before each unnumbered list item.

[screenshot]

Author

Now that's funny, because that's how it is written, and that's how it renders on GitHub and in Visual Studio Code. Where is your output from?

[screenshot, 2024-11-28: docs/pre-consumption/decrypt-pdf.md rendered on GitHub]
[screenshot, 2024-11-28: decrypt-pdf.md in Visual Studio Code]

Owner

I checked out your code and ran `mkdocs serve`.


Keep all of these in the same folder and either install the script as your only pre-consumption script or call it by other means, e.g. the [example pre-consumption consolidated wrapper script](./README.md#pre-consumption-wrapper).
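
As a rough sketch of the "other means" route, a wrapper could call the script as shown here. The subfolder name `decrypt-pdf` and both function names (`list_password_files`, `run_decrypt_pdf`) are assumptions made for illustration; adjust them to your own layout.

```bash
#!/usr/bin/env bash
# Sketch only: the folder name "decrypt-pdf" and both helper names are assumptions.

# Print every password list found in a folder, sorted, one path per line.
list_password_files() {
    local dir=$1
    find "${dir}" -type f -iname '*.pwd.txt' | sort
}

# Run decrypt-pdf.sh from the wrapper's directory, if it is installed there.
run_decrypt_pdf() {
    local script_dir=$1
    local script="${script_dir}/decrypt-pdf/decrypt-pdf.sh"
    if [ -x "${script}" ]; then
        "${script}"
    else
        printf '%s is missing or not executable\n' "${script}" >&2
        return 1
    fi
}
```

Inside the wrapper, `run_decrypt_pdf "${SCRIPT_DIR}"` would then take the place of a direct call, and `list_password_files "${SCRIPT_DIR}/decrypt-pdf"` is a quick way to check which password lists are in place.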
1 change: 1 addition & 0 deletions scripts/pre-consumption/decrypt-pdf/.gitignore
@@ -0,0 +1 @@
personal.pwd.txt
92 changes: 92 additions & 0 deletions scripts/pre-consumption/decrypt-pdf/decrypt-pdf.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,92 @@
#!/usr/bin/env bash

# paperless-ngx pre-consumption script
# https://docs.paperless-ngx.com/advanced_usage/#pre-consume-script
#
# uses qpdf to test consumed pdf for encryption and try a list of pre-supplied passwords
# if one matches, attempts to remove encryption from file
#
# the user can supply the lists of passwords via *.pwd.txt files
# the file 'personal.pwd.txt' is reserved for the user's true passwords and guarded
# via .gitignore against unintentional disclosure
#

# Environment Variable     Description
# DOCUMENT_SOURCE_PATH     Original path of the consumed document
# DOCUMENT_WORKING_PATH    Path to a copy of the original that consumption will work on
# TASK_ID                  UUID of the task used to process the new document (if any)

SCRIPT_PATH=$(readlink -f "$0")
readonly SCRIPT_PATH
SCRIPT_DIR=$(dirname "${SCRIPT_PATH}")
readonly SCRIPT_DIR
SCRIPT_NAME=$(basename "${SCRIPT_PATH}")
readonly SCRIPT_NAME
CONSUMABLE=$(basename "${DOCUMENT_WORKING_PATH}")
readonly CONSUMABLE
FILE_TYPE=$(file --mime-type --brief --no-pad "${DOCUMENT_WORKING_PATH}")
readonly FILE_TYPE


printf -- '--- %(%F %H:%M:%S)T | %s -------------------------------------\n' -1 "${SCRIPT_NAME}"

if [[ "${FILE_TYPE,,}" != 'application/pdf' ]]; then
    printf '%s is not recognized as PDF file, nothing to do\n' "${CONSUMABLE}"

elif qpdf --is-encrypted "${DOCUMENT_WORKING_PATH}"; then

    # collect all password lists next to the script; 'find -exec' copes with spaces in paths
    # and does not leave 'cat' waiting on stdin when no list exists
    PWD_CORPUS=$(find "${SCRIPT_DIR}" -type f -iname '*.pwd.txt' -exec cat {} + | sort -u)
    readonly PWD_CORPUS
    # 'grep -c' also counts a last line without trailing newline, which 'wc -l' would miss
    PWD_COUNT=$(printf '%s' "${PWD_CORPUS}" | grep -c '')
    readonly PWD_COUNT
    printf 'password corpus with %u entries assembled\n' "${PWD_COUNT}"

    printf '%s is encrypted, trying to decrypt... ' "${CONSUMABLE}"
    decrypted=0

    while IFS= read -r pwd_line || [[ -n "${pwd_line}" ]]; do

        qpdf --requires-password --password="${pwd_line}" "${DOCUMENT_WORKING_PATH}"

        # exit codes: see https://qpdf.readthedocs.io/en/stable/cli.html#option-requires-password
        case $? in
            3)
                qpdf --decrypt --password="${pwd_line}" "${DOCUMENT_WORKING_PATH}" --replace-input
                printf 'decrypted\n'
                decrypted=1

                printf 'password reminder:\n'
                # case patterns are globs, not numeric ranges: '1-2' only matches the literal string "1-2"
                case ${#pwd_line} in
                    [12])
                        printf 'less than 3 characters long\n'
                        ;;

                    [345])
                        printf 'starts with %s and ends with %s\n' "${pwd_line: 0: 1}" "${pwd_line: -1}"
                        ;;

                    *)
                        printf 'has %u characters, starts with %s and ends with %s\n' "${#pwd_line}" "${pwd_line: 0: 1}" "${pwd_line: -1}"
                        ;;
                esac

                break
                ;;

            *)
                continue
                ;;
        esac
    done < <(printf '%s' "${PWD_CORPUS}")

    if (( decrypted != 1 )); then
        printf 'failed\n'
        printf 'no password entry matches\n'
    fi

else
    printf '%s is unencrypted, nothing to do\n' "${CONSUMABLE}"
fi

printf -- '--- %(%F %H:%M:%S)T -------------------------------------\n' -1
240 changes: 240 additions & 0 deletions scripts/pre-consumption/decrypt-pdf/insecure.pwd.txt
@@ -0,0 +1,240 @@
!@#$%^&*
0000
00000
000000
0010054006
0132559004
030408
102030
110110jp
1111
11111
111111
1111111
11111111
112233
121212
123
123123
123321
1234
12345
123451
123456
1234561
1234567
12345678
123456789
1234567890
1234567891
12345678910
123qwe
131313
159753
18atcskd2w
1leila1
1q2w3e
1q2w3e4r
1q2w3e4r5t
1qaz2wsx
2000
2710877
3rjs1la7qe
3xp3rt444
4802
555555
652626
654321
666666
6969
696969
777777
7777777
888888
987654321
9916691966@vV
9q5r35j
aa123456
Aa123456
Aa123456.
aa12345678
aaaaaa
aaron431
abc123
access
admin
admin123
adobe123[a]
Albert2001
amanda
andrew
asdf
asdfgh
ashley
asshole
austin
azerty
azertyuiop
bailey
baseball
batman
biskupn
biteme
bonjour
bundesanstalt06
buster
BvtTest123
charlie
cheese
chelsea
chouchou
cliff
col123456
computer
D1lakiss
dallas
daniel
Dextr1016
donald
doudou
dragon
Dragon
dubsmash
efvj2ti
flower
football
Football
freedom
frühling
Frühling
fuck
fuckme
fuckyou
fuk19600
g_czechout
george
ginger
google
guest
habitat
hallo
Hallo1234
harley
hasenmaus
hello
Herbst
herbst
hockey
hottie
hunter
iloveyou
Iloveyou
jennifer
jessica
jesus
jetaime
jordan
joshua
ka_dJKHJsy6
keineahnung
killer
klaster
letmein
login
loulou
love
lovely
loveme
maggie
marseille
master
matrix
matthew
meidericher
michael
michelle
Million2
minecraft
Monkey
monkey
mustang
mynoob
nicolas
nicole
niklas23
ninja
omgpop
optimist.3103
P@ssw0rd
pass
pass1
Pass@123
passw0rd
Password
password
Password1
password1
password123
Password1234567890
Passwort
pepper
photoshop[a]
picture1
Pitbull123
princess
pussy
qazwsx
qqww1122
qwerty
qwerty123
qwertyuiop
Qwertyuiop
ranger
regen
research
robert
root
rWJVHGmG
scheisspasswort
schwabea
senha
shadow
soccer
soleil
solo
Sommer
sommer
sonnenschein
starwars
stratfor
summer
sunshine
superman
Swordfish
swordfish
taylor
test
test1
thedancerfam
thomas
thunder
tigger
trustno1
wealth
welcome
Welcome1
wetter
whatever
wind
Winter
winter
Xchange1
XXX123xxx
xyjewati
yankees
yxcvbnm
zaq1zaq1
zinch
zwieback
zxcvbn
zxcvbnm
Empty file.