Process Management and Automation, Lesson 5.5
Bash parallel processing with xargs and GNU parallel
xargs -P for parallelism, GNU parallel basics, parallel vs xargs, controlling job count, parallel with progress, collecting output, handling failures in parallel, rate limiting
Parallelism at Scale
Background jobs with & work well for a small, fixed set of tasks. For lists of items generated at run time, xargs -P and GNU parallel are a better fit because they cap how many jobs run at once.
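As a rough illustration of the point above, the sketch below builds a list of files at run time and hands it to xargs -P with a cap of four concurrent jobs. The temporary directory, the file names, and the use of gzip as the per-file task are all assumptions made for the demo; -P and -0 are GNU/BSD extensions rather than POSIX.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch directory for the demo (hypothetical inputs, removed on exit).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Five small text files, one with a space in its name.
for name in a b c d "e f"; do
  printf 'hello %s\n' "$name" > "$workdir/$name.txt"
done

# Compress up to 4 files at a time, one file per gzip invocation.
# -print0 / -0 pass names NUL-delimited so spaces and quotes survive.
find "$workdir" -name '*.txt' -print0 | xargs -0 -P 4 -n 1 gzip

gz_count=$(find "$workdir" -name '*.gz' | wc -l | tr -d ' ')
echo "compressed files: $gz_count"
```

The list can be any length; xargs starts a new job whenever one of the four slots frees up, which is the part a plain & loop does not give you.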
# xargs -P: run up to N jobs in parallel
# -print0 / -0 keep filenames with spaces or quotes intact
find /data -name "*.gz" -print0 | xargs -0 -P 8 -n 1 gunzip
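# Process 4 files at a time
# The filename is passed as $1 rather than substituted into the script text,
# and process_file must be exported (export -f) if it is a shell function
printf '%s\0' *.csv | xargs -0 -P 4 -n 1 bash -c '
echo "Processing: $1"
process_file "$1" > "processed/$1.out"
' _
GNU parallel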
# Install: apt install parallel OR brew install parallel
# Process all CSV files with 8 workers
# (if process_file is a bash function, run `export -f process_file` first)
parallel -j 8 process_file ::: *.csv
# With progress bar
parallel --progress -j 4 compress_image ::: images/*.png
# One job per argument listed after :::
parallel -j 4 deploy_to_region ::: us-east eu-west ap-south
# From a file of inputs
parallel -j 8 -a servers.txt ping -c 1 {}
Collecting Results
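Jobs started by xargs -P all write to the same stdout, so output from concurrent jobs can interleave. One common workaround, sketched here under assumed names, is to give each job its own output file and merge the files once every job has finished. The log files, the out_dir variable, and the .count suffix are hypothetical and exist only for this demo.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir "$workdir/logs" "$workdir/out"

# Hypothetical inputs: three logs with 1, 0, and 2 ERROR lines.
printf 'ok\nERROR disk full\n'       > "$workdir/logs/a.log"
printf 'ok\nok\n'                    > "$workdir/logs/b.log"
printf 'ERROR timeout\nERROR retry\n' > "$workdir/logs/c.log"

# Child shells read the output directory from the environment.
export out_dir="$workdir/out"

# Each job writes "<name><TAB><count>" to its own file, so no two jobs
# ever write to the same stream. The filename arrives as $1.
find "$workdir/logs" -name '*.log' -print0 |
  xargs -0 -P 4 -n 1 bash -c '
    name=$(basename "$1")
    printf "%s\t%s\n" "$name" "$(grep -c ERROR "$1")" > "$out_dir/$name.count"
  ' _

# Merge the per-job files in a stable order once all jobs are done.
sort "$workdir"/out/*.count > "$workdir/summary.tsv"
cat "$workdir/summary.tsv"

total=$(awk -F'\t' '{ s += $2 } END { print s }' "$workdir/summary.tsv")
echo "total ERROR lines: $total"
```

GNU parallel does this buffering for you by default, which is why the per-file step is only needed with xargs.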
# xargs writes to stdout; capture it with tee or a redirect
find . -name '*.log' -print0 | xargs -0 -P 4 -n 1 grep -l 'ERROR' \
| sort > error_files.txt
# GNU parallel can tag output by job
parallel --tag -j 4 wc -l ::: *.txt
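# Each output line is prefixed with the argument and a tab: "file.txt\t42 file.txt"
GNU parallel handles edge cases better than manual & loops: it manages job slots automatically, can retry failed jobs (--retries N during a run, or --retry-failed with --joblog to rerun failures afterward), and groups each job's output so lines from different jobs do not interleave. For more than 10 parallel tasks, prefer it over handwritten job control.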
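To make the failure handling concrete, here is a minimal sketch that runs four toy jobs, records them in a job log, and reads back which ones failed. The jobs are just `exit N` commands invented for the demo, and the script skips itself when GNU parallel is not installed (some systems ship a different `parallel` from moreutils).

```shell
#!/usr/bin/env bash
set -uo pipefail   # no -e: a nonzero exit from parallel is expected here

if ! parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
  echo "GNU parallel is not installed; skipping the demo" >&2
  failed=skipped
else
  joblog=$(mktemp)

  # Four toy jobs that run `exit N`: two succeed (0), two fail (1 and 2).
  parallel -j 2 --joblog "$joblog" 'exit {}' ::: 0 1 0 2
  failed=$?   # exit status is the number of failed jobs (capped at 101)

  echo "failed jobs: $failed"

  # The job log is tab-separated; column 7 is Exitval, column 9 the command.
  awk -F'\t' 'NR > 1 && $7 != 0 { print "exit " $7 ": " $9 }' "$joblog"

  # To rerun only the failures later:
  #   parallel --retry-failed --joblog "$joblog"
  rm -f "$joblog"
fi
```

Checking the exit status is enough for a pass/fail decision in a script; the job log is the piece to keep when you need to know which inputs to rerun.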
