0daysto.live

Semgrep Setup for Hackers

Howdy, in this guide I give the steps I use to set up and configure semgrep on Linux for when I need to run source code scans.

Semgrep is a SAST (static application security testing) tool - this means it checks source code statically (it doesn’t run the code, it just reads it).

This is very useful in any situation where you have access to source code.

I don’t see it as a replacement for reading the source code, but it can save you a significant amount of time. Even for experienced users it can flag types of issues that you might not yet be aware of - you can’t know everything!

The checks are based on rule files. In this guide we will download publicly available rules, but you can also write your own detections.
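To give a sense of what a rule file looks like, here is a minimal sketch that writes a custom rule from the shell. The folder name, rule ID and message are made up for illustration and are not part of any public ruleset. The pattern is meant to flag PHP exec() calls whose first argument is not a string literal.

```shell
#!/bin/bash
# Hypothetical folder for hand-written rules, kept apart from downloaded ones.
RULES_DIR="${RULES_DIR:-$HOME/semgrep-custom-rules}"
mkdir -p "$RULES_DIR"

# Write a minimal rule file (the id and message are invented for this example).
cat > "$RULES_DIR/php-exec-non-literal.yaml" <<'EOF'
rules:
  - id: php-exec-non-literal
    languages: [php]
    severity: WARNING
    message: exec() called with a command that is not a string literal
    patterns:
      - pattern: exec($CMD, ...)
      - pattern-not: exec("...", ...)
EOF

echo "wrote $RULES_DIR/php-exec-non-literal.yaml"
```

Once semgrep is installed you can point it at that folder with --config ~/semgrep-custom-rules/ to try the rule on a codebase.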

Installing Semgrep

First get semgrep itself installed.

Installing via package manager

On Arch I can install semgrep from the AUR with

yay -S semgrep-bin

If a semgrep package is available, prefer installing it with your package manager so you can get updates more conveniently.

Installing via Pipx

If a package is not available you can install it via pipx instead.

apt install pipx python-is-python3
pipx install semgrep
pipx ensurepath
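After pipx ensurepath you usually need to open a new shell before the PATH change takes effect. A quick sanity check, as a small sketch (the have helper is just a name I picked for this example):

```shell
#!/bin/bash
# Return success if the given command can be found on PATH.
have() {
    command -v "$1" >/dev/null 2>&1
}

if have semgrep; then
    semgrep --version
else
    echo "semgrep not found on PATH; open a new shell or re-run 'pipx ensurepath'" >&2
fi
```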

Installing semgrep-rules-manager

Semgrep Rules Manager is a third party script for managing your rules collection and keeping it up to date.

NOTE - The command below installs my fork for now. It will change to the original repo once my pull request is accepted, as it fixes some major bugs.

pipx install git+https://github.com/meme-lord/semgrep-rules-manager.git

Downloading Rules

First make a folder where you want to store your semgrep rules - I just use a folder in the home directory:

mkdir ~/semgrep-rules # ~ is expanded to /home/your_username

To list rules sources:

semgrep-rules-manager --dir ~/semgrep-rules list

To download all the rules:

semgrep-rules-manager --dir ~/semgrep-rules download

Different rule sources may have different focuses and you may not need some rulesets. After downloading the sources, just delete the ones you don’t want/need and they won’t be synced again:

rm -rf ~/semgrep-rules/gitlab # the Gitlab ones give errors for me idk, I think they follow a different syntax
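To help decide which sources to keep, here is a rough sketch that counts rule files per source. It assumes each source lives in its own top-level folder (as the gitlab one above does) and that rule files end in .yaml or .yml. It counts files, not individual rules, since one file can hold several rules.

```shell
#!/bin/bash
# Print "<file count> <source folder>" for each top-level folder in the rules directory.
count_rules_per_source() {
    local rules_dir="$1" src
    for src in "$rules_dir"/*/; do
        [ -d "$src" ] || continue   # glob matched nothing, or not a folder
        printf '%s %s\n' \
            "$(find "$src" -type f \( -name '*.yaml' -o -name '*.yml' \) | wc -l | tr -d ' ')" \
            "$(basename "$src")"
    done
}

count_rules_per_source "${1:-$HOME/semgrep-rules}"
```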

Tangent: Update Script

On my machine I keep a script called “update” that I run to do system updates.

In that script you can add semgrep-rules-manager to update your rules whenever you do updates.

The script could be stored in ~/.local/bin/update and look a bit like this:

#!/bin/bash
pacman -Syu && yay # for debian this would be apt update && apt upgrade
# pipx upgrade semgrep # you would have this if you installed semgrep via pipx
semgrep-rules-manager --dir ~/semgrep-rules sync

Don’t forget to chmod +x ~/.local/bin/update. Now you just need to run update occasionally to update your packages, semgrep, your rules etc. without having to remember each individual thing.

You should expand the update script to include updates for other tools you might use that aren’t available in the package manager - golang tools, python packages etc.
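One way to grow the script without it falling over on machines where a tool is missing is a small wrapper, sketched here. run_if_present is a helper name I made up; it skips tools that aren't installed and keeps going if one step fails.

```shell
#!/bin/bash
# Run a command only when its binary is installed; report failures but keep going.
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@" || echo "update step failed: $*" >&2
    else
        echo "skipping $1 (not installed)" >&2
    fi
}

run_if_present pipx upgrade-all                                  # every pipx-installed tool
run_if_present semgrep-rules-manager --dir ~/semgrep-rules sync  # rules collection
```

Add one run_if_present line per extra tool you want kept up to date.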

Running semgrep

I use a convenience script in ~/.local/bin/sg that looks like this:

#!/bin/bash
semgrep --metrics=off --config ~/semgrep-rules/ --no-rewrite-rule-ids \
--exclude-rule=python_crypto_rule-import-pycrypto \
--exclude-rule=raptor-bad-words \
--exclude-rule=package-dependencies-check \
--exclude-rule=detect-non-literal-regexp \
--exclude-rule=useless-assignment \
--exclude-rule=dockerfile-source-not-pinned \
--exclude-rule=prefer-copy-over-add \
--exclude-rule=use-distroless-base-image \
--exclude-rule=missing-user \
--exclude-rule=remove-package-lists \
--exclude-rule=remove-package-cache \
--exclude-rule=missing-no-install-recommends \
--exclude-rule=missing-pip-no-cache-dir \
--exclude-rule=missing-apk-no-cache \
--exclude-rule=no-new-privileges \
--exclude-rule=writable-filesystem-service \
--exclude-rule=prefer-apt-get \
--exclude-rule=set-pipefail \
--exclude-rule=stage-build \
--exclude-rule=multiple-entrypoint-instructions \
--exclude-rule=avoid-latest-version \
--exclude-rule=missing-image-version \
--exclude-rule=missing-assume-yes-switch \
--exclude-rule=missing-user-entrypoint \
--exclude-rule=prefer-json-notation \
--exclude-rule=robots-denied \
--exclude-rule=missing-integrity \
--exclude-rule=unspecified-open-encoding \
--exclude-rule=ruby-rails-performance-indexes-are-beneficial \
--exclude-rule=request-with-http \
--exclude-rule=request-session-with-http \
--exclude-rule=use-timeout \
--exclude-rule=python_requests_rule-request-without-timeout \
--exclude-rule=jsx-not-internationalized \
--exclude-rule=use-sys-exit \
--exclude-rule=port-all-interfaces \
--exclude-rule=curl-unencrypted-url \
--exclude-rule=use-tls \
--exclude-rule=lazy-load-module \
--exclude-rule=avoid-bind-to-all-interfaces \
--exclude-rule=go_network_rule-bind-to-all-interfaces \
--exclude-rule=err-nil-check \
--exclude-rule=avoid_app_run_with_bad_host \
--exclude-rule=use-raise-for-status \
--exclude-rule=avoid-apt-get-upgrade \
--exclude-rule=ocamllint-useless-else \
--exclude-rule=django-no-csrf-token \
--exclude-rule=os-error-is-permission \
--exclude-rule=deprecated-ioutil-readfile \
--exclude-rule=unencrypted-socket \
--exclude-rule=ssl-wrap-socket-is-deprecated \
--exclude-rule=disabled-cert-validation \
--exclude-rule=python_ssl_rule-req-no-certvalid \
--exclude-rule=no-replaceall \
--exclude-rule=go_http_rule-http-serve \
--exclude-rule=javascript-alert \
--exclude-rule=php_crypto_rule-weak-crypto \
--exclude-rule=ssh-disable-host-key-checking \
--exclude-rule=cookie-missing-secure \
--exclude-rule=cookie-missing-httponly \
--exclude-rule=c_obsolete_rule-usleep \
--exclude-rule=use-workdir \
--exclude-rule=express-check-csurf-middleware-usage \
--exclude-rule=var-in-href \
--exclude-rule=template-href-var \
--exclude-rule=missing-hsts-header \
--exclude-rule=sql-injection-php-potential-apps3c \
--exclude-rule=python_assert_rule-assert-used \
--exclude-rule=secure-set-cookie \
--exclude-rule=batch-import \
--exclude-rule=rules_lgpl_javascript_database_rule-sequelize-tls \
--exclude-rule=sequelize-enforce-tls \
--exclude-rule=react-insecure-request \
--exclude-rule=third-party-action-not-pinned-to-commit-sha \
--exclude=*.css --exclude=*.scss \
"$@" \
. 

I’ve excluded a bunch of rules that I don’t find that useful for a pentest/bug bounty use case. You can make your own changes; these are just mine. At the end I’ve also excluded css and scss files from scanning as they are not likely to have anything interesting - you can expand that list too to improve scan times.
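If the list gets long, an alternative is to keep the rule IDs in a text file and build the flags from it. This is only a sketch of that idea: the file path, the SG_EXCLUDES variable and the load_excludes function are names I invented for the example, not anything semgrep provides.

```shell
#!/bin/bash
# Turn a file of rule IDs (one per line, '#' comments allowed) into
# --exclude-rule=<id> arguments stored in the EXCLUDE_ARGS array.
load_excludes() {
    local file="$1" line
    EXCLUDE_ARGS=()
    [ -r "$file" ] || return 0                 # no file means no exclusions
    while IFS= read -r line || [ -n "$line" ]; do
        line="${line%%#*}"                     # drop anything after a '#'
        line="${line//[[:space:]]/}"           # drop all whitespace
        [ -n "$line" ] && EXCLUDE_ARGS+=("--exclude-rule=$line")
    done < "$file"
    return 0
}

load_excludes "${SG_EXCLUDES:-$HOME/.config/sg/excluded-rules.txt}"
echo "loaded ${#EXCLUDE_ARGS[@]} exclusions"
```

The semgrep call in sg would then take "${EXCLUDE_ARGS[@]}" in place of the long run of --exclude-rule lines.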

For a quick demo you can use DVWA (Damn Vulnerable Web App) as a scan target:

git clone https://github.com/digininja/DVWA.git
cd DVWA
sg

Interpreting the output

Findings from semgrep will not have full context of the application and still require analysis from you.

Take this finding for example:

   ❯❯❱ javascript-dynamic-execution-eval-Function
          Dynamic execution using eval and Function.

           83┆ var crypto = eval("require('crypto')");
            ⋮┆----------------------------------------
           84┆ var Buffer = eval("require('buffer').Buffer");

This rule has flagged the use of eval(), which is often dangerous, but in this case the argument is a static string, so you have no way to exploit it and can ignore it.

Another example:

    vulnerabilities/api/src/HealthController.php
   ❯❯❱ exec-use
          Executing non-constant commands. This can lead to command injection.

           88┆ exec ("ping -c 4 " . $target, $output, $ret_var);

In DVWA’s code this is RCE but in other cases it might not be exploitable - for example if $target was hardcoded or sanitized safely.

If you find a rule too noisy you can either improve the rule or exclude it in the sg script we made earlier.
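To spot which rules are the noisy ones, you can have semgrep write JSON and count findings per rule ID. A small sketch, assuming jq is installed (findings_per_rule is a name I made up):

```shell
#!/bin/bash
# Print "<count> <rule id>" for a semgrep JSON report, noisiest rule first.
# Assumes jq is available on PATH.
findings_per_rule() {
    jq -r '.results[].check_id' "$1" | sort | uniq -c | sort -rn
}

# Example usage (extra arguments are passed through by the sg script):
#   sg --json --output /tmp/findings.json
#   findings_per_rule /tmp/findings.json
```

Rules at the top of that list with nothing exploitable behind them are good candidates for the exclude list.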

Conclusion

Now that you have things configured, you can quickly run semgrep checks on codebases without too much frustration and hopefully land some findings!