Waybackurls is a command-line tool used for scraping URLs from the Wayback Machine.
Waybackurls is important for cybersecurity professionals because it allows them to uncover historical data about a website, identify potential vulnerabilities, and assess the security posture of a target.
https://github.com/tomnomnom/waybackurls
Basic Usage
- waybackurls <target>: This command retrieves all the URLs of the Wayback Machine archive for the specified domain or target.
- waybackurls <target> -json: This command retrieves all the URLs of the Wayback Machine archive for the specified domain or target in JSON format.
- waybackurls <target> | grep <keyword>: This command retrieves all the URLs of the Wayback Machine archive for the specified domain or target that contain the specified keyword.
- waybackurls <target> | sort -u: This command retrieves all the unique URLs of the Wayback Machine archive for the specified domain or target.
- waybackurls <target> | httprobe: This command retrieves all the URLs of the Wayback Machine archive for the specified domain or target and tests them for HTTP/HTTPS connectivity.
Advanced Usage
- waybackurls <target> | grep -Eo “(http|https)://[a-zA-Z0-9./?=_%:-]*”|sort -u: This command retrieves all the URLs of the Wayback Machine archive for the specified domain or target, and uses regex to extract only the URLs that begin with “http” or “https”.
- waybackurls <target> -exclude <exclude-file>: This command retrieves all the URLs of the Wayback Machine archive for the specified domain or target, but excludes the URLs listed in the specified file.
- waybackurls <target> -filter “status_code:200″|sort -u: This command retrieves all the URLs of the Wayback Machine archive for the specified domain or target that return a 200 status code.
- waybackurls <target> | unfurl paths | sort | uniq -c | sort -rn: This command retrieves all the URLs of the Wayback Machine archive for the specified domain or target, extracts only the paths, and sorts them by the number of occurrences to identify the most commonly accessed paths.
- waybackurls <target> | xargs -I{} curl -s -L -I -H “User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:58.0) Gecko/20100101 Firefox/58.0” {} | grep -iE “x-frame-options|content-security-policy”: This command retrieves all the URLs of the Wayback Machine archive for the specified domain or target, and tests them for X-Frame-Options and Content-Security-Policy headers.
Integrations
- Using waybackurls with Nmap: By combining the results of Nmap with waybackurls, a cybersecurity professional can gather information about any web-based services that may be exposed on the target network.
- Using waybackurls with Gobuster: Gobuster is a tool used for directory and file brute-forcing on web servers. By using the URLs gathered by waybackurls, a cybersecurity professional can perform more targeted directory and file brute-forcing.
- Using waybackurls with Sublist3r: By combining the results of Sublist3r with waybackurls, a cybersecurity professional can gather information about the web-based services running on subdomains of the target domain.
- Using waybackurls with Burp Suite: By feeding the URLs gathered by waybackurls into Burp Suiteās spidering feature, a cybersecurity professional can identify additional web application endpoints that may be vulnerable to attack.
Installation
1. Having GoLang already installed in your system run
- go install github.com/tomnomnom/waybackurls@latest
2. Display the help menu
- waybackurls -h
How to use
1. Query a domain
- echo “https://vk9-sec.com” | waybackurls
2. Run it against a file that contains a list of URLs
- cat file.txt | waybackurls