It seems like almost every other click on the internet ends up in an "Error 404: Page Not Found" page. "Whoops, the page you're looking for does not exist," "Sorry, the requested URL was not found on this server," "Oops, something went wrong. Page not found." Every internet user has seen pages like these.
I think web developers should pay less attention to building clever 404 pages and more attention to eliminating broken links altogether.
I've built an automated program to find broken links.
Written in Python 3, it recursively follows the links on any given site and checks each one for 404 errors. When the program has finished searching the entire site, it prints every broken link it found along with the page each one appears on, so that developers can fix them.
Note that the program makes a lot of HTTP requests in a relatively short period of time, so be mindful of data caps, metered connections, and the like.
- Check if you have Python 3 installed:
If the following command does not yield a version number, download Python 3 from python.org.
$ python3 -V
- Install the Requests and BeautifulSoup packages (for HTTP requests and HTML parsing) from PyPI.
(Note: I do not maintain these packages and am not associated with them, so download at your own risk.)
$ pip3 install requests
$ pip3 install beautifulsoup4
- Copy and paste the following code into a file (I use the name find_broken_links.py in this article).
- Run it from the command line with a website of your choice.
$ python3 find_broken_links.py https://your_site.com/
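For readers who want to see the moving parts, here is a minimal sketch of how a checker like this can be structured. Treat it as an illustration under a few assumptions rather than a definitive implementation: the function names (`fetch_page`, `extract_links`, `find_broken_links`) are made up for this example, only links in `<a href>` tags are checked, only a 404 status counts as broken, and an explicit queue stands in for recursion so that large sites do not run into Python's recursion limit.

```python
import sys
from collections import deque
from urllib.parse import urldefrag, urljoin, urlparse

import requests
from bs4 import BeautifulSoup


def fetch_page(url):
    """Return (status_code, html); status_code is None if the request failed."""
    try:
        response = requests.get(url, timeout=10)
    except requests.RequestException:
        return None, ""
    is_html = "text/html" in response.headers.get("Content-Type", "")
    return response.status_code, response.text if is_html else ""


def extract_links(page_url, html):
    """Return absolute http(s) URLs for every <a href> found in html."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for anchor in soup.find_all("a", href=True):
        # Resolve relative links and drop "#fragment" parts
        absolute, _fragment = urldefrag(urljoin(page_url, anchor["href"]))
        if urlparse(absolute).scheme in ("http", "https"):
            links.append(absolute)
    return links


def find_broken_links(start_url, fetch=fetch_page):
    """Crawl the site at start_url; return {broken_url: {pages linking to it}}."""
    site = urlparse(start_url).netloc
    is_404 = {}   # url -> True if it returned 404 (each URL is requested once)
    broken = {}   # broken url -> set of pages that link to it
    queue = deque([(start_url, None)])  # (url, page the link was found on)
    while queue:
        url, source = queue.popleft()
        if url not in is_404:
            status, html = fetch(url)
            # Assumption: only 404 counts as broken; failed requests are ignored
            is_404[url] = status == 404
            # Follow links only from working pages on the starting site
            if status == 200 and urlparse(url).netloc == site:
                for link in extract_links(url, html):
                    queue.append((link, url))
        if is_404[url] and source is not None:
            broken.setdefault(url, set()).add(source)
    return broken


def main(start_url):
    broken = find_broken_links(start_url)
    if not broken:
        print("No broken links found.")
    for url, sources in sorted(broken.items()):
        print(f"404: {url}")
        for source in sorted(sources):
            print(f"    linked from {source}")


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        print("usage: python3 find_broken_links.py https://your_site.com/")
```

Because `find_broken_links` accepts the fetch function as a parameter, the crawl logic can be tried out on canned pages without touching the network. External links are checked once but never crawled further, which keeps the program from wandering off the site it was pointed at.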
I hope you find this useful; it certainly helped me find a few broken links on my own site.
This program is CC0 licensed, so it is completely free to use, but it comes with no warranties or guarantees.
If you liked this post, share it with your friends and colleagues!
Thanks for scrolling.
— Gabriel Romualdo, November 10, 2019