The Rundown

The original site sacred-texts.com is built entirely using plain HTML – no stylesheets. The books are broken up into pages that require a link click to get to the next page of the book. Some of the pages are extremely short (just a sentence or two), requiring hundreds of clicks to read a small portion of a given book. Here is an example of the reading experience:

Book of Enoch Screenshot

An example of a book with very short chapters and the user experience it provides

I purchased the USB archive of the entire collection hoping this would give me plain text documents of all the text, but instead I received all the individual html files – essential a carbon copy of the site. With such a poor and annoying reading experience, I decided to build my own reading solution for these texts to enjoy for years to come.

The Plan

  1. Design the Backend – build a backend that accounts for the wide variety of content and formatting in the book archive.
  2. Scrape the data – Create a scraping script using BeautifulSoup to collect and store the texts in the database. This script will need to account for the variability and lack of consistency in the markup of the HTML files.
  3. Build the App – Once the data is stored in the database, it can be easily accessed and displayed by the frontend application, providing a user-friendly reading experience for the catalog of texts.

The Solution

Step 1: Design the Backend

Build an app that makes it easy to read any text from the library. This is how the backend will be setup. The database for the reader app is simple but accounts for the variance in all the books.

Archive Reader Backend Database ERD

An ERD diagram for the backend database architecture of the Archive Reader app

Step 2: Scrape the Data

With all the books/texts are individually coded HTML pages without classnames nor a unified structure for elements like headers, table of contents and navigation. This made the content uniquely challenging to scrape and organize into a database. With more than 1700 titles and 100k chapters (pages), this stage required several iterations to the scraping logic given the variability and lack of consistency in the markup.

Table of Contents Example Screenshot

An example of the HTML structure from a given text

Step 3: Building the App

Once the data is stored in the database, it can be easily accessed and displayed by the frontend application, providing a user-friendly reading experience for the sacred texts.

There app consists of 3 primary pages here:

  1. Index page – the homepage essentially where we display a portion of the catalog and allow a user to search and/or browse through the archive of texts.
  2. Category page – Another more refined list of texts within a certain category, topic, or alphabetical letter.
  3. Reader page – the final page is the actual reader view – the main reason for the project. Here we create a clean, mobile-friendly, easy-to-use presentation of the text with dark-mode as the default for reading.

Archive Reader App Screenshots

Overview

Developing a mobile-friendly reading solution for an archive of sacred texts, addressing HTML formatting issues, and establishing a structured database for an enhanced user experience.

Contents
Year

2023

Similar Projects
  • Xerofit Brand Development: A Case Study
    Building a Unique Brand Identity from the Ground Up
  • Greater Than Consulting Logo Design
    Using Negative Space to Craft the Minimalist GTC Logo