What’s wrong with Jekyll’s related_posts?
Jekyll’s site.related_posts by default just presents the 10 most recent posts. If you set lsi to true in _config.yml, related_posts really works as its name describes. However, enabling lsi slows down the build considerably. It is especially bad for posts written in Chinese: the latent semantic indexing (LSI) step seems never to finish, and the Jekyll build can run for hours and hours…
So I ended up hand-rolling related posts that target posts with the same tag/category, using pure Liquid in Jekyll’s templates. Details can be found in my old post: Related posts in Jekyll.
What is LSI?
Why does enabling lsi slow the build down so much? We need a general feel for what LSI, latent semantic indexing, actually is.
LSI, sometimes referred to as latent semantic analysis, is a mathematical method developed in the late 1980s to improve the accuracy of information retrieval. It uses a technique called singular value decomposition to scan unstructured data within documents and identify relationships between the concepts contained therein.
In essence, it finds the latent relationships between words (semantics) in order to improve information understanding (indexing). It provided a significant step forward for the field of text comprehension as it accounted for the contextual nature of language.
So finding related posts requires extra computation across all posts: an index over the words of every post has to be built and decomposed before any two posts can be compared.
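A rough sketch of the math, in generic textbook LSI notation (not necessarily the exact weighting classifier-reborn uses): put the posts into a term-document matrix A with m terms as rows and n posts as columns, approximate it with a rank-k truncated SVD, and compare posts in the reduced k-dimensional space, where v_j is row j of V_k written as a column vector.

$$
A \approx A_k = U_k \Sigma_k V_k^{\top}, \qquad A \in \mathbb{R}^{m \times n},\; U_k \in \mathbb{R}^{m \times k},\; \Sigma_k \in \mathbb{R}^{k \times k},\; V_k \in \mathbb{R}^{n \times k}
$$

$$
\mathbf{d}_j = \Sigma_k \mathbf{v}_j, \qquad \operatorname{sim}(p, q) = \frac{\mathbf{d}_p^{\top} \mathbf{d}_q}{\lVert \mathbf{d}_p \rVert \, \lVert \mathbf{d}_q \rVert}
$$

The posts with the highest sim to the current post are the "related" ones. A full SVD of an m × n matrix takes on the order of O(mn · min(m, n)) operations, and m grows with the vocabulary of all your posts, so the cost climbs quickly as a blog grows. That dense linear algebra is presumably where compiled GSL routines beat pure Ruby by such a margin.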
Speed up LSI
It’s great to have lsi enabled for accurate related posts, and things get much easier with rb-gsl, which speeds up LSI immensely.
However, rb-gsl depends on gsl (the GNU Scientific Library), so you need to install gsl locally in your build environment.
On macOS, that’s easy with Homebrew:
brew install gsl
On Ubuntu/Debian:
sudo apt-get -y install libgsl-dev
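Before running either command, you may want to check whether GSL is already there. Here is a small shell sketch, assuming an installed GSL puts its gsl-config helper script on PATH (the Homebrew formula and the libgsl-dev package normally do) and that "Linux" means Ubuntu/Debian. gsl_install_hint is a hypothetical name, and it only prints the install command rather than running it.

```shell
# POSIX sh. Hypothetical helper: report an existing GSL install, or print
# (not run) the install command for this platform.
gsl_install_hint() {
  # Assumption: an installed GSL provides gsl-config on PATH.
  if command -v gsl-config >/dev/null 2>&1; then
    echo "GSL $(gsl-config --version) is already installed"
    return 0
  fi
  case "$(uname -s)" in
    Darwin) echo "brew install gsl" ;;
    Linux)  echo "sudo apt-get -y install libgsl-dev" ;;  # assumes Ubuntu/Debian
    *)      echo "unsupported platform: install GSL manually" >&2; return 1 ;;
  esac
}
```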
Then install these two gems directly, or add them to your Gemfile and install them with Bundler:
gem install classifier-reborn
gem install gsl
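If you go the Gemfile route, the two entries are plain gem lines, and Bundler’s bundle add command can write them for you. Just to make those lines explicit, here is a sketch of a hypothetical ensure_gem helper that appends a gem line only when the Gemfile doesn’t already declare that gem:

```shell
# POSIX sh. Hypothetical helper: append `gem "NAME"` to a Gemfile unless
# a line declaring that gem is already present.
ensure_gem() {
  gemfile="$1"
  name="$2"
  # Matches lines such as: gem "gsl"  or  gem 'gsl', "~> 2.1"
  if grep -Eq "^[[:space:]]*gem[[:space:]]+[\"']${name}[\"']" "$gemfile" 2>/dev/null; then
    echo "$name already in $gemfile"
  else
    printf 'gem "%s"\n' "$name" >> "$gemfile"
    echo "added $name to $gemfile"
  fi
}
```

For example, run ensure_gem Gemfile classifier-reborn and ensure_gem Gemfile gsl, then bundle install. The grep check is deliberately simple: it only recognizes lines that start with gem followed by the quoted gem name.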
Now you can safely enable lsi and build related posts super fast.
If you’re just testing your site and don’t care about related posts, you can set lsi to false in _config.yml and build related posts only for production with bundle exec jekyll build --lsi.
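One way to wire this up is to leave lsi set to false and let the environment decide whether --lsi is passed. A minimal sketch, assuming production builds run with JEKYLL_ENV=production (Jekyll’s default environment is development); jekyll_build_args is a hypothetical helper name:

```shell
# POSIX sh. Hypothetical helper: print the jekyll arguments for an
# environment, adding --lsi only for production so local builds stay fast.
jekyll_build_args() {
  if [ "${1:-development}" = "production" ]; then
    echo "build --lsi"
  else
    echo "build"
  fi
}
```

Then build with bundle exec jekyll $(jekyll_build_args "$JEKYLL_ENV"). The command substitution is left unquoted on purpose, so "build --lsi" splits into two arguments.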
Note that GitHub Pages doesn’t support lsi… but Netlify has already added gsl to its build image 👍.