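Yesterday I finished setting up my documentation site. It was using the built-in search plugin, but that plugin's resource usage was fairly high, so before going to bed I applied for Algolia Crawler. I didn't think I fully met the application requirements, yet when I got up this morning the application had been approved.
With that out of the way, I went straight to configuring the crawler: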
new Crawler({
  appId: "YOUR_APP_ID",
  apiKey: "YOUR_API_KEY",
  rateLimit: 8,
  startUrls: [
    // The initial URLs where Algolia starts crawling your site
    // If your site is split into several independent parts, you may need several entry URLs here
    "https://YOUR_WEBSITE_URL/",
  ],
  sitemaps: [
    // If you use a sitemap plugin (e.g. vuepress-plugin-sitemap2), you can provide the sitemap URL
    "https://YOUR_WEBSITE_URL/sitemap.xml",
  ],
  ignoreCanonicalTo: false,
  exclusionPatterns: [
    // Use this to stop Algolia from crawling certain URLs
  ],
  discoveryPatterns: [
    // The range of URLs that Algolia is allowed to crawl
    "https://YOUR_WEBSITE_URL/**",
  ],
  // When the crawler runs; set this according to how often the docs are updated
  schedule: "at 02:00 every 1 day",
  actions: [
    // You can have several actions, especially if you deploy multiple docs under one domain
    {
      // Give the index an appropriate name
      indexName: "YOUR_INDEX_NAME",
      // The paths this index applies to
      pathsToMatch: ["https://YOUR_WEBSITE_URL/**"],
      // Controls how Algolia extracts records from your site
      recordExtractor: ({ $, helpers }) => {
        // The options below are the defaults for vuepress-theme-hope
        // vuepress-theme-hope's default content container class is theme-hope-content
        return helpers.docsearch({
          recordProps: {
            lvl0: {
              selectors: ".sidebar-heading.active",
              defaultValue: "Documentation",
            },
            lvl1: ".theme-hope-content h1",
            lvl2: ".theme-hope-content h2",
            lvl3: ".theme-hope-content h3",
            lvl4: ".theme-hope-content h4",
            lvl5: ".theme-hope-content h5",
            lvl6: ".theme-hope-content h6",
            content: ".theme-hope-content p, .theme-hope-content li",
          },
          indexHeadings: true,
        });
      },
    },
  ],
  initialIndexSettings: {
    // Controls how the index is initialized; this only takes effect if the index does not exist yet
    // After changing these settings you may need to delete the index manually and regenerate it
    YOUR_INDEX_NAME: {
      attributesForFaceting: ["type", "lang"],
      attributesToRetrieve: ["hierarchy", "content", "anchor", "url"],
      attributesToHighlight: ["hierarchy", "hierarchy_camel", "content"],
      attributesToSnippet: ["content:10"],
      camelCaseAttributes: ["hierarchy", "hierarchy_radio", "content"],
      searchableAttributes: [
        "unordered(hierarchy_radio_camel.lvl0)",
        "unordered(hierarchy_radio.lvl0)",
        "unordered(hierarchy_radio_camel.lvl1)",
        "unordered(hierarchy_radio.lvl1)",
        "unordered(hierarchy_radio_camel.lvl2)",
        "unordered(hierarchy_radio.lvl2)",
        "unordered(hierarchy_radio_camel.lvl3)",
        "unordered(hierarchy_radio.lvl3)",
        "unordered(hierarchy_radio_camel.lvl4)",
        "unordered(hierarchy_radio.lvl4)",
        "unordered(hierarchy_radio_camel.lvl5)",
        "unordered(hierarchy_radio.lvl5)",
        "unordered(hierarchy_radio_camel.lvl6)",
        "unordered(hierarchy_radio.lvl6)",
        "unordered(hierarchy_camel.lvl0)",
        "unordered(hierarchy.lvl0)",
        "unordered(hierarchy_camel.lvl1)",
        "unordered(hierarchy.lvl1)",
        "unordered(hierarchy_camel.lvl2)",
        "unordered(hierarchy.lvl2)",
        "unordered(hierarchy_camel.lvl3)",
        "unordered(hierarchy.lvl3)",
        "unordered(hierarchy_camel.lvl4)",
        "unordered(hierarchy.lvl4)",
        "unordered(hierarchy_camel.lvl5)",
        "unordered(hierarchy.lvl5)",
        "unordered(hierarchy_camel.lvl6)",
        "unordered(hierarchy.lvl6)",
        "content",
      ],
      distinct: true,
      attributeForDistinct: "url",
      customRanking: [
        "desc(weight.pageRank)",
        "desc(weight.level)",
        "asc(weight.position)",
      ],
      ranking: [
        "words",
        "filters",
        "typo",
        "attribute",
        "proximity",
        "exact",
        "custom",
      ],
      highlightPreTag:
        '<span class="algolia-docsearch-suggestion--highlight">',
      highlightPostTag: "</span>",
      minWordSizefor1Typo: 3,
      minWordSizefor2Typos: 7,
      allowTyposOnNumericTokens: false,
      minProximity: 1,
      ignorePlurals: true,
      advancedSyntax: true,
      attributeCriteriaComputedByMinProximity: true,
      removeWordsIfNoResults: "allOptional",
    },
  },
});
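The searchableAttributes list in the config follows a regular pattern: for hierarchy_radio and then hierarchy, each level from lvl0 to lvl6 contributes a camel-case variant followed by the plain attribute, and content comes last. As a minimal sketch, the same 29 entries can be generated instead of typed by hand. The helper name buildSearchableAttributes is made up for illustration and is not part of Algolia or DocSearch:

```typescript
// Hypothetical helper: regenerates the searchableAttributes list used in the
// crawler config instead of writing every entry by hand.
function buildSearchableAttributes(maxLevel: number = 6): string[] {
  const attributes: string[] = [];
  // "hierarchy_radio" entries come first, then the plain "hierarchy" entries.
  for (const family of ["hierarchy_radio", "hierarchy"]) {
    for (let level = 0; level <= maxLevel; level++) {
      // Each level lists the camel-case variant before the plain attribute.
      attributes.push(`unordered(${family}_camel.lvl${level})`);
      attributes.push(`unordered(${family}.lvl${level})`);
    }
  }
  // Body text is searched last, without the unordered() modifier.
  attributes.push("content");
  return attributes;
}

const searchableAttributes = buildSearchableAttributes();
console.log(searchableAttributes.join("\n"));
```

Whether the Crawler's online editor accepts helper code like this has not been verified here, so it is safer to run the sketch locally and paste the resulting list into the config.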
Next, install @vuepress/plugin-docsearch with whichever package manager you use:
pnpm add -D @vuepress/plugin-docsearch@next
yarn add -D @vuepress/plugin-docsearch@next
npm i -D @vuepress/plugin-docsearch@next
Then configure the plugin:
// .vuepress/config.ts
import { docsearchPlugin } from "@vuepress/plugin-docsearch";
import { defineUserConfig } from "vuepress";

export default defineUserConfig({
  plugins: [
    docsearchPlugin({
      // Your options
      // appId, apiKey and indexName are required. Note that this apiKey is NOT the
      // crawler's apiKey; it is the one that was sent to you by email.
    }),
  ],
});
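To make the three required options concrete, here is a minimal sketch that collects them from an env-like object and fails early when one is missing. The variable names (ALGOLIA_APP_ID, ALGOLIA_SEARCH_API_KEY, ALGOLIA_INDEX_NAME) and the helper readDocsearchOptions are assumptions for illustration; the plugin itself does not prescribe them:

```typescript
// Shape of the three required options named in the config comment.
interface DocsearchCredentials {
  appId: string;
  apiKey: string; // the search key from the email, NOT the crawler key
  indexName: string;
}

// Hypothetical helper: reads the options from an env-like object and throws
// a descriptive error when any of them is missing or empty.
function readDocsearchOptions(
  env: Record<string, string | undefined>
): DocsearchCredentials {
  const required = {
    appId: env.ALGOLIA_APP_ID,
    apiKey: env.ALGOLIA_SEARCH_API_KEY,
    indexName: env.ALGOLIA_INDEX_NAME,
  };
  const missing = Object.entries(required)
    .filter(([, value]) => !value)
    .map(([name]) => name);
  if (missing.length > 0) {
    throw new Error(`Missing DocSearch options: ${missing.join(", ")}`);
  }
  return required as DocsearchCredentials;
}

// Example with placeholder values; in config.ts you would likely pass process.env.
const docsearchOptions = readDocsearchOptions({
  ALGOLIA_APP_ID: "YOUR_APP_ID",
  ALGOLIA_SEARCH_API_KEY: "YOUR_SEARCH_API_KEY",
  ALGOLIA_INDEX_NAME: "YOUR_INDEX_NAME",
});
```

The returned object could then be spread into the plugin call, for example docsearchPlugin({ ...docsearchOptions }).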
Chinese localization of the search UI:
// .vuepress/config.ts
import { defineUserConfig } from "vuepress";
import { docsearchPlugin } from "@vuepress/plugin-docsearch";

export default defineUserConfig({
  plugins: [
    docsearchPlugin({
      // ...
      locales: {
        "/zh/": {
          placeholder: "搜索文档",
          translations: {
            button: {
              buttonText: "搜索文档",
              buttonAriaLabel: "搜索文档",
            },
            modal: {
              searchBox: {
                resetButtonTitle: "清除查询条件",
                resetButtonAriaLabel: "清除查询条件",
                cancelButtonText: "取消",
                cancelButtonAriaLabel: "取消",
              },
              startScreen: {
                recentSearchesTitle: "搜索历史",
                noRecentSearchesText: "没有搜索历史",
                saveRecentSearchButtonTitle: "保存至搜索历史",
                removeRecentSearchButtonTitle: "从搜索历史中移除",
                favoriteSearchesTitle: "收藏",
                removeFavoriteSearchButtonTitle: "从收藏中移除",
              },
              errorScreen: {
                titleText: "无法获取结果",
                helpText: "你可能需要检查你的网络连接",
              },
              footer: {
                selectText: "选择",
                navigateText: "切换",
                closeText: "关闭",
                searchByText: "搜索提供者",
              },
              noResultsScreen: {
                noResultsText: "无法找到相关结果",
                suggestedQueryText: "你可以尝试查询",
                reportMissingResultsText: "你认为该查询应该有结果?",
                reportMissingResultsLinkText: "点击反馈",
              },
            },
          },
        },
      },
    }),
  ],
});
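Configuring the Algolia search plugin is not difficult in itself. Here are the two pitfalls I ran into:
1. The apiKey in the site config (.vuepress/config.ts) is not the apiKey used in the crawler config; it is the one that was sent to you by email. Using the wrong key results in a 403 error.
2. Problems with crawling pages can generally be solved by reading the crawler's logs or hints. In my case the CDN was blocking the Algolia crawler; the fix was to whitelist the crawler's IP address, 34.66.202.43.
Reference: https://www.algolia.com/doc/tools/crawler/troubleshooting/faq/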
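The first pitfall can be checked before deploying by sending a single search request with the key in question. The sketch below builds such a request for Algolia's search REST endpoint and maps the status code to a hint. The helper names, the HttpPost type and the hint wording are all assumptions for illustration; only the 403 behaviour comes from the experience described above:

```typescript
// Pieces of a search request, kept separate so they can be inspected.
interface SearchRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Minimal shape of an HTTP client; the global fetch (Node 18+ or a browser)
// is assumed to fit it.
type HttpPost = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{ status: number }>;

// Builds a one-hit query against the index using the given key.
function buildSearchRequest(
  appId: string,
  apiKey: string,
  indexName: string,
  query: string
): SearchRequest {
  return {
    url: `https://${appId}-dsn.algolia.net/1/indexes/${encodeURIComponent(indexName)}/query`,
    headers: {
      "X-Algolia-Application-Id": appId,
      "X-Algolia-API-Key": apiKey,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      params: `query=${encodeURIComponent(query)}&hitsPerPage=1`,
    }),
  };
}

// Turns the HTTP status into a hint (the wording is illustrative).
function explainStatus(status: number): string {
  if (status === 200) return "200: this key can search the index";
  if (status === 403) return "403: wrong key, use the search key from the email";
  if (status === 404) return "404: the index name may be wrong or the index not created yet";
  return `${status}: unexpected status, check the crawler logs`;
}

// Sends the request with the supplied client and reports the hint.
async function checkSearchKey(
  post: HttpPost,
  appId: string,
  apiKey: string,
  indexName: string
): Promise<string> {
  const request = buildSearchRequest(appId, apiKey, indexName, "test");
  const response = await post(request.url, {
    method: "POST",
    headers: request.headers,
    body: request.body,
  });
  return explainStatus(response.status);
}
```

A call such as checkSearchKey(fetch, "YOUR_APP_ID", "YOUR_SEARCH_API_KEY", "YOUR_INDEX_NAME") would then resolve to one of the hints. Passing the HTTP client in as a parameter keeps the sketch free of any particular runtime.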
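P.S. With this, my documentation site is finally, completely done. The only catch is that I leave for a week in Yunnan tomorrow, so there is no time for a proper couple of days' rest…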
© Copyright notice
The copyright of this article belongs to the author. Please do not reproduce it without permission.