sitemap.xml is a file that helps search engine crawlers understand the structure of your website. Below are several ways to generate sitemap.xml in Rust.
Method 1: Using the sitemap crate
Rust has a dedicated sitemap crate that makes it easy to generate sitemap files:
1. First, add the dependency to Cargo.toml:
[dependencies]
sitemap = "0.4"
2. Then generate the sitemap with the following code:
use sitemap::writer::SiteMapWriter;
use std::io::Cursor;

// Writes a URL set with three entries into an in-memory buffer
// and returns the resulting XML bytes.
fn generate_sitemap() -> Vec<u8> {
    let mut output = Cursor::new(Vec::new());
    {
        // The writer borrows `output` mutably, so it lives in its own scope.
        let sitemap_writer = SiteMapWriter::new(&mut output);
        let mut urlwriter = sitemap_writer.start_urlset().expect("Unable to write urlset");
        urlwriter.url("https://example.com/").expect("Unable to write url");
        urlwriter.url("https://example.com/about").expect("Unable to write url");
        urlwriter.url("https://example.com/contact").expect("Unable to write url");
        urlwriter.end().expect("Unable to write close tag");
    }
    output.into_inner()
}
Method 2: Generating the XML by hand
// Builds the sitemap as a plain string. Requires chrono = "0.4" in Cargo.toml.
fn generate_simple_sitemap() -> String {
    let urls = vec![
        "https://example.com/",
        "https://example.com/about",
        "https://example.com/contact",
    ];
    let mut sitemap = String::from(
        r#"<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">"#,
    );
    for url in urls {
        // Note: `url` is inserted as-is, so it must not contain unescaped
        // XML characters such as `&`. Today's date stands in for the
        // page's real last-modified date.
        sitemap.push_str(&format!(
            r#"
  <url>
    <loc>{}</loc>
    <lastmod>{}</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>"#,
            url,
            chrono::Utc::now().format("%Y-%m-%d")
        ));
    }
    sitemap.push_str("\n</urlset>");
    sitemap
}
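When the XML is assembled by hand, URLs that contain characters such as & must be entity-escaped before they are placed inside <loc>, as the sitemap protocol requires. The following is a minimal sketch of that step using only the standard library; the helper names escape_xml and url_element are hypothetical and not part of the original example.

```rust
/// Escapes the five characters that XML reserves so that a URL
/// can be placed safely inside a <loc> element.
fn escape_xml(input: &str) -> String {
    let mut escaped = String::with_capacity(input.len());
    for ch in input.chars() {
        match ch {
            '&' => escaped.push_str("&amp;"),
            '<' => escaped.push_str("&lt;"),
            '>' => escaped.push_str("&gt;"),
            '"' => escaped.push_str("&quot;"),
            '\'' => escaped.push_str("&apos;"),
            other => escaped.push(other),
        }
    }
    escaped
}

/// Builds one <url> element whose <loc> value is escaped.
/// (Hypothetical helper, shown only for illustration.)
fn url_element(loc: &str) -> String {
    format!("  <url>\n    <loc>{}</loc>\n  </url>", escape_xml(loc))
}
```

In the hand-written generator, passing each URL through escape_xml before formatting it would be one way to keep the output well-formed.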
Method 3: Integrating with a web framework
Below is a complete example using the Actix-web framework:
1. Add the dependencies:
[dependencies]
actix-web = "4"
sitemap = "0.4"
chrono = "0.4"
2. Implement the handler:
use actix_web::{get, App, HttpResponse, HttpServer, Responder};
use sitemap::structs::UrlEntry;
use sitemap::writer::SiteMapWriter;
use std::io::Cursor;

// Serves the sitemap at /sitemap.xml, building it on every request.
#[get("/sitemap.xml")]
async fn sitemap() -> impl Responder {
    let mut buffer = Cursor::new(Vec::new());
    {
        let writer = SiteMapWriter::new(&mut buffer);
        let mut url_writer = writer.start_urlset().unwrap();
        // (path, priority) pairs. The builder takes the priority as an
        // f32 rather than a string, as in the crate's documented example.
        let urls = vec![
            ("/", 1.0),
            ("/about", 0.8),
            ("/contact", 0.8),
        ];
        for (path, priority) in urls {
            let full_url = format!("https://example.com{}", path);
            let entry = UrlEntry::builder()
                .loc(full_url)
                .priority(priority)
                .build()
                .unwrap();
            url_writer.url(entry).unwrap();
        }
        url_writer.end().unwrap();
    }
    HttpResponse::Ok()
        .content_type("application/xml")
        .body(buffer.into_inner())
}

#[actix_web::main]
async fn main() -> std::io::Result<()> {
    HttpServer::new(|| {
        App::new()
            .service(sitemap)
    })
    .bind("127.0.0.1:8080")?
    .run()
    .await
}
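Advanced features
For more complex websites, you may need to:
Fetch URLs dynamically from a database
Set <changefreq> according to how often each page is updated
Set <priority>
Split the sitemap into multiple files for large sites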
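// Uses the same imports as the Actix-web example, plus ChangeFreq.
use sitemap::structs::ChangeFreq;

// Stand-in for a real database query: returns a fixed list of entries.
async fn get_dynamic_urls_from_db() -> Vec<UrlEntry> {
    vec![
        UrlEntry::builder()
            .loc("https://example.com/post/1")
            // The builder takes typed values here (a DateTime<FixedOffset>,
            // a ChangeFreq and an f32), not strings. Check this against the
            // crate version you use.
            .lastmod(chrono::Utc::now().into())
            .changefreq(ChangeFreq::Weekly)
            .priority(0.7)
            .build()
            .unwrap(),
    ]
}

// Replaces the static handler above; register only one handler per route.
#[get("/sitemap.xml")]
async fn dynamic_sitemap() -> impl Responder {
    let urls = get_dynamic_urls_from_db().await;
    let mut buffer = Cursor::new(Vec::new());
    {
        let writer = SiteMapWriter::new(&mut buffer);
        let mut url_writer = writer.start_urlset().unwrap();
        for url in urls {
            url_writer.url(url).unwrap();
        }
        url_writer.end().unwrap();
    }
    HttpResponse::Ok()
        .content_type("application/xml")
        .body(buffer.into_inner())
}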
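The list of advanced features mentions splitting the sitemap for large sites. The sitemap protocol limits a single file to 50,000 URLs, so one possible approach is to divide the URL list into chunks and publish a sitemap index that points at each chunk. The sketch below uses only the standard library; the function names and the sitemap-N.xml naming scheme are assumptions made for illustration.

```rust
/// Maximum number of URLs the sitemap protocol allows in one sitemap file.
const MAX_URLS_PER_SITEMAP: usize = 50_000;

/// Splits the URL list into groups that each fit in one sitemap file.
/// `max_per_file` must be greater than zero, otherwise `chunks` panics.
fn split_into_sitemaps(urls: &[String], max_per_file: usize) -> Vec<&[String]> {
    urls.chunks(max_per_file).collect()
}

/// Builds a sitemap index that points at `count` sitemap files.
/// The `sitemap-N.xml` file names are hypothetical.
fn build_sitemap_index(base_url: &str, count: usize) -> String {
    let mut index = String::from(
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n\
         <sitemapindex xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n",
    );
    for n in 1..=count {
        index.push_str(&format!(
            "  <sitemap>\n    <loc>{}/sitemap-{}.xml</loc>\n  </sitemap>\n",
            base_url, n
        ));
    }
    index.push_str("</sitemapindex>");
    index
}
```

Each chunk returned by split_into_sitemaps(&urls, MAX_URLS_PER_SITEMAP) could then be written out as its own URL set with either of the generators above, and the index served at the address listed in robots.txt.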
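Best practices
Place sitemap.xml in the root directory of the site
Add the sitemap location to robots.txt: Sitemap: https://example.com/sitemap.xml
For large sites, consider using a sitemap index file (a single sitemap may hold at most 50,000 URLs)
Update the sitemap regularly, especially when content changes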
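If robots.txt is also generated by the application, the Sitemap line can be produced alongside it. This is a minimal sketch, assuming a site that allows all crawlers; the function name build_robots_txt is hypothetical.

```rust
/// Builds a minimal robots.txt that allows all crawlers
/// and advertises the sitemap location.
fn build_robots_txt(sitemap_url: &str) -> String {
    format!("User-agent: *\nAllow: /\n\nSitemap: {}\n", sitemap_url)
}
```

Calling build_robots_txt("https://example.com/sitemap.xml") yields a file whose last line is the Sitemap directive shown in the list above.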