Rust for a Pythonista #2: Building a Rust crate for CSS inlining
This is the second part of a series about Rust for Python users.
In this article, we will build a foundation for a Rust-powered Python library - a crate that implements CSS inlining.
CSS inlining is the process of moving CSS rules from style tags to the corresponding spots in the HTML body.
This approach to including styles is crucial for sending HTML emails or embedding HTML pages into 3rd party resources.
Our goal is to build a library that will transform this HTML:
<html>
<head>
<style>h1 { color:blue; }</style>
</head>
<body>
<h1>Big Text</h1>
</body>
</html>
into this:
<html>
<head>
<style>h1 { color:blue; }</style>
</head>
<body>
<h1 style="color:blue;">Big Text</h1>
</body>
</html>
We'll go through:
- How CSS inlining works
- Popular crates for HTML & CSS processing
- Configuration: The Builder pattern
- Searching in an HTML document
- Building a CSS parser for qualified rules
- Modifying nodes: Interior Mutability pattern
- Serializing the output into a generic writer
- Next steps & potential improvements
Target audience: Those who know the basic principles of Rust and are looking for practical examples. Some familiarity with trait bounds and generics is helpful.
How CSS inlining works
The inlining process involves many details and corner cases - merging CSS with existing style attribute values, loading external stylesheets, handling pseudo-selectors, and many more.
This implementation will include a small feature set: moving CSS rules from style tags to the appropriate style attributes and optionally removing style tags after inlining.
The last assumption is that this transformation is fallible: in some cases, such as a malformed CSS selector, it is not clear how to query the DOM and find matching elements.
The most natural flow to uphold these requirements might look like this:
- Find all style tags;
- For each CSS rule in these tags, find the elements matching its selector;
- Insert the rule's declarations into the style attribute of each matched element.
These operations require an ability to navigate through an HTML document and manipulate its nodes. The most popular project that provides many high-quality components to work with HTML and CSS is Servo - a modern browser engine created by Mozilla.
Learn more about the Servo project from this YouTube video by Josh Matthews
The particular crates we are interested in are the following:
These tools give the developer browser-grade performance and much flexibility in the parsing process. On the other hand, they are relatively low-level - for example, html5ever does not provide any DOM tree representation.
Luckily, there is kuchiki that conveniently wraps them into one powerful library.
Start a new project
I assume that you already have Rust & cargo installed; if not, follow the instructions from rustup.rs.
I used rustc 1.45.2 for compiling all the Rust code in this article, but earlier compiler versions should work too.
Let's start by creating a new Rust project:
$ cargo new --lib css-inline-example
Created library `css-inline-example` package
$ cd css-inline-example && tree
.
├── Cargo.toml
└── src
└── lib.rs
1 directory, 2 files
Then add the dependencies mentioned above to the Cargo.toml file:
[dependencies]
cssparser = "0.27.2"
kuchiki = "0.8.1"
To reflect the task requirements, we can write a stub function that will take HTML as a string slice and return its inlined version:
pub fn inline(html: &str) -> Result<String, InlineError> {
todo!() // panics with a "not yet implemented" message
}
#[derive(Debug)]
pub enum InlineError {}
The Debug trait makes a type printable with the '{:?}' format specifier
Since inlining is fallible, this function returns the Result type. Its Err variant includes an enum with potential error cases.
We will expand this enum when we encounter different error scenarios.
To verify that the future implementation works as intended, we can include a test based on the original example at the beginning of the article:
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn it_works() {
let html = r#"<html><head>
<style>h1 { color:blue; }</style>
</head>
<body><h1>Big Text</h1></body>
</html>"#;
let expected = r#"<html><head>
<style>h1 { color:blue; }</style>
</head>
<body><h1 style=" color:blue; ">Big Text</h1>
</body></html>"#;
let inlined = inline(html).unwrap();
assert_eq!(inlined, expected)
}
}
kuchiki slightly alters the original formatting of the input document. I formatted the samples in the test to match kuchiki's output, which simplifies testing.
Inlining configuration
The inlining process should be configurable in case you decide to implement more optional features - removing processed style tags, loading remote stylesheets, and so on.
The Builder pattern is one of the most convenient ways to design the configuration process. It enables creating very expressive and ergonomic APIs, especially if you have many optional configuration parameters. You can specify only the options you need, and the others will use their default values.
let inliner = CSSInliner::options()
    .remove_style_tags(true)
    // some more options?
    .build();
// `inline` is a method on the built instance, so it is called with `.`
let processed = inliner.inline(&html);
The underlying process includes the following ingredients:
- Mutable storage for configuration options;
- Setters to modify the defaults;
- Creating something that will perform inlining and use the desired configuration in it.
It implies having two structs - one for options and one for "inliner". The latter will accept options in its constructor:
#[derive(Debug)]
pub struct InlineOptions {
pub remove_style_tags: bool,
}
#[derive(Debug)]
pub struct CSSInliner {
options: InlineOptions,
}
impl CSSInliner {
pub fn new(options: InlineOptions) -> Self {
CSSInliner { options }
}
}
To provide a set of default options, we need to implement the Default trait for InlineOptions:
impl Default for InlineOptions {
fn default() -> Self {
InlineOptions {
remove_style_tags: false, // do not remove style tags by default
}
}
}
Then the default "inliner" should use the default options, and the options method should create them:
impl Default for CSSInliner {
fn default() -> Self {
CSSInliner::new(InlineOptions::default())
}
}
impl CSSInliner {
pub fn options() -> InlineOptions {
InlineOptions::default()
}
}
To make our API design work as expected, InlineOptions should contain setters for configuration options and a build method to create a new CSSInliner:
impl InlineOptions {
pub fn remove_style_tags(mut self, remove_style_tags: bool) -> Self {
self.remove_style_tags = remove_style_tags;
self
}
pub fn build(self) -> CSSInliner {
CSSInliner::new(self)
}
}
To get more information regarding the Builder pattern, see this guide or look at the url crate source code
Now we can place the inlining logic inside the CSSInliner struct, and the original inline function will use it under the hood:
impl CSSInliner {
pub fn inline(&self, html: &str) -> Result<String, InlineError> {
todo!()
}
}
pub fn inline(html: &str) -> Result<String, InlineError> {
CSSInliner::default().inline(html)
}
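To see how the pieces of the builder fit together, here is a condensed, self-contained sketch of the structs from this section. It is only an illustration: the inline method is left out, and the removes_style_tags accessor is a hypothetical addition so that the example can observe the configured value.

```rust
/// Configuration options; mirrors the struct defined in this section.
#[derive(Debug)]
pub struct InlineOptions {
    pub remove_style_tags: bool,
}

impl Default for InlineOptions {
    fn default() -> Self {
        // Style tags are kept unless the caller asks otherwise.
        InlineOptions { remove_style_tags: false }
    }
}

impl InlineOptions {
    /// Setter that consumes and returns `self`, which is what enables chaining.
    pub fn remove_style_tags(mut self, remove_style_tags: bool) -> Self {
        self.remove_style_tags = remove_style_tags;
        self
    }

    /// Finishes the configuration and produces the inliner.
    pub fn build(self) -> CSSInliner {
        CSSInliner { options: self }
    }
}

#[derive(Debug)]
pub struct CSSInliner {
    options: InlineOptions,
}

impl CSSInliner {
    /// Entry point of the builder: starts from the default options.
    pub fn options() -> InlineOptions {
        InlineOptions::default()
    }

    /// Hypothetical accessor (not part of the article's API),
    /// used here only to inspect the resulting configuration.
    pub fn removes_style_tags(&self) -> bool {
        self.options.remove_style_tags
    }
}
```

Calling CSSInliner::options().build() keeps the defaults, while adding .remove_style_tags(true) before build() overrides just that one option.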
Searching in an HTML document
Looking up all style tags requires parsing the input HTML document with kuchiki:
use kuchiki::{parse_html, traits::TendrilSink};
impl CSSInliner {
pub fn inline(&self, html: &str) -> Result<String, InlineError> {
let document = parse_html().one(html);
for style_tag in document
.select("style")
.map_err(|_| InlineError::ParseError("Unknown error".to_string()))?
{
// ...
}
todo!()
}
}
#[derive(Debug)]
pub enum InlineError {
ParseError(String),
}
In one, the input HTML is transformed into Tendril - a compact string type that behaves similarly to String but is optimized for zero-copy parsing
The current Rust (1.45.2 at the time of writing) requires us to explicitly import the TendrilSink trait to use the one method.
In general, two different traits could each define a method named one; without the import, the compiler would not know which implementation to use.
However, there is an RFC to mitigate this restriction.
The select call may fail on unsupported selectors or syntax errors, but kuchiki doesn't return a meaningful error type for some reason and returns the unit type instead, so we have to map it to our error type.
An alternative would be to implement the From trait, but it would define the () -> InlineError conversion for all cases, not only for this specific one.
The problem is that if the Err variant contains () in some other place, the From implementation will convert it to InlineError::ParseError, which may be wrong in that context.
It is always better to return meaningful error types to avoid redundancy and ambiguity with error propagation.
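The difference between a local map_err and a global From conversion can be sketched without kuchiki. In the example below, select is a hypothetical stand-in that only mimics the shape of a call failing with the unit type; it is not kuchiki's API, and the error message text is made up for illustration.

```rust
#[derive(Debug, PartialEq)]
pub enum InlineError {
    ParseError(String),
}

/// Hypothetical stand-in for a library call that reports failure
/// with the unit type and carries no details about what went wrong.
fn select(selector: &str) -> Result<usize, ()> {
    if selector.is_empty() {
        Err(())
    } else {
        Ok(selector.len())
    }
}

/// The conversion is spelled out at the call site, so it applies only here.
/// No `From<()>` implementation exists, so an unrelated `Err(())` elsewhere
/// cannot silently turn into a `ParseError` through the `?` operator.
fn count_matches(selector: &str) -> Result<usize, InlineError> {
    let found = select(selector)
        .map_err(|_| InlineError::ParseError(format!("Invalid selector: {:?}", selector)))?;
    Ok(found)
}
```

Because the closure has access to the local context, the resulting message can also mention the selector that failed, which a blanket From<()> implementation could not do.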
If you want to learn more about error handling in Rust, read this masterpiece by BurntSushi
After finding all the style tags, we need to extract their text content:
use kuchiki::{parse_html, traits::TendrilSink, NodeRef};
impl CSSInliner {
pub fn inline(&self, html: &str) -> Result<String, InlineError> {
// ...
{
if let Some(first_child) = style_tag.as_node().first_child() {
if let Some(css_cell) = first_child.as_text() {
process_css(&document, css_cell.borrow().as_str())?;
}
}
if self.options.remove_style_tags {
style_tag.as_node().detach()
}
}
todo!()
}
}
fn process_css(document: &NodeRef, css: &str) -> Result<(), InlineError> {
todo!()
}
A few notes on the kuchiki implementation details:
- The text inside a style tag is a separate node of the tree; thus there is a first_child call;
- css_cell is a RefCell, which is common for tree representations in Rust.
As we want to remove processed style tags optionally, this is the optimal place to remove them.
Removing a node works by dropping all references to this node from its parent and siblings.
The process_css function will parse the provided text and insert CSS rules into the matched elements, modifying them in place.
CSS parsing
The CSS parsing implementation is quite low-level because we don't have high-level wrappers like kuchiki.
The main benefit of the cssparser crate is its flexibility: it lets the developer decide what to parse and how - you can parse rules differently depending on the context or entirely skip parsing of rules that you don't need.
By using it, we will be able to parse a rule list like this:
h1, h2 { color:blue; }
strong { text-decoration:none }
p { font-size:2px }
p.footer { font-size: 1px}
To implement this, we need to use cssparser::RuleListParser::new_for_stylesheet, which accepts a list of CSS rules and a parser.
The parser argument must implement the following traits:
- QualifiedRuleParser
- AtRuleParser
The first trait parses qualified rules that consist of two parts - a prelude and a block. In most cases, the prelude is a CSS selector, and the block is a list of declarations enclosed in curly brackets:
.button {
padding: 3px;
border-radius: 5px;
border: 1px solid black;
}
Read more about CSS parsing in the W3C recommendation
The QualifiedRuleParser trait requires three associated types:
- Prelude. CSS selector;
- QualifiedRule. CSS selector + block;
- Error. Additional data for custom errors.
To use CSS selectors for querying the document, we need to keep a prelude and a block separately inside the QualifiedRule type:
type QualifiedRule<'i> = (&'i str, &'i str);
We need no custom errors; hence, we can set the Error type to (). The trait itself can be implemented for an empty struct:
use cssparser::QualifiedRuleParser;
struct CSSRuleListParser;
impl<'i> QualifiedRuleParser<'i> for CSSRuleListParser {
type Prelude = &'i str;
type QualifiedRule = QualifiedRule<'i>;
type Error = ();
}
This trait requires the lifetime of the input data, and we can use the same lifetime in our types, which means that parsed qualified rules will live as long as the input.
The default trait implementation ignores all qualified rules; therefore, we have to redefine this behavior.
Parsing happens in two methods, and the first one is parse_prelude:
use cssparser::{ParseError, Parser, QualifiedRuleParser};
impl<'i> QualifiedRuleParser<'i> for CSSRuleListParser {
// ... associated types
fn parse_prelude<'t>(
&mut self,
input: &mut Parser<'i, 't>,
) -> Result<Self::Prelude, ParseError<'i, Self::Error>> {
todo!()
}
}
It accepts the Parser type that has two layers and behaves similarly to an iterator:
- First, the underlying ParserInput and Tokenizer structs perform lexical analysis of the input and yield tokens like Delimiter or Number;
- Then, Parser processes tokens from ParserInput and checks if these tokens form meaningful CSS constructions.
You can learn more about tokens in the source code
For our needs it will be enough to advance the parser until the end of the prelude and return it as a string slice:
fn exhaust<'i>(input: &mut Parser<'i, '_>) -> &'i str {
let start = input.position(); // the current parsing position
while input.next().is_ok() {} // parse while it is possible
input.slice_from(start) // take a slice from the parsed block
}
impl<'i> QualifiedRuleParser<'i> for CSSRuleListParser {
// ...
fn parse_prelude<'t>(
&mut self,
input: &mut Parser<'i, 't>,
) -> Result<Self::Prelude, ParseError<'i, Self::Error>> {
Ok(exhaust(input))
}
}
But how does the parser know when the prelude ends?
Before cssparser calls the parse_prelude function, it configures the parser to yield tokens only until a curly bracket occurs.
For this reason, it is safe to call input.next() until the first Err - it won't go any further than the position of the first {, which bounds this parsing step to the prelude only.
See the source code of Parser.parse_until_before for more information
The second method works similarly - the parser will stop at the closing curly bracket, and we can parse the rest of the qualified rule with the same exhaust function:
use cssparser::{ParseError, Parser, QualifiedRuleParser, SourceLocation};
impl<'i> QualifiedRuleParser<'i> for CSSRuleListParser {
// ...
fn parse_block<'t>(
&mut self,
prelude: Self::Prelude,
_: SourceLocation,
input: &mut Parser<'i, 't>,
) -> Result<Self::QualifiedRule, ParseError<'i, Self::Error>> {
Ok((prelude, exhaust(input)))
}
}
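To make the role of the 'i lifetime more tangible, here is a small std-only sketch. It is not how cssparser tokenizes input: split_rule is a hypothetical, naive helper that cuts a single rule at its braces. What it shares with the parser above is that both parts of the returned tuple are slices borrowed from the input string, so no text is copied.

```rust
/// A selector and a declaration block, both borrowed from the input.
type QualifiedRule<'i> = (&'i str, &'i str);

/// Naive splitter for a single rule (illustration only, assumes one
/// pair of braces). The returned slices point into `rule`, so they
/// cannot outlive it - the same guarantee `'i` gives in the parser.
fn split_rule<'i>(rule: &'i str) -> Option<QualifiedRule<'i>> {
    let open = rule.find('{')?;
    let close = rule.rfind('}')?;
    if close < open {
        return None;
    }
    // Unlike `exhaust`, this sketch trims the surrounding whitespace.
    Some((rule[..open].trim(), rule[open + 1..close].trim()))
}
```

Trying to return the tuple from a function that owns the input String would be rejected by the borrow checker, which is exactly the relationship the lifetime parameter expresses.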
We finished this trait implementation. Let's deal with the second one!
The default implementation of the AtRuleParser trait ignores all at-rules, which is exactly what we need, because the spec restricts the content of style attributes:
The value of the style attribute must match the syntax of the contents of a CSS declaration block (excluding the delimiting braces)
We can't extract the declaration block content of an at-rule because all important at-rules are conditional, and by removing the condition, we'll lose the information about when the declarations should be applied.
@media screen and (max-width: 992px) {
body {
/* Only the content of this block is allowed by the spec */
background-color: blue;
}
}
Consequently, this implementation will require defining only associated types:
use cssparser::AtRuleParser;
impl<'i> AtRuleParser<'i> for CSSRuleListParser {
type PreludeNoBlock = &'i str;
type PreludeBlock = &'i str;
type AtRule = QualifiedRule<'i>;
type Error = ();
}
The only important detail here is that the RuleListParser struct adds additional restrictions on the AtRule and Error types. Its source code:
impl<'i, 't, 'a, R, P, E: 'i> RuleListParser<'i, 't, 'a, P>
where
P: QualifiedRuleParser<'i, QualifiedRule = R, Error = E>
+ AtRuleParser<'i, AtRule = R, Error = E>,
{
pub fn new_for_stylesheet(input: &'a mut Parser<'i, 't>, parser: P) -> Self {
// ...
}
}
Which reads: the parser argument has a generic type P. This type P should implement the QualifiedRuleParser and AtRuleParser traits, where QualifiedRuleParser::QualifiedRule is the same as AtRuleParser::AtRule and QualifiedRuleParser::Error is the same as AtRuleParser::Error.
In our implementation, it means that the AtRule associated type should be QualifiedRule<'i> and the Error type should be () (the same as QualifiedRule and Error in QualifiedRuleParser respectively).
The ability to require the same types across different trait bounds allows developers to express more in their APIs.
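The same technique can be reproduced in a few lines without cssparser. In the sketch below, every name (RuleParser, AtParser, Demo, collect) is hypothetical; the only point is the where clause, which ties the associated types of two different traits to the shared generic parameters R and E.

```rust
/// Hypothetical trait standing in for a qualified-rule parser.
trait RuleParser {
    type Rule;
    type Error;
    fn rule(&self) -> Result<Self::Rule, Self::Error>;
}

/// Hypothetical trait standing in for an at-rule parser.
trait AtParser {
    type AtRule;
    type Error;
    fn at_rule(&self) -> Result<Self::AtRule, Self::Error>;
}

struct Demo;

impl RuleParser for Demo {
    type Rule = String;
    type Error = ();
    fn rule(&self) -> Result<String, ()> {
        Ok("h1 { color:blue; }".to_string())
    }
}

impl AtParser for Demo {
    type AtRule = String;
    type Error = ();
    fn at_rule(&self) -> Result<String, ()> {
        Err(())
    }
}

/// Both traits must agree on the rule type `R` and the error type `E`;
/// otherwise the results could not be stored in a single vector.
fn collect<P, R, E>(parser: &P) -> Vec<Result<R, E>>
where
    P: RuleParser<Rule = R, Error = E> + AtParser<AtRule = R, Error = E>,
{
    vec![parser.rule(), parser.at_rule()]
}
```

If Demo declared a different AtRule or Error type in one of the two impl blocks, the call collect(&Demo) would fail to compile, which is the kind of guarantee the RuleListParser bounds provide.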
It would be nice to have some default values for those types to avoid writing them by hand! The RFC for associated types defaults was accepted, and the implementation is in-progress (here is the tracking issue).
Modifying HTML elements
Now, finally, we can use our parser! There are no shortcuts for constructing a RuleListParser instance from a string slice; therefore, we need to build all pieces by hand:
use cssparser::{Parser, ParserInput, RuleListParser};
fn process_css(document: &NodeRef, css: &str) -> Result<(), InlineError> {
let mut parse_input = ParserInput::new(css);
let mut parser = Parser::new(&mut parse_input);
let rules = RuleListParser::new_for_stylesheet(
&mut parser,
CSSRuleListParser
);
for rule in rules {
// apply this rule!
}
Ok(())
}
The next step is iterating over parsed rules and processing them individually. The parsing result is an iterator over Result instances, which can be:
- Ok. A tuple of two string slices - a selector and a block;
- Err. Also a tuple. It contains an instance of cssparser::ParseError and the erroneous input.
As you may see, the return type of the process_css function has InlineError in its Err variant. Hence we need to convert cssparser::ParseError into our InlineError to propagate errors.
The canonical way is to use the From trait:
use cssparser::{BasicParseErrorKind, ParseError, ParseErrorKind};
impl From<(ParseError<'_, ()>, &str)> for InlineError {
fn from(error: (ParseError<'_, ()>, &str)) -> Self {
let message = match error.0.kind {
ParseErrorKind::Basic(kind) => match kind {
BasicParseErrorKind::UnexpectedToken(token) => {
format!("Unexpected token: {:?}", token)
}
BasicParseErrorKind::EndOfInput => "End of input".to_string(),
BasicParseErrorKind::AtRuleInvalid(value) => {
format!("Invalid @ rule: {}", value)
}
BasicParseErrorKind::AtRuleBodyInvalid => {
"Invalid @ rule body".to_string()
}
BasicParseErrorKind::QualifiedRuleInvalid => {
"Invalid qualified rule".to_string()
}
},
ParseErrorKind::Custom(_) => "Never happens".to_string(),
};
InlineError::ParseError(message)
}
}
By matching all the error kinds, we can provide clear error messages for our library.
Now we can handle the rules and compile CSS selectors for further matching against them:
use kuchiki::Selectors;
fn process_css(document: &NodeRef, css: &str) -> Result<(), InlineError> {
// ...
for rule in rules {
let (selector, block) = rule?;
if let Ok(matching_elements) = document.select(selector) {
for el in matching_elements {
todo!()
}
}
}
Ok(())
}
The code above is similar to what we used before to find all style tags, but in this case, it is better to skip unsupported selectors for better future compatibility.
We need to modify each matched element and put the block value into the style attribute.
Some nodes may already have non-empty style attributes, but handling them would require using additional traits from cssparser and specific merging rules. To focus on the most straightforward flow, I leave it as an exercise to the reader.
As you may see, all variables are immutable while we iterate over parsed rules, but we still need to modify element.attributes.
It is possible because element.attributes is a RefCell that implements the Interior Mutability pattern.
This Rust pattern allows you to modify an object's internal state through a shared reference by checking the borrowing rules at runtime.
Read more about the Interior Mutability pattern in chapter 15 of the Book.
When we want to borrow the value of a RefCell mutably, there is a choice - use borrow_mut, which panics if the value is currently borrowed, or try_borrow_mut, which returns a Result.
At the moment, it is the only place where we access attributes, therefore using borrow_mut is safe, but this condition may change, and then this code will panic.
It is a possible situation if, for example, we decide to implement the handling of external stylesheets via the href attributes of "link" tags.
Even if the probability is quite low, I prefer slightly safer and more explicit (but more verbose) code:
fn process_css(document: &NodeRef, css: &str) -> Result<(), InlineError> {
// ...
for el in matching_elements {
if let Ok(mut attributes) = el.attributes.try_borrow_mut() {
attributes.insert("style", block.to_string());
}
}
// ...
}
The value inside attributes (it is wrapped in a RefCell) is a wrapper around a BTreeMap and provides a similar interface.
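The behavior of try_borrow_mut can be observed with the standard library alone. The sketch below uses a hypothetical Element struct with a plain BTreeMap instead of kuchiki's types, and set_style is an illustrative helper that reports whether the write happened instead of panicking.

```rust
use std::cell::RefCell;
use std::collections::BTreeMap;

/// Hypothetical stand-in for an element whose attributes live in a `RefCell`.
struct Element {
    attributes: RefCell<BTreeMap<String, String>>,
}

/// Sets the `style` attribute through a shared reference.
/// Returns `false` instead of panicking when the map is already borrowed.
fn set_style(element: &Element, block: &str) -> bool {
    match element.attributes.try_borrow_mut() {
        Ok(mut attributes) => {
            attributes.insert("style".to_string(), block.to_string());
            true
        }
        // Somebody else holds a borrow right now; the write is skipped.
        Err(_) => false,
    }
}
```

While any borrow() guard on the same cell is alive, set_style returns false; once the guard is dropped, the same call succeeds. With borrow_mut, the first situation would be a panic at runtime.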
Our simple inlining is done, and now it's time to serialize the output.
Generic writers
To serialize an HTML document, we need to write a textual representation of all its nodes into some sink.
kuchiki supports serialization to any target that implements the std::io::Write trait (a file, for example).
The simplest case is serialization into a vector of bytes, which we then need to convert to a string:
impl CSSInliner {
pub fn inline(&self, html: &str) -> Result<String, InlineError> {
// ...
let mut output = Vec::new();
document.serialize(&mut output)?;
Ok(String::from_utf8_lossy(&output).to_string())
}
}
document.serialize returns std::io::Error in its Err case; therefore, we have to add a new variant to the InlineError enum and implement another From conversion for it:
use std::io;
#[derive(Debug)]
pub enum InlineError {
ParseError(String),
IO(io::Error),
}
impl From<io::Error> for InlineError {
fn from(error: io::Error) -> Self {
InlineError::IO(error)
}
}
But what if you'd like to write inlined HTML to a file or some network stream? The current approach is not flexible enough. Let's create a new method that will provide more flexibility:
impl CSSInliner {
pub fn inline_to<W: io::Write>(&self, html: &str, target: &mut W) -> Result<(), InlineError> {
// ... inlining implementation
document.serialize(target)?;
Ok(())
}
}
And use it in the original one:
impl CSSInliner {
pub fn inline(&self, html: &str) -> Result<String, InlineError> {
let mut output = Vec::new();
self.inline_to(html, &mut output)?;
Ok(String::from_utf8_lossy(&output).to_string())
}
}
Now it is possible to serialize inlined HTML to any target that implements the io::Write trait.
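The same split between a generic writer function and a String-returning wrapper can be tried in isolation. In this std-only sketch, write_heading is a hypothetical serializer that stands in for document.serialize, and heading_to_string plays the role of inline.

```rust
use std::io::{self, Write};

/// Hypothetical serializer: writes a heading into any `io::Write` target.
fn write_heading<W: Write>(text: &str, target: &mut W) -> io::Result<()> {
    write!(target, "<h1>{}</h1>", text)
}

/// Convenience wrapper that collects the output into a `String`,
/// built on top of the generic function above.
fn heading_to_string(text: &str) -> io::Result<String> {
    let mut output = Vec::new();
    write_heading(text, &mut output)?;
    Ok(String::from_utf8_lossy(&output).into_owned())
}
```

The generic function accepts a Vec<u8>, a File, a TcpStream, or io::sink() without any changes, because all of them implement io::Write.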
Finally, our code compiles, and we can run the test we wrote in the beginning:
$ cargo t
Finished test [unoptimized + debuginfo] target(s) in 1.18s
Running target/debug/deps/css_inline_example-80ecd8c1feae1ffc
running 1 test
test tests::it_works ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
cargo t is an alias for cargo test, and cargo supports more of them. Also, you can define your own. Check the Cargo documentation
Ok, inlining works! Let's add a couple of improvements.
Further improvements
The Error trait improves debugging by providing access to the original cause, and the Display trait makes errors more descriptive and allows them to be formatted with the default formatter.
use std::error::Error;
use std::fmt;
impl Error for InlineError {
fn source(&self) -> Option<&(dyn Error + 'static)> {
match self {
InlineError::IO(error) => Some(error),
InlineError::ParseError(_) => None,
}
}
}
impl fmt::Display for InlineError {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
match self {
InlineError::IO(error) => f.write_str(error.to_string().as_str()),
InlineError::ParseError(error) => f.write_str(error.as_str()),
}
}
}
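To show what these two traits give to the users of the library, here is a self-contained sketch. It repeats the enum in a condensed form (Display is written with write! for brevity) and adds describe, a hypothetical helper that walks the source chain and collects the message of every error in it.

```rust
use std::error::Error;
use std::fmt;
use std::io;

#[derive(Debug)]
pub enum InlineError {
    ParseError(String),
    IO(io::Error),
}

impl fmt::Display for InlineError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            InlineError::IO(error) => write!(f, "{}", error),
            InlineError::ParseError(message) => f.write_str(message),
        }
    }
}

impl Error for InlineError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            InlineError::IO(error) => Some(error),
            InlineError::ParseError(_) => None,
        }
    }
}

/// Hypothetical helper: the message of the error itself,
/// followed by the messages of all its underlying causes.
fn describe(error: &dyn Error) -> Vec<String> {
    let mut messages = vec![error.to_string()];
    let mut current = error.source();
    while let Some(cause) = current {
        messages.push(cause.to_string());
        current = cause.source();
    }
    messages
}
```

A ParseError produces a single message, while an IO error also exposes the wrapped io::Error as its cause, so callers and error-reporting crates can inspect it.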
clippy is another useful tool that can improve your code. If you don't have it yet, you can install it with rustup:
$ rustup component add clippy
I like the "pedantic" set of lints because it provides many helpful (and sometimes annoying) suggestions that may improve your code:
// lib.rs
#![warn(clippy::pedantic)]
clippy also runs cargo check under the hood, so you don't have to run both of them.
Documentation is one of the most important aspects of any great crate. To always keep it in mind, add this line to the beginning of your lib.rs file:
#![warn(missing_docs)]
With this lint enabled, the compiler (and therefore clippy) will warn you if you missed documenting any public entity of your crate.
Check what the documentation will look like by running cargo doc and opening the target/doc/css_inline_example/index.html file in your browser.
See the complete CSS inlining implementation in this GitHub repo
Summary
At this point, inlining works. We implemented:
- a high-level inline function and a configurable struct for more flexible inlining;
- selecting elements in HTML and modifying them;
- a parser of CSS rules;
- error handling;
- serializing inlined HTML to a generic target.
Our Rust crate is complete; now we can start adding Python bindings to it!
Chapters:
- Rust for a Pythonista #1: Why and when?
- Rust for a Pythonista #2: Building a Rust crate for CSS inlining
- Rust for a Pythonista #3: Python bindings
Thank you,
Dmitry
❤ ❤ ❤