Select HTML elements with more than one CSS class using XPath
During a discussion on IRC, Thomas Weinert and I asked ourselves how to select HTML elements by a given CSS class when the element has multiple classes defined. Think of something like this:
<div class="foo bar baz">42</div>
This div element has the three classes foo, bar and baz associated with it. If you want to select all HTML nodes with the class foo, this div element should be one of them.
The XPath expression that solves this selection problem is not quite obvious.
After a little bit of thinking about it I came up with the following solution:
//*[count( index-of( tokenize( @class, '\s+' ), '$classname' ) ) = 1]
This selection works quite well. Unfortunately it uses the functions tokenize and index-of, which are only available in XPath 2.0. PHP's DOM extension only supports XPath 1.0, which renders the expression above virtually useless for the scenario it should be used in.
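To make the idea behind the XPath 2.0 expression concrete, here is a minimal Python sketch of the same tokenize-then-look-up logic. It is only a model of the string handling, not an XPath evaluator, and the function name has_class_tokenized is made up for this illustration. Unlike the comparison with 1 above, the sketch uses a plain membership test, so an attribute that lists the same class twice still matches.

```python
import re


def has_class_tokenized(class_attr: str, classname: str) -> bool:
    """Model of tokenize(@class, '\\s+') followed by a lookup of classname."""
    # Splitting on whitespace can yield empty tokens for leading or
    # trailing whitespace, so those are dropped before the lookup.
    tokens = [token for token in re.split(r"\s+", class_attr) if token]
    return classname in tokens


if __name__ == "__main__":
    # The div from the example above carries foo, bar and baz.
    print(has_class_tokenized("foo bar baz", "foo"))
    # A partial class name must not match.
    print(has_class_tokenized("foobar", "foo"))
```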
Therefore I tried to think of something different, only using XPath 1.0 functions. The following expression is what I came up with:
//*[
    contains( normalize-space( @class ), ' $classname ' )
    or substring( normalize-space( @class ), 1, string-length( '$classname' ) + 1 ) = '$classname '
    or substring( normalize-space( @class ), string-length( normalize-space( @class ) ) - string-length( '$classname' ) ) = ' $classname'
    or normalize-space( @class ) = '$classname'
]
The normalize-space function strips leading and trailing whitespace and replaces every remaining sequence of whitespace characters with a single space. After that only four kinds of match are possible. The first disjunct matches if the class appears somewhere in the middle of the class list. The second matches only class names at the beginning of the list, whereas the third matches only class names at the end. The fourth matches when the class is the only one defined. Note that the third and fourth disjuncts have to work on the normalized value as well, otherwise stray whitespace in the attribute would shift the offset or break the equality. Unfortunately this kind of complexity is needed to ensure no partial class names are matched.
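The four cases can be mirrored in plain Python to check them against sample class attributes. This is a sketch of the string logic only, not an XPath evaluator; the helper names normalize_space and matches_class are made up for this illustration.

```python
import re

# XPath 1.0 treats space, tab, carriage return and line feed as whitespace.
XML_WHITESPACE = " \t\r\n"


def normalize_space(value: str) -> str:
    """Strip leading/trailing whitespace and collapse inner runs to one space."""
    return re.sub(r"[ \t\r\n]+", " ", value.strip(XML_WHITESPACE))


def matches_class(class_attr: str, classname: str) -> bool:
    """Model of the four-way disjunction on the normalized class attribute."""
    normalized = normalize_space(class_attr)
    return (
        f" {classname} " in normalized            # somewhere in the middle
        or normalized.startswith(f"{classname} ")  # first class in the list
        or normalized.endswith(f" {classname}")    # last class in the list
        or normalized == classname                 # the only class
    )


if __name__ == "__main__":
    for attr in ("foo bar baz", "bar", "  foo \t bar ", "foobar"):
        print(repr(attr), matches_class(attr, "bar"))
```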
You may download a hackish demonstration script here, which uses the presented expression in combination with PHP DOM to select nodes with a certain class.
Sam Shull on Tue, 09 Jun 2009 04:14:35 +0200
How about:
//*[
    @class and
    contains(
        concat(
            ' ',
            normalize-space(@class),
            ' '
        ),
        ' $classname '
    )
]
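If the class name comes from a variable, the expression has to be assembled as a string before it is handed to an XPath engine. The following is a minimal sketch of such a builder; the function name class_xpath is hypothetical, and rejecting whitespace and quotes is an assumption made here to keep the single-quoted XPath literal intact.

```python
def class_xpath(classname: str) -> str:
    """Build the padded-contains expression for a single class name."""
    # A class name with whitespace would really be several classes, and a
    # quote character would break out of the XPath string literal.
    if not classname or any(ch in classname for ch in " \t\r\n'\""):
        raise ValueError(f"not a usable class name: {classname!r}")
    return (
        "//*[@class and contains("
        "concat(' ', normalize-space(@class), ' '), "
        f"' {classname} ')]"
    )


if __name__ == "__main__":
    print(class_xpath("foo"))
```

The resulting string only uses XPath 1.0 functions, so it should be usable with any XPath 1.0 engine.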
Ruben Wagner on Tue, 09 Jun 2009 11:19:58 +0200
PHP (libxslt) supports EXSLT (http://www.exslt.org), so you can use
//div['foo' = str:tokenize(@class)]
if you define the namespace "str" in your stylesheet: xmlns:str="http://exslt.org/strings"
<xsl:stylesheet
version="1.0"
xmlns:str="http://exslt.org/strings"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>
<!-- ... -->
</xsl:stylesheet>
P.S. When you compare a string with a node set, the comparison returns true if any node's value equals the string.
Jakob on Tue, 09 Jun 2009 13:42:54 +0200
@Sam:
The problem with this approach is that parts of classes would be matched, too.
For example, given <div class="foobar">, it would match the class "bar" or "foo", which is not the intended behaviour.
@Ruben:
You are right about libxslt supporting EXSLT, which enables us to use these functions in XSLTs. Unfortunately I needed an expression to simply match DOM nodes inside of PHP. I didn't want to write an XSLT for it, apply it to the source document and read the result back in. This seems to be a lot of unneeded hassle ;).
Anyway thanks for the information about the node sets in conjunction with equal checks. This makes the XPath 2.0 expression even more elegant.
greetings
Jakob
Jan! on Tue, 09 Jun 2009 15:50:56 +0200
@Jakob: how exactly would Sam's method fail on that example? It would be searching for " bar " in " foobar ".
Jakob on Tue, 09 Jun 2009 17:25:43 +0200
@Jan!:
Yeah, you are right. I was in a hurry and overlooked the spaces around $classname. I thought he was proposing to search for '$classname' (without the spaces).
@Sam:
I am sorry about misreading your expression at first. What you proposed should work without a problem, and it is a really elegant solution :) Thanks for your input.
greetings
Jakob
Nitin on Fri, 02 Oct 2009 23:31:47 +0200
Just wanted to drop my regards to you, it helped me in one of my recent projects.
Thanks Jakob and Sam.
luka8088 on Sun, 13 Dec 2009 11:01:28 +0100
Hi, thanks for posting this, it was very helpful!
I was hoping to find something like |= in CSS3, but this is also an elegant solution :)
I just don't understand what the purpose of normalize-space is. It will work the same without it ... Or is it faster that way? Also, why check that the element has a class attribute? Is it for optimization?
Anyway, good job and thanks :)
Jakob on Sun, 13 Dec 2009 16:05:54 +0100
@luka8088:
I am glad I could help.
"normalize-space" ensures that every space/tab combination between two classes is reduced to a single space. This way the matching works even if there are multiple spaces or tabs between the different classes.
greetings,
Jakob
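The effect of normalize-space on the padded contains check can be seen in a small Python model. This is a sketch of the string logic only; the function padded_contains and its normalize switch are made up for this illustration.

```python
import re


def padded_contains(class_attr: str, classname: str, normalize: bool = True) -> bool:
    """Model of contains(concat(' ', @class, ' '), ' classname ')."""
    value = class_attr
    if normalize:
        # Same effect as normalize-space: trim, then collapse runs of
        # space, tab, carriage return and line feed into one space.
        value = re.sub(r"[ \t\r\n]+", " ", value.strip(" \t\r\n"))
    return f" {classname} " in f" {value} "


if __name__ == "__main__":
    tab_separated = "foo\tbar"
    # With normalization the tab becomes a space and "bar" is found.
    print(padded_contains(tab_separated, "bar"))
    # Without it, " bar " never appears because a tab precedes "bar".
    print(padded_contains(tab_separated, "bar", normalize=False))
```

With classes separated by single spaces both variants behave the same, which is probably why the call looks redundant at first glance.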
edwin on Thu, 18 Feb 2010 07:43:04 +0100
Nice post.