Select HTML elements with more than one css class using XPath - Westhoffswelt - Welcome to the real world

During a discussion on IRC with Thomas Weinert, we asked ourselves how to select HTML elements by a given CSS class if they have multiple classes defined. Think of something like this:

<div class="foo bar baz">42</div>

This div element has the three classes foo, bar and baz associated with it. If you want to select all HTML nodes with the class foo, this div element should be one of them.
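Before turning to XPath, it helps to pin down the selection rule. The small Python sketch below is an illustration, not code from this post, and the name `has_class` is made up for it. It splits the class attribute on whitespace and requires the requested name to equal one of the tokens exactly. As a simplifying assumption, only the four whitespace characters XPath 1.0 knows (space, tab, carriage return, line feed) are treated as separators. HTML additionally counts form feed.

```python
import re

# XPath 1.0 whitespace: space, tab, carriage return, line feed.
# (HTML also treats form feed as a separator; it is left out here.)
_WS = re.compile(r"[ \t\r\n]+")


def has_class(class_attr: str, name: str) -> bool:
    """True if `name` equals one of the whitespace-separated tokens exactly."""
    tokens = [t for t in _WS.split(class_attr) if t]  # drop empty edge tokens
    return name in tokens


print(has_class("foo bar baz", "foo"))  # True
print(has_class("foo bar baz", "ba"))   # False: a partial name is not a class
```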

The XPath expression that solves this selection problem is not quite obvious.

After thinking about it for a bit, I came up with the following solution:

//*[count( index-of( tokenize( @class, '\s+' ), '$classname' ) ) >= 1]

This expression works quite well. However, it uses the functions tokenize and index-of, which are only available in XPath 2.0. Unfortunately, PHP's DOMXPath is built on libxml2 and only supports XPath 1.0. That renders the expression above virtually useless for the scenario it was meant for.
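If you have no XPath 2.0 processor at hand, the Python sketch below approximates what the predicate above computes:

- `tokenize(@class, '\s+')` splits the attribute on runs of whitespace.
- `index-of` returns the 1-based positions of every token equal to the class name.
- `count` turns those positions into a number.

The Python function names are chosen for the illustration; this is not a real XPath binding. The character class `[ \t\n\r]` is an assumption based on `\s` in XPath 2.0 regular expressions covering only those four characters, which is narrower than Python's `\s`.

```python
import re
from typing import List

# In XPath 2.0 regular expressions \s matches only space, tab, LF and CR,
# which is narrower than Python's \s, so the class is spelled out here.
XPATH_WS = r"[ \t\n\r]+"


def tokenize(s: str, pattern: str = XPATH_WS) -> List[str]:
    """Approximates fn:tokenize: an empty input yields an empty sequence."""
    return [] if s == "" else re.split(pattern, s)


def index_of(seq: List[str], search: str) -> List[int]:
    """Approximates fn:index-of: 1-based positions of items equal to `search`."""
    return [pos for pos, item in enumerate(seq, start=1) if item == search]


def class_count(class_attr: str, name: str) -> int:
    """Value of count(index-of(tokenize(@class, '\\s+'), name))."""
    return len(index_of(tokenize(class_attr), name))


print(class_count("foo bar baz", "bar"))  # 1
print(class_count("foobar", "bar"))       # 0
print(class_count("foo foo", "foo"))      # 2: a repeated class is counted twice
```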

Therefore I tried to find a different approach using only XPath 1.0 functions. This is what I came up with:

//*[
    contains( normalize-space( @class ), ' $classname ' )
    or substring( normalize-space( @class ), 1, string-length( '$classname' ) + 1 ) = '$classname '
    or substring( normalize-space( @class ), string-length( normalize-space( @class ) ) - string-length( '$classname' ) ) = ' $classname'
    or normalize-space( @class ) = '$classname'
]

The normalize-space function strips leading and trailing whitespace and replaces every run of whitespace characters (spaces, tabs, line breaks) with a single space. After that, only four cases are possible:

- The first disjunct matches if the class appears somewhere in the middle of the class list.
- The second matches only a classname at the beginning of the list.
- The third matches only a classname at the end.
- The fourth matches in case only one classname is defined.

Unfortunately this much complexity is needed to ensure that no partial classnames are matched.
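The four cases can be tried out without an XPath engine. The Python sketch below re-implements the two XPath 1.0 string functions that do the work: `normalize-space` and the 1-based `substring`, the latter for integer arguments only. It then evaluates the four disjuncts, with Python's `in` standing in for `contains`. The function names and the `raw` flag are made up for the illustration.

The `raw` flag shows why every reference to the attribute has to go through normalize-space:

- Measuring the raw attribute lets a doubled space shift the end-of-list window by one character.
- Comparing the raw attribute misses a single class padded with spaces.

```python
import re
from typing import Optional


def normalize_space(s: str) -> str:
    """XPath 1.0 normalize-space(): trim and collapse runs of space/tab/CR/LF."""
    return " ".join(t for t in re.split(r"[ \t\r\n]+", s) if t)


def substring(s: str, start: int, length: Optional[int] = None) -> str:
    """XPath 1.0 substring() for integer arguments; positions are 1-based."""
    return "".join(
        ch
        for pos, ch in enumerate(s, start=1)
        if pos >= start and (length is None or pos < start + length)
    )


def four_way_match(class_attr: str, name: str, raw: bool = False) -> bool:
    """Evaluate the four disjuncts for one class attribute value.

    raw=True uses the unnormalized attribute in string-length() and in the
    final equality test instead of normalize-space(@class).
    """
    norm = normalize_space(class_attr)
    length_base = len(class_attr) if raw else len(norm)
    whole = class_attr if raw else norm
    return (
        f" {name} " in norm                                        # middle of the list
        or substring(norm, 1, len(name) + 1) == f"{name} "         # beginning
        or substring(norm, length_base - len(name)) == f" {name}"  # end
        or whole == name                                           # only class
    )


print(four_way_match("foo bar baz", "baz"))         # True
print(four_way_match("foobar", "bar"))              # False
print(four_way_match("foo  bar", "bar"))            # True
print(four_way_match("foo  bar", "bar", raw=True))  # False
```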

You may download a hackish demonstration script here, which uses the presented expression in combination with PHP DOM to select nodes with a certain class.

Comments

  • Sam Shull on Tue, 09 Jun 2009 04:14:35 +0200

    How about:

    //*[
        @class and
        contains(
            concat(' ', normalize-space(@class), ' '),
            ' $classname '
        )
    ]

  • Ruben Wagner on Tue, 09 Jun 2009 11:19:58 +0200

    PHP (via libxslt) supports EXSLT (http://www.exslt.org), so you can use

    //div['foo' = str:tokenize(@class)]

    if you define the namespace "str" in your stylesheet: xmlns:str="http://exslt.org/strings"

    <xsl:stylesheet
    version="1.0"
    xmlns:str="http://exslt.org/strings"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    >
    <!-- ... -->
    </xsl:stylesheet>

    P.S. When you compare a string with a node-set, the comparison is true if the string value of any node in the set equals the string.

  • Jakob on Tue, 09 Jun 2009 13:42:54 +0200

    @Sam:

    The problem with this approach is that parts of classes would be matched, too.
    For example if <div class="foobar"> is given it would match the class "bar" or "foo", which is not the intended behaviour.

    @Ruben:

    You are right about libxslt supporting EXSLT, which enables us to use these functions in XSLT stylesheets. Unfortunately I needed an expression to simply match DOM nodes inside of PHP. I didn't want to write an XSLT stylesheet for it, apply it to the source document and read the result back in. That seems like a lot of unneeded hassle ;).

    Anyway, thanks for the information about comparing a node-set with a string. This makes the XPath 2.0 expression even more elegant.

    greetings
    Jakob

  • Jan! on Tue, 09 Jun 2009 15:50:56 +0200

    @Jakob: how exactly would Sam's method fail on that example? It would be searching for " bar " in " foobar ".

  • Jakob on Tue, 09 Jun 2009 17:25:43 +0200

    @Jan!:

    Yeah, you are right. I was in a hurry and overlooked the spaces around $classname. I thought he was proposing to search for '$classname' (without the spaces).

    @Sam:

    I am sorry for misreading your expression at first. What you proposed should work without a problem, and it is a really elegant solution :) Thanks for your input.

    greetings
    Jakob

  • Nitin on Fri, 02 Oct 2009 23:31:47 +0200

    Just wanted to drop my regards to you, it helped me in one of my recent projects.

    Thanks Jacob and Sam.

  • luka8088 on Sun, 13 Dec 2009 11:01:28 +0100

    Hi, thanks for posting this, it was very helpful !

    I was hoping to find something like ~= in CSS, but this is also an elegant solution :)

    I just don't understand the purpose of normalize-space. Wouldn't it work the same without it, or is it faster that way? Also, why check that the element has a class attribute? Is it for optimization?

    Anyway, good job and thanks :)

  • Jakob on Sun, 13 Dec 2009 16:05:54 +0100

    @luka8088:
    I am glad I could help.
    "normalize-space" ensures that every space/tab combination between two classes is reduced to a single space. This way the matching works even if there are multiple spaces or tabs between the different classes.

    greetings,
    Jakob

  • edwin on Thu, 18 Feb 2010 07:43:04 +0100

    Nice post.
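Sam's expression from the comments can be modelled the same way. The Python sketch below uses function names made up for the illustration. It pads the normalized attribute with one space on each side, so every class, including the first and the last, is surrounded by spaces and a single contains() test suffices.

The sketch also addresses luka8088's two questions:

- For a non-empty class name, an element without a class attribute fails the contains() test anyway, because the attribute's string value is then empty. In XPath 1.0 the right operand of `and` is not evaluated when the left one is false, so the `@class and` guard only saves that work.
- normalize-space is what lets tab- or newline-separated classes match. Runs of plain spaces would still match without it.

```python
import re


def normalize_space(s: str) -> str:
    """XPath 1.0 normalize-space(): trim and collapse runs of space/tab/CR/LF."""
    return " ".join(t for t in re.split(r"[ \t\r\n]+", s) if t)


def padded_contains(class_attr: str, name: str, normalize: bool = True) -> bool:
    """contains(concat(' ', normalize-space(@class), ' '), ' name ').

    A missing class attribute has the empty string as its string value,
    so it is modelled by passing "".
    """
    value = normalize_space(class_attr) if normalize else class_attr
    return f" {name} " in f" {value} "


print(padded_contains("foo bar baz", "baz"))                # True
print(padded_contains("foobar", "bar"))                     # False
print(padded_contains("", "foo"))                           # False even without the @class guard
print(padded_contains("foo\tbar", "bar"))                   # True
print(padded_contains("foo\tbar", "bar", normalize=False))  # False: the tab is not a space
print(padded_contains("foo  bar", "bar", normalize=False))  # True: doubled spaces are harmless
```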
