File: /src/QueryPath/QueryPath.php

Description

The Query Path package provides tools for manipulating a Document Object Model.

The two major DOMs are the XML DOM and the HTML DOM. Using Query Path, you can build, parse, search, and modify DOM documents.

To use Query Path, this is the only file you should need to import.

Standard usage:

  <?php
  require 'QueryPath/QueryPath.php';
  // qp() takes the document first, then the CSS selector.
  $qp = qp('<?xml version="1.0"?><test><foo id="myID"/></test>', '#myID');
  $qp->append('<new><elements/></new>')->writeHTML();
  ?>

The above would print (formatted for readability):

  <?xml version="1.0"?>
  <test>
    <foo id="myID">
      <new>
        <elements/>
      </new>
    </foo>
  </test>

To learn about the functions available to a Query Path object, see QueryPath. The qp() function is used to build new QueryPath objects. The documentation for that function explains the wealth of arguments that the function can take.

Included with the source code for QueryPath is a complete set of unit tests as well as some example files. Those are good resources for learning about how to apply QueryPath's tools. The full API documentation can be generated from these files using PHPDocumentor.

If you are interested in building extensions for QueryPath, see the QueryPathExtender class. There, you will find information on adding your own tools to QueryPath.

QueryPath also comes with a full CSS 3 selector parser implementation. If you are interested in reusing that in other code, you will want to start with CssEventHandler.php, which is the event interface for the parser.

All of the code in QueryPath is licensed under either the LGPL or an MIT-like license (you may choose which you prefer). All of the code is Copyright, 2009 by Matt Butcher.

Classes defined in this file

CLASS NAME               DESCRIPTION

QueryPath                The Query Path object is the primary tool in this library.
QueryPathEntities
QueryPathIterator        An iterator for QueryPath.
QueryPathOptions         Manage default options.
QueryPathException       Exception indicating that a problem has occurred inside of a QueryPath object.
QueryPathParseException  Exception indicating that a parser has failed to parse a file.
QueryPathIOException     Exception indicating that a parser has failed to parse a file.

Include/Require Statements

'CssEventHandler.php' (line 79)

require_once : 'CssEventHandler.php'

The CssEventHandler interfaces with the CSS parser.

'QueryPathExtension.php' (line 83)

require_once : 'QueryPathExtension.php'

The extender is used to provide support for extensions.

Global Variables

Constants

ML_EXP (line 74)

ML_EXP : '/^[^<]*(<(.|\s)+>)[^>]*$/'

Regular expression for checking whether a string looks like XML.
  • deprecated: This is no longer used in QueryPath.
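
As a rough illustration of what this pattern accepts, the sketch below applies the same expression with preg_match(). The variable name $mlExp and the sample strings are assumptions made for this example; QueryPath itself no longer uses the constant.

```php
<?php
// Copy of the ML_EXP pattern shown above, held in a local variable
// ($mlExp is a name chosen for this sketch) so the example runs on its own.
$mlExp = '/^[^<]*(<(.|\s)+>)[^>]*$/';

// Sample inputs (hypothetical): two markup-like strings and a plain file path.
$samples = array(
  '<test><foo id="myID"/></test>',
  'prefix <b>bold</b> suffix',
  'path/to/file.xml',
);

$results = array();
foreach ($samples as $sample) {
  // preg_match() returns 1 on a match and 0 when there is none.
  $results[$sample] = preg_match($mlExp, $sample) === 1;
  print $sample . ' => ' . ($results[$sample] ? 'looks like XML' : 'not XML') . "\n";
}
```

The pattern only checks that a `<` is eventually followed by a `>`, so it is a loose heuristic rather than a well-formedness test.'] !== true) { throw new Exception('expected XML-like string to match'); }
if ($results['prefix <b>bold</b> suffix'] !== true) { throw new Exception('expected embedded markup to match'); }
if ($results['path/to/file.xml'] !== false) { throw new Exception('expected file path not to match'); }
</test>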

Functions

htmlqp (line 208)

void htmlqp( [ $document = NULL], [ $selector = NULL], [ $options = array()])

A special-purpose version of qp() designed specifically for HTML.

XHTML (if valid) can be easily parsed by qp() with no problems. However, because of the way that libxml handles HTML, there are several common steps that need to be taken to reliably parse non-XML HTML documents. This function is a convenience tool for configuring QueryPath to parse HTML.

The following options are automatically set unless overridden:

  • ignore_parser_warnings: TRUE
  • convert_to_encoding: ISO-8859-1 (the best for the HTML parser).
  • convert_from_encoding: auto (autodetect encoding)
  • use_parser: html

Parser warning messages are also suppressed, so if the parser emits a warning, the application will not be notified. This is equivalent to calling @qp().

Warning: Character set conversions will only work if the Multi-Byte (mb) library is installed and enabled. This is usually enabled, but not always.
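
To make the defaults above more concrete, here is a minimal sketch that approximates the same steps using only PHP built-ins (mbstring and DOM). It is not QueryPath's implementation: load_lenient_html() is a hypothetical helper, the sample markup is made up, and warnings are silenced with libxml_use_internal_errors() rather than the @ operator.

```php
<?php
// Sketch only: approximates the defaults listed above with PHP built-ins.
// load_lenient_html() is a hypothetical helper, not part of QueryPath.
function load_lenient_html($html) {
  // convert_from_encoding: auto, convert_to_encoding: ISO-8859-1.
  // Skipped when the mbstring extension is not available.
  if (function_exists('mb_convert_encoding')) {
    $html = mb_convert_encoding($html, 'ISO-8859-1', 'auto');
  }

  // ignore_parser_warnings: collect libxml errors internally instead of
  // emitting PHP warnings, then discard them.
  $previous = libxml_use_internal_errors(TRUE);

  // use_parser: html. DOMDocument::loadHTML() uses libxml's HTML parser.
  $doc = new DOMDocument();
  $doc->loadHTML($html);

  libxml_clear_errors();
  libxml_use_internal_errors($previous);
  return $doc;
}

// Tag soup: an unclosed <p> and a <br> without a closing slash.
$doc = load_lenient_html('<html><head><title>Legacy page</title></head><body><p>Unclosed paragraph<br></body></html>');
$title = $doc->getElementsByTagName('title')->item(0)->textContent;
print $title . "\n";
```

In practice, a call such as htmlqp($html, 'title') is meant to spare you this setup; the sketch only shows what each default option stands for.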

Parameters

  • $document:
  • $selector:
  • $options:


qp (line 177)

void qp( [mixed $document = NULL], [string $string = NULL], [array $options = array()])

Build a new Query Path.

This builds a new Query Path object. The new object can be used for reading, searching, and modifying a document.

While it is permissible to directly create new instances of a QueryPath implementation, it is not advised. Instead, you should use this function as a factory.

Example:

  <?php
  qp(); // New empty QueryPath
  qp('path/to/file.xml'); // From a file
  qp('<html><head></head><body></body></html>'); // From HTML or XML
  qp(QueryPath::XHTML_STUB); // From a basic HTML document.
  qp(QueryPath::XHTML_STUB, 'title'); // Create one from a basic HTML doc and position it at the title element.

  // Most of the time, methods are chained directly off of this call.
  qp(QueryPath::XHTML_STUB, 'body')->append('<h1>Title</h1>')->addClass('body-class');
  ?>

This function is used internally by QueryPath. Anything that modifies the behavior of this function may also modify the behavior of common QueryPath methods.

Parameters

  • mixed $document: A document in one of the following forms:
    • A string of XML or HTML (See XHTML_STUB)
    • A path on the file system or a URL
    • A DOMDocument object
    • A SimpleXMLElement object.
    • A DOMNode object.
    • An array of DOMNode objects (generally DOMElement nodes).
    • Another QueryPath object.
    Keep in mind that most features of QueryPath operate on elements. Other sorts of DOMNodes might not work with all features.
  • string $string: A CSS 3 selector.
  • array $options: An associative array of options. Currently supported options are:
    • context: A stream context object. This is used to pass context info to the underlying file IO subsystem.
    • encoding: A valid character encoding, such as 'utf-8' or 'ISO-8859-1'. The default is system-dependent, typically UTF-8. Note that this is only used when creating new documents, not when reading existing content. (See convert_to_encoding below.)
    • parser_flags: An OR-combined set of parser flags. The flags supported by the DOMDocument PHP class are all supported here.
    • omit_xml_declaration: Boolean. If this is TRUE, then certain output methods (like QueryPath::xml()) will omit the XML declaration from the beginning of a document.
    • replace_entities: Boolean. If this is TRUE, then any of the insertion functions (before(), append(), etc.) will replace named entities with their decimal equivalent, and will replace un-escaped ampersands with a numeric entity equivalent.
    • ignore_parser_warnings: Boolean. If this is TRUE, then E_WARNING messages generated by the XML parser will not cause QueryPath to throw an exception. This is useful when parsing badly mangled HTML, or when failure to find files should not result in an exception. By default, this is FALSE -- that is, parsing warnings and IO warnings throw exceptions.
    • convert_to_encoding: Use the MB library to convert the document to the named encoding before parsing. This is useful for old HTML (set it to iso-8859-1 for best results). If this is not supplied, no character set conversion will be performed. See http://www.php.net/mb_convert_encoding. (QueryPath 1.3 and later)
    • convert_from_encoding: If 'convert_to_encoding' is set, this option can be used to explicitly define what character set the source document is using. By default, QueryPath will allow the MB library to guess the encoding. (QueryPath 1.3 and later)
    • use_parser: If 'xml', parse the document as XML. If 'html', parse the document as HTML. Note that the XML parser is very strict, while the HTML parser is more lenient, but does enforce some of the DTD/Schema. By default, QueryPath autodetects the type.
    • QueryPath_class: (ADVANCED) Use this to set the actual classname that qp() loads as a QueryPath instance. It is assumed that the class is either QueryPath or a subclass thereof. See the test cases for an example.
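
The parser_flags option above takes the same OR-combined libxml constants that DOMDocument accepts. The sketch below shows, with DOMDocument alone, how such a value is built and what effect one flag has. The sample document and variable names are assumptions for this example, and the closing comment only suggests how the value might be handed to qp().

```php
<?php
// Sample document (made up for this sketch) with whitespace between elements.
$xml = "<?xml version=\"1.0\"?>\n<test>\n  <foo id=\"myID\"/>\n</test>";

// Flags are combined with bitwise OR, as they would be for 'parser_flags'.
// LIBXML_NOBLANKS drops whitespace-only text nodes; LIBXML_NONET disables
// network access while loading.
$flags = LIBXML_NOBLANKS | LIBXML_NONET;

$plain = new DOMDocument();
$plain->loadXML($xml);

$flagged = new DOMDocument();
$flagged->loadXML($xml, $flags);

// Count the child nodes of <test> in each document.
$plainCount = $plain->documentElement->childNodes->length;
$flaggedCount = $flagged->documentElement->childNodes->length;
print "without flags: $plainCount, with flags: $flaggedCount\n";

// The same value could then be passed through the documented option, e.g.:
// $options = array('parser_flags' => $flags);
```

Because whitespace-only text nodes are removed under LIBXML_NOBLANKS, the flagged document reports fewer children of the root element than the unflagged one.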



Documentation generated on Sun, 25 Jul 2010 16:09:06 -0500 by phpDocumentor 1.4.3