Add Composer autoloading functions and PHPStan for testing

This commit is contained in:
Alex Cabal 2019-02-26 13:03:45 -06:00
parent e198c4db65
commit f5d7d4e02a
1518 changed files with 169063 additions and 30 deletions

View file

@ -0,0 +1,80 @@
Introduction
============
This project is a PHP 5.2 to PHP 7.3 parser **written in PHP itself**.
What is this for?
-----------------
A parser is useful for [static analysis][0], manipulation of code and basically any other
application dealing with code programmatically. A parser constructs an [Abstract Syntax Tree][1]
(AST) of the code and thus allows dealing with it in an abstract and robust way.
There are other ways of processing source code. One that PHP supports natively is using the
token stream generated by [`token_get_all`][2]. The token stream is much more low level than
the AST and thus has different applications: It allows to also analyze the exact formatting of
a file. On the other hand the token stream is much harder to deal with for more complex analysis.
For example, an AST abstracts away the fact that, in PHP, variables can be written as `$foo`, but also
as `$$bar`, `${'foobar'}` or even `${!${''}=barfoo()}`. You don't have to worry about recognizing
all the different syntaxes from a stream of tokens.
Another question is: Why would I want to have a PHP parser *written in PHP*? Well, PHP might not be
a language especially suited for fast parsing, but processing the AST is much easier in PHP than it
would be in other, faster languages like C. Furthermore the people most probably wanting to do
programmatic PHP code analysis are incidentally PHP developers, not C developers.
What can it parse?
------------------
The parser supports parsing PHP 5.2-7.3.
As the parser is based on the tokens returned by `token_get_all` (which is only able to lex the PHP
version it runs on), additionally a wrapper for emulating tokens from newer versions is provided.
This allows to parse PHP 7.3 source code running on PHP 7.0, for example. This emulation is somewhat
hacky and not perfect, but it should work well on any sane code.
What output does it produce?
----------------------------
The parser produces an [Abstract Syntax Tree][1] (AST) also known as a node tree. How this looks
can best be seen in an example. The program `<?php echo 'Hi', 'World';` will give you a node tree
roughly looking like this:
```
array(
0: Stmt_Echo(
exprs: array(
0: Scalar_String(
value: Hi
)
1: Scalar_String(
value: World
)
)
)
)
```
This matches the structure of the code: An echo statement, which takes two strings as expressions,
with the values `Hi` and `World!`.
You can also see that the AST does not contain any whitespace information (but most comments are saved).
So using it for formatting analysis is not possible.
What else can it do?
--------------------
Apart from the parser itself this package also bundles support for some other, related features:
* Support for pretty printing, which is the act of converting an AST into PHP code. Please note
that "pretty printing" does not imply that the output is especially pretty. It's just how it's
called ;)
* Support for serializing and unserializing the node tree to JSON
* Support for dumping the node tree in a human readable form (see the section above for an
example of how the output looks like)
* Infrastructure for traversing and changing the AST (node traverser and node visitors)
* A node visitor for resolving namespaced names
[0]: http://en.wikipedia.org/wiki/Static_program_analysis
[1]: http://en.wikipedia.org/wiki/Abstract_syntax_tree
[2]: http://php.net/token_get_all

View file

@ -0,0 +1,516 @@
Usage of basic components
=========================
This document explains how to use the parser, the pretty printer and the node traverser.
Bootstrapping
-------------
To bootstrap the library, include the autoloader generated by composer:
```php
require 'path/to/vendor/autoload.php';
```
Additionally you may want to set the `xdebug.max_nesting_level` ini option to a higher value:
```php
ini_set('xdebug.max_nesting_level', 3000);
```
This ensures that there will be no errors when traversing highly nested node trees. However, it is
preferable to disable XDebug completely, as it can easily make this library more than five times
slower.
Parsing
-------
In order to parse code, you first have to create a parser instance:
```php
use PhpParser\ParserFactory;
$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
```
The factory accepts a kind argument, that determines how different PHP versions are treated:
Kind | Behavior
-----|---------
`ParserFactory::PREFER_PHP7` | Try to parse code as PHP 7. If this fails, try to parse it as PHP 5.
`ParserFactory::PREFER_PHP5` | Try to parse code as PHP 5. If this fails, try to parse it as PHP 7.
`ParserFactory::ONLY_PHP7` | Parse code as PHP 7.
`ParserFactory::ONLY_PHP5` | Parse code as PHP 5.
Unless you have a strong reason to use something else, `PREFER_PHP7` is a reasonable default.
The `create()` method optionally accepts a `Lexer` instance as the second argument. Some use cases
that require customized lexers are discussed in the [lexer documentation](component/Lexer.markdown).
Subsequently you can pass PHP code (including the opening `<?php` tag) to the `parse` method in order to
create a syntax tree. If a syntax error is encountered, an `PhpParser\Error` exception will be thrown:
```php
<?php
use PhpParser\Error;
use PhpParser\ParserFactory;
$code = <<<'CODE'
<?php
function printLine($msg) {
echo $msg, "\n";
}
printLine('Hello World!!!');
CODE;
$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
try {
$stmts = $parser->parse($code);
// $stmts is an array of statement nodes
} catch (Error $e) {
echo 'Parse Error: ', $e->getMessage();
}
```
A parser instance can be reused to parse multiple files.
Node dumping
------------
To dump the abstact syntax tree in human readable form, a `NodeDumper` can be used:
```php
<?php
use PhpParser\NodeDumper;
$nodeDumper = new NodeDumper;
echo $nodeDumper->dump($stmts), "\n";
```
For the sample code from the previous section, this will produce the following output:
```
array(
0: Stmt_Function(
byRef: false
name: Identifier(
name: printLine
)
params: array(
0: Param(
type: null
byRef: false
variadic: false
var: Expr_Variable(
name: msg
)
default: null
)
)
returnType: null
stmts: array(
0: Stmt_Echo(
exprs: array(
0: Expr_Variable(
name: msg
)
1: Scalar_String(
value:
)
)
)
)
)
1: Stmt_Expression(
expr: Expr_FuncCall(
name: Name(
parts: array(
0: printLine
)
)
args: array(
0: Arg(
value: Scalar_String(
value: Hello World!!!
)
byRef: false
unpack: false
)
)
)
)
)
```
You can also use the `php-parse` script to obtain such a node dump by calling it either with a file
name or code string:
```sh
vendor/bin/php-parse file.php
vendor/bin/php-parse "<?php foo();"
```
This can be very helpful if you want to quickly check how certain syntax is represented in the AST.
Node tree structure
-------------------
Looking at the node dump above, you can see that `$stmts` for this example code is an array of two
nodes, a `Stmt_Function` and a `Stmt_Expression`. The corresponding class names are:
* `Stmt_Function -> PhpParser\Node\Stmt\Function_`
* `Stmt_Expression -> PhpParser\Node\Stmt\Expression`
The additional `_` at the end of the first class name is necessary, because `Function` is a
reserved keyword. Many node class names in this library have a trailing `_` to avoid clashing with
a keyword.
As PHP is a large language there are approximately 140 different nodes. In order to make working
with them easier they are grouped into three categories:
* `PhpParser\Node\Stmt`s are statement nodes, i.e. language constructs that do not return
a value and can not occur in an expression. For example a class definition is a statement.
It doesn't return a value and you can't write something like `func(class A {});`.
* `PhpParser\Node\Expr`s are expression nodes, i.e. language constructs that return a value
and thus can occur in other expressions. Examples of expressions are `$var`
(`PhpParser\Node\Expr\Variable`) and `func()` (`PhpParser\Node\Expr\FuncCall`).
* `PhpParser\Node\Scalar`s are nodes representing scalar values, like `'string'`
(`PhpParser\Node\Scalar\String_`), `0` (`PhpParser\Node\Scalar\LNumber`) or magic constants
like `__FILE__` (`PhpParser\Node\Scalar\MagicConst\File`). All `PhpParser\Node\Scalar`s extend
`PhpParser\Node\Expr`, as scalars are expressions, too.
* There are some nodes not in either of these groups, for example names (`PhpParser\Node\Name`)
and call arguments (`PhpParser\Node\Arg`).
The `Node\Stmt\Expression` node is somewhat confusing in that it contains both the terms "statement"
and "expression". This node distinguishes `expr`, which is a `Node\Expr`, from `expr;`, which is
an "expression statement" represented by `Node\Stmt\Expression` and containing `expr` as a sub-node.
Every node has a (possibly zero) number of subnodes. You can access subnodes by writing
`$node->subNodeName`. The `Stmt\Echo_` node has only one subnode `exprs`. So in order to access it
in the above example you would write `$stmts[0]->exprs`. If you wanted to access the name of the function
call, you would write `$stmts[0]->exprs[1]->name`.
All nodes also define a `getType()` method that returns the node type. The type is the class name
without the `PhpParser\Node\` prefix and `\` replaced with `_`. It also does not contain a trailing
`_` for reserved-keyword class names.
It is possible to associate custom metadata with a node using the `setAttribute()` method. This data
can then be retrieved using `hasAttribute()`, `getAttribute()` and `getAttributes()`.
By default the lexer adds the `startLine`, `endLine` and `comments` attributes. `comments` is an array
of `PhpParser\Comment[\Doc]` instances.
The start line can also be accessed using `getLine()`/`setLine()` (instead of `getAttribute('startLine')`).
The last doc comment from the `comments` attribute can be obtained using `getDocComment()`.
Pretty printer
--------------
The pretty printer component compiles the AST back to PHP code. As the parser does not retain formatting
information the formatting is done using a specified scheme. Currently there is only one scheme available,
namely `PhpParser\PrettyPrinter\Standard`.
```php
use PhpParser\Error;
use PhpParser\ParserFactory;
use PhpParser\PrettyPrinter;
$code = "<?php echo 'Hi ', hi\\getTarget();";
$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
$prettyPrinter = new PrettyPrinter\Standard;
try {
// parse
$stmts = $parser->parse($code);
// change
$stmts[0] // the echo statement
->exprs // sub expressions
[0] // the first of them (the string node)
->value // it's value, i.e. 'Hi '
= 'Hello '; // change to 'Hello '
// pretty print
$code = $prettyPrinter->prettyPrint($stmts);
echo $code;
} catch (Error $e) {
echo 'Parse Error: ', $e->getMessage();
}
```
The above code will output:
echo 'Hello ', hi\getTarget();
As you can see the source code was first parsed using `PhpParser\Parser->parse()`, then changed and then
again converted to code using `PhpParser\PrettyPrinter\Standard->prettyPrint()`.
The `prettyPrint()` method pretty prints a statements array. It is also possible to pretty print only a
single expression using `prettyPrintExpr()`.
The `prettyPrintFile()` method can be used to print an entire file. This will include the opening `<?php` tag
and handle inline HTML as the first/last statement more gracefully.
> Read more: [Pretty printing documentation](component/Pretty_printing.markdown)
Node traversation
-----------------
The above pretty printing example used the fact that the source code was known and thus it was easy to
write code that accesses a certain part of a node tree and changes it. Normally this is not the case.
Usually you want to change / analyze code in a generic way, where you don't know how the node tree is
going to look like.
For this purpose the parser provides a component for traversing and visiting the node tree. The basic
structure of a program using this `PhpParser\NodeTraverser` looks like this:
```php
use PhpParser\NodeTraverser;
use PhpParser\ParserFactory;
use PhpParser\PrettyPrinter;
$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
$traverser = new NodeTraverser;
$prettyPrinter = new PrettyPrinter\Standard;
// add your visitor
$traverser->addVisitor(new MyNodeVisitor);
try {
$code = file_get_contents($fileName);
// parse
$stmts = $parser->parse($code);
// traverse
$stmts = $traverser->traverse($stmts);
// pretty print
$code = $prettyPrinter->prettyPrintFile($stmts);
echo $code;
} catch (PhpParser\Error $e) {
echo 'Parse Error: ', $e->getMessage();
}
```
The corresponding node visitor might look like this:
```php
use PhpParser\Node;
use PhpParser\NodeVisitorAbstract;
class MyNodeVisitor extends NodeVisitorAbstract
{
public function leaveNode(Node $node) {
if ($node instanceof Node\Scalar\String_) {
$node->value = 'foo';
}
}
}
```
The above node visitor would change all string literals in the program to `'foo'`.
All visitors must implement the `PhpParser\NodeVisitor` interface, which defines the following four
methods:
```php
public function beforeTraverse(array $nodes);
public function enterNode(\PhpParser\Node $node);
public function leaveNode(\PhpParser\Node $node);
public function afterTraverse(array $nodes);
```
The `beforeTraverse()` method is called once before the traversal begins and is passed the nodes the
traverser was called with. This method can be used for resetting values before traversation or
preparing the tree for traversal.
The `afterTraverse()` method is similar to the `beforeTraverse()` method, with the only difference that
it is called once after the traversal.
The `enterNode()` and `leaveNode()` methods are called on every node, the former when it is entered,
i.e. before its subnodes are traversed, the latter when it is left.
All four methods can either return the changed node or not return at all (i.e. `null`) in which
case the current node is not changed.
The `enterNode()` method can additionally return the value `NodeTraverser::DONT_TRAVERSE_CHILDREN`,
which instructs the traverser to skip all children of the current node. To furthermore prevent subsequent
visitors from visiting the current node, `NodeTraverser::DONT_TRAVERSE_CURRENT_AND_CHILDREN` can be used instead.
The `leaveNode()` method can additionally return the value `NodeTraverser::REMOVE_NODE`, in which
case the current node will be removed from the parent array. Furthermore it is possible to return
an array of nodes, which will be merged into the parent array at the offset of the current node.
I.e. if in `array(A, B, C)` the node `B` should be replaced with `array(X, Y, Z)` the result will
be `array(A, X, Y, Z, C)`.
Instead of manually implementing the `NodeVisitor` interface you can also extend the `NodeVisitorAbstract`
class, which will define empty default implementations for all the above methods.
> Read more: [Walking the AST](component/Walking_the_AST.markdown)
The NameResolver node visitor
-----------------------------
One visitor that is already bundled with the package is `PhpParser\NodeVisitor\NameResolver`. This visitor
helps you work with namespaced code by trying to resolve most names to fully qualified ones.
For example, consider the following code:
use A as B;
new B\C();
In order to know that `B\C` really is `A\C` you would need to track aliases and namespaces yourself.
The `NameResolver` takes care of that and resolves names as far as possible.
After running it, most names will be fully qualified. The only names that will stay unqualified are
unqualified function and constant names. These are resolved at runtime and thus the visitor can't
know which function they are referring to. In most cases this is a non-issue as the global functions
are meant.
Also the `NameResolver` adds a `namespacedName` subnode to class, function and constant declarations
that contains the namespaced name instead of only the shortname that is available via `name`.
> Read more: [Name resolution documentation](component/Name_resolution.markdown)
Example: Converting namespaced code to pseudo namespaces
--------------------------------------------------------
A small example to understand the concept: We want to convert namespaced code to pseudo namespaces
so it works on 5.2, i.e. names like `A\\B` should be converted to `A_B`. Note that such conversions
are fairly complicated if you take PHP's dynamic features into account, so our conversion will
assume that no dynamic features are used.
We start off with the following base code:
```php
use PhpParser\ParserFactory;
use PhpParser\PrettyPrinter;
use PhpParser\NodeTraverser;
use PhpParser\NodeVisitor\NameResolver;
$inDir = '/some/path';
$outDir = '/some/other/path';
$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
$traverser = new NodeTraverser;
$prettyPrinter = new PrettyPrinter\Standard;
$traverser->addVisitor(new NameResolver); // we will need resolved names
$traverser->addVisitor(new NamespaceConverter); // our own node visitor
// iterate over all .php files in the directory
$files = new \RecursiveIteratorIterator(new \RecursiveDirectoryIterator($inDir));
$files = new \RegexIterator($files, '/\.php$/');
foreach ($files as $file) {
try {
// read the file that should be converted
$code = file_get_contents($file->getPathName());
// parse
$stmts = $parser->parse($code);
// traverse
$stmts = $traverser->traverse($stmts);
// pretty print
$code = $prettyPrinter->prettyPrintFile($stmts);
// write the converted file to the target directory
file_put_contents(
substr_replace($file->getPathname(), $outDir, 0, strlen($inDir)),
$code
);
} catch (PhpParser\Error $e) {
echo 'Parse Error: ', $e->getMessage();
}
}
```
Now lets start with the main code, the `NodeVisitor\NamespaceConverter`. One thing it needs to do
is convert `A\\B` style names to `A_B` style ones.
```php
use PhpParser\Node;
class NamespaceConverter extends \PhpParser\NodeVisitorAbstract
{
public function leaveNode(Node $node) {
if ($node instanceof Node\Name) {
return new Node\Name(str_replace('\\', '_', $node->toString()));
}
}
}
```
The above code profits from the fact that the `NameResolver` already resolved all names as far as
possible, so we don't need to do that. We only need to create a string with the name parts separated
by underscores instead of backslashes. This is what `str_replace('\\', '_', $node->toString())` does. (If you want to
create a name with backslashes either write `$node->toString()` or `(string) $node`.) Then we create
a new name from the string and return it. Returning a new node replaces the old node.
Another thing we need to do is change the class/function/const declarations. Currently they contain
only the shortname (i.e. the last part of the name), but they need to contain the complete name including
the namespace prefix:
```php
use PhpParser\Node;
use PhpParser\Node\Stmt;
class NodeVisitor_NamespaceConverter extends \PhpParser\NodeVisitorAbstract
{
public function leaveNode(Node $node) {
if ($node instanceof Node\Name) {
return new Node\Name(str_replace('\\', '_', $node->toString()));
} elseif ($node instanceof Stmt\Class_
|| $node instanceof Stmt\Interface_
|| $node instanceof Stmt\Function_) {
$node->name = str_replace('\\', '_', $node->namespacedName->toString());
} elseif ($node instanceof Stmt\Const_) {
foreach ($node->consts as $const) {
$const->name = str_replace('\\', '_', $const->namespacedName->toString());
}
}
}
}
```
There is not much more to it than converting the namespaced name to string with `_` as separator.
The last thing we need to do is remove the `namespace` and `use` statements:
```php
use PhpParser\Node;
use PhpParser\Node\Stmt;
use PhpParser\NodeTraverser;
class NodeVisitor_NamespaceConverter extends \PhpParser\NodeVisitorAbstract
{
public function leaveNode(Node $node) {
if ($node instanceof Node\Name) {
return new Node\Name(str_replace('\\', '_', $node->toString()));
} elseif ($node instanceof Stmt\Class_
|| $node instanceof Stmt\Interface_
|| $node instanceof Stmt\Function_) {
$node->name = str_replace('\\', '_', $node->namespacedName->toString();
} elseif ($node instanceof Stmt\Const_) {
foreach ($node->consts as $const) {
$const->name = str_replace('\\', '_', $const->namespacedName->toString());
}
} elseif ($node instanceof Stmt\Namespace_) {
// returning an array merges is into the parent array
return $node->stmts;
} elseif ($node instanceof Stmt\Use_) {
// remove use nodes altogether
return NodeTraverser::REMOVE_NODE;
}
}
}
```
That's all.

46
vendor/nikic/php-parser/doc/README.md vendored Normal file
View file

@ -0,0 +1,46 @@
Table of Contents
=================
Guide
-----
1. [Introduction](0_Introduction.markdown)
2. [Usage of basic components](2_Usage_of_basic_components.markdown)
Component documentation
-----------------------
* [Walking the AST](component/Walking_the_AST.markdown)
* Node visitors
* Modifying the AST from a visitor
* Short-circuiting traversals
* Interleaved visitors
* Simple node finding API
* Parent and sibling references
* [Name resolution](component/Name_resolution.markdown)
* Name resolver options
* Name resolution context
* [Pretty printing](component/Pretty_printing.markdown)
* Converting AST back to PHP code
* Customizing formatting
* Formatting-preserving code transformations
* [AST builders](component/AST_builders.markdown)
* Fluent builders for AST nodes
* [Lexer](component/Lexer.markdown)
* Lexer options
* Token and file positions for nodes
* Custom attributes
* [Error handling](component/Error_handling.markdown)
* Column information for errors
* Error recovery (parsing of syntactically incorrect code)
* [Constant expression evaluation](component/Constant_expression_evaluation.markdown)
* Evaluating constant/property/etc initializers
* Handling errors and unsupported expressions
* [JSON representation](component/JSON_representation.markdown)
* JSON encoding and decoding of ASTs
* [Performance](component/Performance.markdown)
* Disabling XDebug
* Reusing objects
* Garbage collection impact
* [Frequently asked questions](component/FAQ.markdown)
* Parent and sibling references

View file

@ -0,0 +1,138 @@
AST builders
============
When PHP-Parser is used to generate (or modify) code by first creating an Abstract Syntax Tree and
then using the [pretty printer](Pretty_printing.markdown) to convert it to PHP code, it can often
be tedious to manually construct AST nodes. The project provides a number of utilities to simplify
the construction of common AST nodes.
Fluent builders
---------------
The library comes with a number of builders, which allow creating node trees using a fluent
interface. Builders are created using the `BuilderFactory` and the final constructed node is
accessed through `getNode()`. Fluent builders are available for
the following syntactic elements:
* namespaces and use statements
* classes, interfaces and traits
* methods, functions and parameters
* properties
Here is an example:
```php
use PhpParser\BuilderFactory;
use PhpParser\PrettyPrinter;
use PhpParser\Node;
$factory = new BuilderFactory;
$node = $factory->namespace('Name\Space')
->addStmt($factory->use('Some\Other\Thingy')->as('SomeClass'))
->addStmt($factory->useFunction('strlen'))
->addStmt($factory->useConst('PHP_VERSION'))
->addStmt($factory->class('SomeOtherClass')
->extend('SomeClass')
->implement('A\Few', '\Interfaces')
->makeAbstract() // ->makeFinal()
->addStmt($factory->useTrait('FirstTrait'))
->addStmt($factory->useTrait('SecondTrait', 'ThirdTrait')
->and('AnotherTrait')
->with($factory->traitUseAdaptation('foo')->as('bar'))
->with($factory->traitUseAdaptation('AnotherTrait', 'baz')->as('test'))
->with($factory->traitUseAdaptation('AnotherTrait', 'func')->insteadof('SecondTrait')))
->addStmt($factory->method('someMethod')
->makePublic()
->makeAbstract() // ->makeFinal()
->setReturnType('bool') // ->makeReturnByRef()
->addParam($factory->param('someParam')->setType('SomeClass'))
->setDocComment('/**
* This method does something.
*
* @param SomeClass And takes a parameter
*/')
)
->addStmt($factory->method('anotherMethod')
->makeProtected() // ->makePublic() [default], ->makePrivate()
->addParam($factory->param('someParam')->setDefault('test'))
// it is possible to add manually created nodes
->addStmt(new Node\Expr\Print_(new Node\Expr\Variable('someParam')))
)
// properties will be correctly reordered above the methods
->addStmt($factory->property('someProperty')->makeProtected())
->addStmt($factory->property('anotherProperty')->makePrivate()->setDefault(array(1, 2, 3)))
)
->getNode()
;
$stmts = array($node);
$prettyPrinter = new PrettyPrinter\Standard();
echo $prettyPrinter->prettyPrintFile($stmts);
```
This will produce the following output with the standard pretty printer:
```php
<?php
namespace Name\Space;
use Some\Other\Thingy as SomeClass;
use function strlen;
use const PHP_VERSION;
abstract class SomeOtherClass extends SomeClass implements A\Few, \Interfaces
{
use FirstTrait;
use SecondTrait, ThirdTrait, AnotherTrait {
foo as bar;
AnotherTrait::baz as test;
AnotherTrait::func insteadof SecondTrait;
}
protected $someProperty;
private $anotherProperty = array(1, 2, 3);
/**
* This method does something.
*
* @param SomeClass And takes a parameter
*/
public abstract function someMethod(SomeClass $someParam) : bool;
protected function anotherMethod($someParam = 'test')
{
print $someParam;
}
}
```
Additional helper methods
-------------------------
The `BuilderFactory` also provides a number of additional helper methods, which directly return
nodes. The following methods are currently available:
* `val($value)`: Creates an AST node for a literal value like `42` or `[1, 2, 3]`.
* `var($name)`: Creates variable node.
* `args(array $args)`: Creates an array of function/method arguments, including the required `Arg`
wrappers. Also converts literals to AST nodes.
* `funcCall($name, array $args = [])`: Create a function call node. Converts `$name` to a `Name`
node and normalizes arguments.
* `methodCall(Expr $var, $name, array $args = [])`: Create a method call node. Converts `$name` to
an `Identifier` node and normalizes arguments.
* `staticCall($class, $name, array $args = [])`: Create a static method call node. Converts
`$class` to a `Name` node, `$name` to an `Identifier` node and normalizes arguments.
* `new($class, array $args = [])`: Create a "new" (object creation) node. Converts `$class` to a
`Name` node.
* `constFetch($name)`: Create a constant fetch node. Converts `$name` to a `Name` node.
* `classConstFetch($class, $name)`: Create a class constant fetch node. Converts `$class` to a
`Name` node and `$name` to an `Identifier` node.
* `propertyFetch($var, $name)`: Creates a property fetch node. Converts `$name` to an `Identifier`
node.
* `concat(...$exprs)`: Create a tree of `BinaryOp\Concat` nodes for the given expressions.
These methods may be expanded on an as-needed basis. Please open an issue or PR if a common
operation is missing.

View file

@ -0,0 +1,115 @@
Constant expression evaluation
==============================
Initializers for constants, properties, parameters, etc. have limited support for expressions. For
example:
```php
<?php
class Test {
const SECONDS_IN_HOUR = 60 * 60;
const SECONDS_IN_DAY = 24 * self::SECONDS_IN_HOUR;
}
```
PHP-Parser supports evaluation of such constant expressions through the `ConstExprEvaluator` class:
```php
<?php
use PhpParser\{ConstExprEvaluator, ConstExprEvaluationException};
$evalutator = new ConstExprEvaluator();
try {
$value = $evalutator->evaluateSilently($someExpr);
} catch (ConstExprEvaluationException $e) {
// Either the expression contains unsupported expression types,
// or an error occurred during evaluation
}
```
Error handling
--------------
The constant evaluator provides two methods, `evaluateDirectly()` and `evaluateSilently()`, which
differ in error behavior. `evaluateDirectly()` will evaluate the expression as PHP would, including
any generated warnings or Errors. `evaluateSilently()` will instead convert warnings and Errors into
a `ConstExprEvaluationException`. For example:
```php
<?php
use PhpParser\{ConstExprEvaluator, ConstExprEvaluationException};
use PhpParser\Node\{Expr, Scalar};
$evaluator = new ConstExprEvaluator();
// 10 / 0
$expr = new Expr\BinaryOp\Div(new Scalar\LNumber(10), new Scalar\LNumber(0));
var_dump($evaluator->evaluateDirectly($expr)); // float(INF)
// Warning: Division by zero
try {
$evaluator->evaluateSilently($expr);
} catch (ConstExprEvaluationException $e) {
var_dump($e->getPrevious()->getMessage()); // Division by zero
}
```
For the purposes of static analysis, you will likely want to use `evaluateSilently()` and leave
erroring expressions unevaluated.
Unsupported expressions and evaluator fallback
----------------------------------------------
The constant expression evaluator supports all expression types that are permitted in constant
expressions, apart from the following:
* `Scalar\MagicConst\*`
* `Expr\ConstFetch` (only null/false/true are handled)
* `Expr\ClassConstFetch`
Handling these expression types requires non-local information, such as which global constants are
defined. By default, the evaluator will throw a `ConstExprEvaluationException` when it encounters
an unsupported expression type.
It is possible to override this behavior and support resolution for these expression types by
specifying an evaluation fallback function:
```php
<?php
use PhpParser\{ConstExprEvaluator, ConstExprEvaluationException};
use PhpParser\Node\Expr;
$evalutator = new ConstExprEvaluator(function(Expr $expr) {
if ($expr instanceof Expr\ConstFetch) {
return fetchConstantSomehow($expr);
}
if ($expr instanceof Expr\ClassConstFetch) {
return fetchClassConstantSomehow($expr);
}
// etc.
throw new ConstExprEvaluationException(
"Expression of type {$expr->getType()} cannot be evaluated");
});
try {
$evalutator->evaluateSilently($someExpr);
} catch (ConstExprEvaluationException $e) {
// Handle exception
}
```
Implementers are advised to ensure that evaluation of indirect constant references cannot lead to
infinite recursion. For example, the following code could lead to infinite recursion if constant
lookup is implemented naively.
```php
<?php
class Test {
const A = self::B;
const B = self::A;
}
```

View file

@ -0,0 +1,75 @@
Error handling
==============
Errors during parsing or analysis are represented using the `PhpParser\Error` exception class. In addition to an error
message, an error can also store additional information about the location the error occurred at.
How much location information is available depends on the origin of the error and how many lexer attributes have been
enabled. At a minimum the start line of the error is usually available.
Column information
------------------
In order to receive information about not only the line, but also the column span an error occurred at, the file
position attributes in the lexer need to be enabled:
```php
$lexer = new PhpParser\Lexer(array(
'usedAttributes' => array('comments', 'startLine', 'endLine', 'startFilePos', 'endFilePos'),
));
$parser = (new PhpParser\ParserFactory)->create(PhpParser\ParserFactory::PREFER_PHP7, $lexer);
try {
$stmts = $parser->parse($code);
// ...
} catch (PhpParser\Error $e) {
// ...
}
```
Before using column information, its availability needs to be checked with `$e->hasColumnInfo()`, as the precise
location of an error cannot always be determined. The methods for retrieving column information also have to be passed
the source code of the parsed file. An example for printing an error:
```php
if ($e->hasColumnInfo()) {
echo $e->getRawMessage() . ' from ' . $e->getStartLine() . ':' . $e->getStartColumn($code)
. ' to ' . $e->getEndLine() . ':' . $e->getEndColumn($code);
// or:
echo $e->getMessageWithColumnInfo();
} else {
echo $e->getMessage();
}
```
Both line numbers and column numbers are 1-based. EOF errors will be located at the position one past the end of the
file.
Error recovery
--------------
The error behavior of the parser (and other components) is controlled by an `ErrorHandler`. Whenever an error is
encountered, `ErrorHandler::handleError()` is invoked. The default error handling strategy is `ErrorHandler\Throwing`,
which will immediately throw when an error is encountered.
To instead collect all encountered errors into an array, while trying to continue parsing the rest of the source code,
an instance of `ErrorHandler\Collecting` can be passed to the `Parser::parse()` method. A usage example:
```php
$parser = (new PhpParser\ParserFactory)->create(PhpParser\ParserFactory::ONLY_PHP7);
$errorHandler = new PhpParser\ErrorHandler\Collecting;
$stmts = $parser->parse($code, $errorHandler);
if ($errorHandler->hasErrors()) {
foreach ($errorHandler->getErrors() as $error) {
// $error is an ordinary PhpParser\Error
}
}
if (null !== $stmts) {
// $stmts is a best-effort partial AST
}
```
The `NameResolver` visitor also accepts an `ErrorHandler` as a constructor argument.

View file

@ -0,0 +1,68 @@
Frequently Asked Questions
==========================
* [How can the parent of a node be obtained?](#how-can-the-parent-of-a-node-be-obtained)
* [How can the next/previous sibling of a node be obtained?](#how-can-the-nextprevious-sibling-of-a-node-be-obtained)
How can the parent of a node be obtained?
-----
The AST does not store parent nodes by default. However, it is easy to add a custom parent node
attribute using a custom node visitor:
```php
use PhpParser\Node;
use PhpParser\NodeVisitorAbstract;
class ParentConnector extends NodeVisitorAbstract {
private $stack;
public function beforeTraverse(array $nodes) {
$this->stack = [];
}
public function enterNode(Node $node) {
if (!empty($this->stack)) {
$node->setAttribute('parent', $this->stack[count($this->stack)-1]);
}
$this->stack[] = $node;
}
public function leaveNode(Node $node) {
array_pop($this->stack);
}
}
```
After running this visitor, the parent node can be obtained through `$node->getAttribute('parent')`.
How can the next/previous sibling of a node be obtained?
-----
Again, siblings are not stored by default, but the visitor from the previous entry can be easily
extended to store the previous / next node with a common parent as well:
```php
use PhpParser\Node;
use PhpParser\NodeVisitorAbstract;
class NodeConnector extends NodeVisitorAbstract {
private $stack;
private $prev;
public function beforeTraverse(array $nodes) {
$this->stack = [];
$this->prev = null;
}
public function enterNode(Node $node) {
if (!empty($this->stack)) {
$node->setAttribute('parent', $this->stack[count($this->stack)-1]);
}
if ($this->prev && $this->prev->getAttribute('parent') == $node->getAttribute('parent')) {
$node->setAttribute('prev', $this->prev);
$this->prev->setAttribute('next', $node);
}
$this->stack[] = $node;
}
public function leaveNode(Node $node) {
$this->prev = $node;
array_pop($this->stack);
}
}
```

View file

@ -0,0 +1,131 @@
JSON representation
===================
Nodes (and comments) implement the `JsonSerializable` interface. As such, it is possible to JSON
encode the AST directly using `json_encode()`:
```php
<?php
use PhpParser\ParserFactory;
$code = <<<'CODE'
<?php
/** @param string $msg */
function printLine($msg) {
echo $msg, "\n";
}
CODE;
$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
try {
$stmts = $parser->parse($code);
echo json_encode($stmts, JSON_PRETTY_PRINT), "\n";
} catch (PhpParser\Error $e) {
echo 'Parse Error: ', $e->getMessage();
}
```
This will result in the following output (which includes attributes):
```json
[
{
"nodeType": "Stmt_Function",
"byRef": false,
"name": {
"nodeType": "Identifier",
"name": "printLine",
"attributes": {
"startLine": 4,
"endLine": 4
}
},
"params": [
{
"nodeType": "Param",
"type": null,
"byRef": false,
"variadic": false,
"var": {
"nodeType": "Expr_Variable",
"name": "msg",
"attributes": {
"startLine": 4,
"endLine": 4
}
},
"default": null,
"attributes": {
"startLine": 4,
"endLine": 4
}
}
],
"returnType": null,
"stmts": [
{
"nodeType": "Stmt_Echo",
"exprs": [
{
"nodeType": "Expr_Variable",
"name": "msg",
"attributes": {
"startLine": 5,
"endLine": 5
}
},
{
"nodeType": "Scalar_String",
"value": "\n",
"attributes": {
"startLine": 5,
"endLine": 5,
"kind": 2
}
}
],
"attributes": {
"startLine": 5,
"endLine": 5
}
}
],
"attributes": {
"startLine": 4,
"comments": [
{
"nodeType": "Comment_Doc",
"text": "\/** @param string $msg *\/",
"line": 3,
"filePos": 9,
"tokenPos": 2
}
],
"endLine": 6
}
}
]
```
The JSON representation may be converted back into an AST using the `JsonDecoder`:
```php
<?php
$jsonDecoder = new PhpParser\JsonDecoder();
$ast = $jsonDecoder->decode($json);
```
Note that not all ASTs can be represented using JSON. In particular:
* JSON only supports UTF-8 strings.
* JSON does not support non-finite floating-point numbers. This can occur if the original source
code contains non-representable floating-pointing literals such as `1e1000`.
If the node tree is not representable in JSON, the initial `json_encode()` call will fail.
From the command line, a JSON dump can be obtained using `vendor/bin/php-parse -j file.php`.

View file

@ -0,0 +1,159 @@
Lexer component documentation
=============================
The lexer is responsible for providing tokens to the parser. The project comes with two lexers: `PhpParser\Lexer` and
`PhpParser\Lexer\Emulative`. The latter is an extension of the former, which adds the ability to emulate tokens of
newer PHP versions and thus allows parsing of new code on older versions.
This documentation discusses options available for the default lexers and explains how lexers can be extended.
Lexer options
-------------
The two default lexers accept an `$options` array in the constructor. Currently only the `'usedAttributes'` option is
supported, which allows you to specify which attributes will be added to the AST nodes. The attributes can then be
accessed using `$node->getAttribute()`, `$node->setAttribute()`, `$node->hasAttribute()` and `$node->getAttributes()`
methods. A sample options array:
```php
$lexer = new PhpParser\Lexer(array(
'usedAttributes' => array(
'comments', 'startLine', 'endLine'
)
));
```
The attributes used in this example match the default behavior of the lexer. The following attributes are supported:
* `comments`: Array of `PhpParser\Comment` or `PhpParser\Comment\Doc` instances, representing all comments that occurred
between the previous non-discarded token and the current one. Use of this attribute is required for the
`$node->getComments()` and `$node->getDocComment()` methods to work. The attribute is also needed if you wish the pretty
printer to retain comments present in the original code.
* `startLine`: Line in which the node starts. This attribute is required for the `$node->getLine()` to work. It is also
required if syntax errors should contain line number information.
* `endLine`: Line in which the node ends. Required for `$node->getEndLine()`.
* `startTokenPos`: Offset into the token array of the first token in the node. Required for `$node->getStartTokenPos()`.
* `endTokenPos`: Offset into the token array of the last token in the node. Required for `$node->getEndTokenPos()`.
* `startFilePos`: Offset into the code string of the first character that is part of the node. Required for `$node->getStartFilePos()`.
* `endFilePos`: Offset into the code string of the last character that is part of the node. Required for `$node->getEndFilePos()`.
### Using token positions
> **Note:** The example in this section is outdated in that this information is directly available in the AST: While
> `$property->isPublic()` does not distinguish between `public` and `var`, directly checking `$property->flags` for
> the `$property->flags & Class_::VISIBILITY_MODIFIER_MASK) === 0` allows making this distinction without resorting to
> tokens. However the general idea behind the example still applies in other cases.
The token offset information is useful if you wish to examine the exact formatting used for a node. For example the AST
does not distinguish whether a property was declared using `public` or using `var`, but you can retrieve this
information based on the token position:
```php
function isDeclaredUsingVar(array $tokens, PhpParser\Node\Stmt\Property $prop) {
$i = $prop->getAttribute('startTokenPos');
return $tokens[$i][0] === T_VAR;
}
```
In order to make use of this function, you will have to provide the tokens from the lexer to your node visitor using
code similar to the following:
```php
class MyNodeVisitor extends PhpParser\NodeVisitorAbstract {
private $tokens;
public function setTokens(array $tokens) {
$this->tokens = $tokens;
}
public function leaveNode(PhpParser\Node $node) {
if ($node instanceof PhpParser\Node\Stmt\Property) {
var_dump(isDeclaredUsingVar($this->tokens, $node));
}
}
}
$lexer = new PhpParser\Lexer(array(
'usedAttributes' => array(
'comments', 'startLine', 'endLine', 'startTokenPos', 'endTokenPos'
)
));
$parser = (new PhpParser\ParserFactory)->create(PhpParser\ParserFactory::ONLY_PHP7, $lexer);
$visitor = new MyNodeVisitor();
$traverser = new PhpParser\NodeTraverser();
$traverser->addVisitor($visitor);
try {
$stmts = $parser->parse($code);
$visitor->setTokens($lexer->getTokens());
$stmts = $traverser->traverse($stmts);
} catch (PhpParser\Error $e) {
echo 'Parse Error: ', $e->getMessage();
}
```
The same approach can also be used to perform specific modifications in the code, without changing the formatting in
other places (which is the case when using the pretty printer).
Lexer extension
---------------
A lexer has to define the following public interface:
```php
function startLexing(string $code, ErrorHandler $errorHandler = null): void;
function getTokens(): array;
function handleHaltCompiler(): string;
function getNextToken(string &$value = null, array &$startAttributes = null, array &$endAttributes = null): int;
```
The `startLexing()` method is invoked whenever the `parse()` method of the parser is called and is passed the source
code that is to be lexed (including the opening tag). It can be used to reset state or preprocess the source code or tokens. The
passed `ErrorHandler` should be used to report lexing errors.
The `getTokens()` method returns the current token array, in the usual `token_get_all()` format. This method is not
used by the parser (which uses `getNextToken()`), but is useful in combination with the token position attributes.
The `handleHaltCompiler()` method is called whenever a `T_HALT_COMPILER` token is encountered. It has to return the
remaining string after the construct (not including `();`).
The `getNextToken()` method returns the ID of the next token (as defined by the `Parser::T_*` constants). If no more
tokens are available it must return `0`, which is the ID of the `EOF` token. Furthermore the string content of the
token should be written into the by-reference `$value` parameter (which will then be available as `$n` in the parser).
### Attribute handling
The other two by-ref variables `$startAttributes` and `$endAttributes` define which attributes will eventually be
assigned to the generated nodes: The parser will take the `$startAttributes` from the first token which is part of the
node and the `$endAttributes` from the last token that is part of the node.
E.g. if the tokens `T_FUNCTION T_STRING ... '{' ... '}'` constitute a node, then the `$startAttributes` from the
`T_FUNCTION` token will be taken and the `$endAttributes` from the `'}'` token.
An application of custom attributes is storing the exact original formatting of literals: While the parser does retain
some information about the formatting of integers (like decimal vs. hexadecimal) or strings (like used quote type), it
does not preserve the exact original formatting (e.g. leading zeros for integers or escape sequences in strings). This
can be remedied by storing the original value in an attribute:
```php
use PhpParser\Lexer;
use PhpParser\Parser\Tokens;
class KeepOriginalValueLexer extends Lexer // or Lexer\Emulative
{
public function getNextToken(&$value = null, &$startAttributes = null, &$endAttributes = null) {
$tokenId = parent::getNextToken($value, $startAttributes, $endAttributes);
if ($tokenId == Tokens::T_CONSTANT_ENCAPSED_STRING // non-interpolated string
|| $tokenId == Tokens::T_ENCAPSED_AND_WHITESPACE // interpolated string
|| $tokenId == Tokens::T_LNUMBER // integer
|| $tokenId == Tokens::T_DNUMBER // floating point number
) {
// could also use $startAttributes, doesn't really matter here
$endAttributes['originalValue'] = $value;
}
return $tokenId;
}
}
```

View file

@ -0,0 +1,87 @@
Name resolution
===============
Since the introduction of namespaces in PHP 5.3, literal names in PHP code are subject to a
relatively complex name resolution process, which is based on the current namespace, the current
import table state, as well the type of the referenced symbol. PHP-Parser implements name
resolution and related functionality, both as reusable logic (NameContext), as well as a node
visitor (NameResolver) based on it.
The NameResolver visitor
------------------------
The `NameResolver` visitor can (and for nearly all uses of the AST, is) be applied to resolve names
to their fully-qualified form, to the degree that this is possible.
```php
$nameResolver = new PhpParser\NodeVisitor\NameResolver;
$nodeTraverser = new PhpParser\NodeTraverser;
$nodeTraverser->addVisitor($nameResolver);
// Resolve names
$stmts = $nodeTraverser->traverse($stmts);
```
In the default configuration, the name resolver will perform three actions:
* Declarations of functions, classes, interfaces, traits and global constants will have a
`namespacedName` property added, which contains the function/class/etc name including the
namespace prefix. For historic reasons this is a **property** rather than an attribute.
* Names will be replaced by fully qualified resolved names, which are instances of
`Node\Name\FullyQualified`.
* Unqualified function and constant names inside a namespace cannot be statically resolved. Inside
a namespace `Foo`, a call to `strlen()` may either refer to the namespaced `\Foo\strlen()`, or
the global `\strlen()`. Because PHP-Parser does not have the necessary context to decide this,
such names are left unresolved. Additionally a `namespacedName` **attribute** is added to the
name node.
The name resolver accepts an option array as the second argument, with the following default values:
```php
$nameResolver = new PhpParser\NodeVisitor\NameResolver(null, [
'preserveOriginalNames' => false,
'replaceNodes' => true,
]);
```
If the `preserveOriginalNames` option is enabled, then the resolved (fully qualified) name will have
an `originalName` attribute, which contains the unresolved name.
If the `replaceNodes` option is disabled, then names will no longer be resolved in-place. Instead a
`resolvedName` attribute will be added to each name, which contains the resolved (fully qualified)
name. Once again, if an unqualified function or constant name cannot be resolved, then the
`resolvedName` attribute will not be present, and instead a `namespacedName` attribute is added.
The `replaceNodes` attribute is useful if you wish to perform modifications on the AST, as you
probably do not wish the resoluting code to have fully resolved names as a side-effect.
The NameContext
---------------
The actual name resolution logic is implemented in the `NameContext` class, which has the following
public API:
```php
class NameContext {
public function __construct(ErrorHandler $errorHandler);
public function startNamespace(Name $namespace = null);
public function addAlias(Name $name, string $aliasName, int $type, array $errorAttrs = []);
public function getNamespace();
public function getResolvedName(Name $name, int $type);
public function getResolvedClassName(Name $name) : Name;
public function getPossibleNames(string $name, int $type) : array;
public function getShortName(string $name, int $type) : Name;
}
```
The `$type` parameters accept on of the `Stmt\Use_::TYPE_*` constants, which represent the three
basic symbol types in PHP (functions, constants and everything else).
Next to name resolution, the `NameContext` also supports the reverse operation of finding a short
representation of a name given the current name resolution environment.
The name context is intended to be used for name resolution operations outside the AST itself, such
as class names inside doc comments. A visitor running in parallel with the name resolver can access
the name context using `$nameResolver->getNameContext()`. Alternatively a visitor can use an
independent context and explicitly feed `Namespace` and `Use` nodes to it.

View file

@ -0,0 +1,65 @@
Performance
===========
Parsing is computationally expensive task, to which the PHP language is not very well suited.
Nonetheless, there are a few things you can do to improve the performance of this library, which are
described in the following.
Xdebug
------
Running PHP with XDebug adds a lot of overhead, especially for code that performs many method calls.
Just by loading XDebug (without enabling profiling or other more intrusive XDebug features), you
can expect that code using PHP-Parser will be approximately *five times slower*.
As such, you should make sure that XDebug is not loaded when using this library. Note that setting
the `xdebug.default_enable=0` ini option does *not* disable XDebug. The *only* way to disable
XDebug is to not load the extension in the first place.
If you are building a command-line utility for use by developers (who often have XDebug enabled),
you may want to consider automatically restarting PHP with XDebug unloaded. The
[composer/xdebug-handler](https://github.com/composer/xdebug-handler) package can be used to do
this.
If you do run with XDebug, you may need to increase the `xdebug.max_nesting_level` option to a
higher level, such as 3000. While the parser itself is recursion free, most other code working on
the AST uses recursion and will generate an error if the value of this option is too low.
Assertions
----------
Assertions should be disabled in a production context by setting `zend.assertions=-1` (or
`zend.assertions=0` if set at runtime). The library currently doesn't make heavy use of assertions,
but they are used in an increasing number of places.
Object reuse
------------
Many objects in this project are designed for reuse. For example, one `Parser` object can be used to
parse multiple files.
When possible, objects should be reused rather than being newly instantiated for every use. Some
objects have expensive initialization procedures, which will be unnecessarily repeated if the object
is not reused. (Currently two objects with particularly expensive setup are lexers and pretty
printers, though the details might change between versions of this library.)
Garbage collection
------------------
A limitation in PHP's cyclic garbage collector may lead to major performance degradation when the
active working set exceeds 10000 objects (or arrays). Especially when parsing very large files this
limit is significantly exceeded and PHP will spend the majority of time performing unnecessary
garbage collection attempts.
Without GC, parsing time is roughly linear in the input size. With GC, this degenerates to quadratic
runtime for large files. While the specifics may differ, as a rough guideline you may expect a 2.5x
GC overhead for 500KB files and a 5x overhead for 1MB files.
Because this a limitation in PHP's implementation, there is no easy way to work around this. If
possible, you should avoid parsing very large files, as they will impact overall execution time
disproportionally (and are usually generated anyway).
Of course, you can also try to (temporarily) disable GC. By design the AST generated by PHP-Parser
is cycle-free, so the AST itself will never cause leaks with GC disabled. However, other code
(including for example the parser object itself) may hold cycles, so disabling of GC should be
approached with care.

View file

@ -0,0 +1,96 @@
Pretty printing
===============
Pretty printing is the process of converting a syntax tree back to PHP code. In its basic mode of
operation the pretty printer provided by this library will print the AST using a certain predefined
code style and will discard (nearly) all formatting of the original code. Because programmers tend
to be rather picky about their code formatting, this mode of operation is not very suitable for
refactoring code, but can be used for automatically generated code, which is usually only read for
debugging purposes.
Basic usage
-----------
```php
$stmts = $parser->parse($code);
// MODIFY $stmts here
$prettyPrinter = new PhpParser\PrettyPrinter\Standard;
$newCode = $prettyPrinter->prettyPrintFile($stmts);
```
The pretty printer has three basic printing methods: `prettyPrint()`, `prettyPrintFile()` and
`prettyPrintExpr()`. The one that is most commonly useful is `prettyPrintFile()`, which takes an
array of statements and produces a full PHP file, including opening `<?php`.
`prettyPrint()` also takes a statement array, but produces code which is valid inside an already
open `<?php` context. Lastly, `prettyPrintExpr()` takes an `Expr` node and prints only a single
expression.
Customizing the formatting
--------------------------
Apart from an `shortArraySyntax` option, the default pretty printer does not provide any
functionality to customize the formatting of the generated code. The pretty printer does respect a
number of `kind` attributes used by some notes (e.g., whether an integer should be printed as
decimal, hexadecimal, etc), but there are no options to control brace placement or similar.
If you want to make minor changes to the formatting, the easiest way is to extend the pretty printer
and override the methods responsible for the node types you are interested in.
If you want to have more fine-grained formatting control, the recommended method is to combine the
default pretty printer with an existing library for code reformatting, such as
[PHP-CS-Fixer](https://github.com/FriendsOfPHP/PHP-CS-Fixer).
Formatting-preserving pretty printing
-------------------------------------
> **Note:** This functionality is **experimental** and not yet complete.
For automated code refactoring, migration and similar, you will usually only want to modify a small
portion of the code and leave the remainder alone. The basic pretty printer is not suitable for
this, because it will also reformat parts of the code which have not been modified.
Since PHP-Parser 4.0, an experimental formatting-preserving pretty-printing mode is available, which
attempts to preserve the formatting of code (those AST nodes that have not changed) and only reformat
code which has been modified or newly inserted.
Use of the formatting-preservation functionality requires some additional preparatory steps:
```php
use PhpParser\{Lexer, NodeTraverser, NodeVisitor, Parser, PrettyPrinter};
$lexer = new Lexer\Emulative([
'usedAttributes' => [
'comments',
'startLine', 'endLine',
'startTokenPos', 'endTokenPos',
],
]);
$parser = new Parser\Php7($lexer);
$traverser = new NodeTraverser();
$traverser->addVisitor(new NodeVisitor\CloningVisitor());
$printer = new PrettyPrinter\Standard();
$oldStmts = $parser->parse($code);
$oldTokens = $lexer->getTokens();
$newStmts = $traverser->traverse($oldStmts);
// MODIFY $newStmts HERE
$newCode = $printer->printFormatPreserving($newStmts, $oldStmts, $oldTokens);
```
If you make use of the name resolution functionality, you will likely want to disable the
`replaceNodes` option. This will add resolved names as attributes, instead of directlying modifying
the AST and causing spurious changes to the pretty printed code. For more information, see the
[name resolution documentation](Name_resolution.markdown).
This functionality is experimental and not yet fully implemented. It should not provide incorrect
code, but it may sometimes reformat more code than necessary. Open issues are tracked in
[issue #344](https://github.com/nikic/PHP-Parser/issues/344). If you encounter problems while using
this functionality, please open an issue, so we know what to prioritize.

View file

@ -0,0 +1,337 @@
Walking the AST
===============
The most common way to work with the AST is by using a node traverser and one or more node visitors.
As a basic example, the following code changes all literal integers in the AST into strings (e.g.,
`42` becomes `'42'`.)
```php
use PhpParser\{Node, NodeTraverser, NodeVisitorAbstract};
$traverser = new NodeTraverser;
$traverser->addVisitor(new class extends NodeVisitorAbstract {
public function leaveNode(Node $node) {
if ($node instanceof Node\Scalar\LNumber) {
return new Node\Scalar\String_((string) $node->value);
}
}
});
$stmts = ...;
$modifiedStmts = $traverser->traverse($stmts);
```
Node visitors
-------------
Each node visitor implements an interface with following four methods:
```php
interface NodeVisitor {
public function beforeTraverse(array $nodes);
public function enterNode(Node $node);
public function leaveNode(Node $node);
public function afterTraverse(array $nodes);
}
```
The `beforeTraverse()` and `afterTraverse()` methods are called before and after the traversal
respectively, and are passed the entire AST. They can be used to perform any necessary state
setup or cleanup.
The `enterNode()` method is called when a node is first encountered, before its children are
processed ("preorder"). The `leaveNode()` method is called after all children have been visited
("postorder").
For example, if we have the following excerpt of an AST
```
Expr_FuncCall(
name: Name(
parts: array(
0: printLine
)
)
args: array(
0: Arg(
value: Scalar_String(
value: Hello World!!!
)
byRef: false
unpack: false
)
)
)
```
then the enter/leave methods will be called in the following order:
```
enterNode(Expr_FuncCall)
enterNode(Name)
leaveNode(Name)
enterNode(Arg)
enterNode(Scalar_String)
leaveNode(Scalar_String)
leaveNode(Arg)
leaveNode(Expr_FuncCall)
```
A common pattern is that `enterNode` is used to collect some information and then `leaveNode`
performs modifications based on that. At the time when `leaveNode` is called, all the code inside
the node will have already been visited and necessary information collected.
As you usually do not want to implement all four methods, it is recommended that you extend
`NodeVisitorAbstract` instead of implementing the interface directly. The abstract class provides
empty default implementations.
Modifying the AST
-----------------
There are a number of ways in which the AST can be modified from inside a node visitor. The first
and simplest is to simply change AST properties inside the visitor:
```php
public function leaveNode(Node $node) {
if ($node instanceof Node\Scalar\LNumber) {
// increment all integer literals
$node->value++;
}
}
```
The second is to replace a node entirely by returning a new node:
```php
public function leaveNode(Node $node) {
if ($node instanceof Node\Expr\BinaryOp\BooleanAnd) {
// Convert all $a && $b expressions into !($a && $b)
return new Node\Expr\BooleanNot($node);
}
}
```
Doing this is supported both inside enterNode and leaveNode. However, you have to be mindful about
where you perform the replacement: If a node is replaced in enterNode, then the recursive traversal
will also consider the children of the new node. If you aren't careful, this can lead to infinite
recursion. For example, let's take the previous code sample and use enterNode instead:
```php
public function enterNode(Node $node) {
if ($node instanceof Node\Expr\BinaryOp\BooleanAnd) {
// Convert all $a && $b expressions into !($a && $b)
return new Node\Expr\BooleanNot($node);
}
}
```
Now `$a && $b` will be replaced by `!($a && $b)`. Then the traverser will go into the first (and
only) child of `!($a && $b)`, which is `$a && $b`. The transformation applies again and we end up
with `!!($a && $b)`. This will continue until PHP hits the memory limit.
Finally, two special replacement types are supported only by leaveNode. The first is removal of a
node:
```php
public function leaveNode(Node $node) {
if ($node instanceof Node\Stmt\Return_) {
// Remove all return statements
return NodeTraverser::REMOVE_NODE;
}
}
```
Node removal only works if the parent structure is an array. This means that usually it only makes
sense to remove nodes of type `Node\Stmt`, as they always occur inside statement lists (and a few
more node types like `Arg` or `Expr\ArrayItem`, which are also always part of lists).
On the other hand, removing a `Node\Expr` does not make sense: If you have `$a * $b`, there is no
meaningful way in which the `$a` part could be removed. If you want to remove an expression, you
generally want to remove it together with a surrounding expression statement:
```php
public function leaveNode(Node $node) {
if ($node instanceof Node\Stmt\Expression
&& $node->expr instanceof Node\Expr\FuncCall
&& $node->expr->name instanceof Node\Name
&& $node->expr->name->toString() === 'var_dump'
) {
return NodeTraverser::REMOVE_NODE;
}
}
```
This example will remove all calls to `var_dump()` which occur as expression statements. This means
that `var_dump($a);` will be removed, but `if (var_dump($a))` will not be removed (and there is no
obvious way in which it can be removed).
Next to removing nodes, it is also possible to replace one node with multiple nodes. Again, this
only works inside leaveNode and only if the parent structure is an array.
```php
public function leaveNode(Node $node) {
if ($node instanceof Node\Stmt\Return_ && $node->expr !== null) {
// Convert "return foo();" into "$retval = foo(); return $retval;"
$var = new Node\Expr\Variable('retval');
return [
new Node\Stmt\Expression(new Node\Expr\Assign($var, $node->expr)),
new Node\Stmt\Return_($var),
];
}
}
```
Short-circuiting traversal
--------------------------
An AST can easily contain thousands of nodes, and traversing over all of them may be slow,
especially if you have more than one visitor. In some cases, it is possible to avoid a full
traversal.
If you are looking for all class declarations in a file (and assuming you're not interested in
anonymous classes), you know that once you've seen a class declaration, there is no point in also
checking all it's child nodes, because PHP does not allow nesting classes. In this case, you can
instruct the traverser to not recurse into the class node:
```
private $classes = [];
public function enterNode(Node $node) {
if ($node instanceof Node\Stmt\Class_) {
$this->classes[] = $node;
return NodeTraverser::DONT_TRAVERSE_CHILDREN;
}
}
```
Of course, this option is only available in enterNode, because it's already too late by the time
leaveNode is reached.
If you are only looking for one specific node, it is also possible to abort the traversal entirely
after finding it. For example, if you are looking for the node of a class with a certain name (and
discounting exotic cases like conditionally defining a class two times), you can stop traversal
once you found it:
```
private $class = null;
public function enterNode(Node $node) {
if ($node instanceof Node\Stmt\Class_ &&
$node->namespacedName->toString() === 'Foo\Bar\Baz'
) {
$this->class = $node;
return NodeTraverser::STOP_TRAVERSAL;
}
}
```
This works both in enterNode and leaveNode. Note that this particular case can also be more easily
handled using a NodeFinder, which will be introduced below.
Multiple visitors
-----------------
A single traverser can be used with multiple visitors:
```php
$traverser = new NodeTraverser;
$traverser->addVisitor($visitorA);
$traverser->addVisitor($visitorB);
$stmts = $traverser->traverse($stmts);
```
It is important to understand that if a traverser is run with multiple visitors, the visitors will
be interleaved. Given the following AST excerpt
```
Stmt_Return(
expr: Expr_Variable(
name: foobar
)
)
```
the following method calls will be performed:
```
$visitorA->enterNode(Stmt_Return)
$visitorB->enterNode(Stmt_Return)
$visitorA->enterNode(Expr_Variable)
$visitorB->enterNode(Expr_Variable)
$visitorA->leaveNode(Expr_Variable)
$visitorB->leaveNode(Expr_Variable)
$visitorA->leaveNode(Stmt_Return)
$visitorB->leaveNode(Stmt_Return)
```
That is, when visiting a node, enterNode and leaveNode will always be called for all visitors.
Running multiple visitors in parallel improves performance, as the AST only has to be traversed
once. However, it is not always possible to write visitors in a way that allows interleaved
execution. In this case, you can always fall back to performing multiple traversals:
```php
$traverserA = new NodeTraverser;
$traverserA->addVisitor($visitorA);
$traverserB = new NodeTraverser;
$traverserB->addVisitor($visitorB);
$stmts = $traverserA->traverser($stmts);
$stmts = $traverserB->traverser($stmts);
```
When using multiple visitors, it is important to understand how they interact with the various
special enterNode/leaveNode return values:
* If *any* visitor returns `DONT_TRAVERSE_CHILDREN`, the children will be skipped for *all*
visitors.
* If *any* visitor returns `DONT_TRAVERSE_CURRENT_AND_CHILDREN`, the children will be skipped for *all*
visitors, and all *subsequent* visitors will not visit the current node.
* If *any* visitor returns `STOP_TRAVERSAL`, traversal is stopped for *all* visitors.
* If a visitor returns a replacement node, subsequent visitors will be passed the replacement node,
not the original one.
* If a visitor returns `REMOVE_NODE`, subsequent visitors will not see this node.
* If a visitor returns an array of replacement nodes, subsequent visitors will see neither the node
that was replaced, nor the replacement nodes.
Simple node finding
-------------------
While the node visitor mechanism is very flexible, creating a node visitor can be overly cumbersome
for minor tasks. For this reason a `NodeFinder` is provided, which can find AST nodes that either
satisfy a certain callback, or which are instanced of a certain node type. A couple of examples are
shown in the following:
```php
use PhpParser\{Node, NodeFinder};
$nodeFinder = new NodeFinder;
// Find all class nodes.
$classes = $nodeFinder->findInstanceOf($stmts, Node\Stmt\Class_::class);
// Find all classes that extend another class
$extendingClasses = $nodeFinder->find($stmts, function(Node $node) {
return $node instanceof Node\Stmt\Class_
&& $node->extends !== null;
});
// Find first class occuring in the AST. Returns null if no class exists.
$class = $nodeFinder->findFirstInstanceOf($stmts, Node\Stmt\Class_::class);
// Find first class that has name $name
$class = $nodeFinder->findFirst($stmts, function(Node $node) use ($name) {
return $node instanceof Node\Stmt\Class_
&& $node->resolvedName->toString() === $name;
});
```
Internally, the `NodeFinder` also uses a node traverser. It only simplifies the interface for a
common use case.
Parent and sibling references
-----------------------------
The node visitor mechanism is somewhat rigid, in that it prescribes an order in which nodes should
be accessed: From parents to children. However, it can often be convenient to operate in the
reverse direction: When working on a node, you might want to check if the parent node satisfies a
certain property.
PHP-Parser does not add parent (or sibling) references to nodes by itself, but you can easily
emulate this with a visitor. See the [FAQ](FAQ.markdown) for more information.