Add Composer autoloading functions and PHPStan for testing

This commit is contained in:
Alex Cabal 2019-02-26 13:03:45 -06:00
parent e198c4db65
commit f5d7d4e02a
1518 changed files with 169063 additions and 30 deletions

View file

@ -0,0 +1,138 @@
AST builders
============
When PHP-Parser is used to generate (or modify) code by first creating an Abstract Syntax Tree and
then using the [pretty printer](Pretty_printing.markdown) to convert it to PHP code, it can often
be tedious to manually construct AST nodes. The project provides a number of utilities to simplify
the construction of common AST nodes.
Fluent builders
---------------
The library comes with a number of builders, which allow creating node trees using a fluent
interface. Builders are created using the `BuilderFactory` and the final constructed node is
accessed through `getNode()`. Fluent builders are available for
the following syntactic elements:
* namespaces and use statements
* classes, interfaces and traits
* methods, functions and parameters
* properties
Here is an example:
```php
use PhpParser\BuilderFactory;
use PhpParser\PrettyPrinter;
use PhpParser\Node;
$factory = new BuilderFactory;
$node = $factory->namespace('Name\Space')
->addStmt($factory->use('Some\Other\Thingy')->as('SomeClass'))
->addStmt($factory->useFunction('strlen'))
->addStmt($factory->useConst('PHP_VERSION'))
->addStmt($factory->class('SomeOtherClass')
->extend('SomeClass')
->implement('A\Few', '\Interfaces')
->makeAbstract() // ->makeFinal()
->addStmt($factory->useTrait('FirstTrait'))
->addStmt($factory->useTrait('SecondTrait', 'ThirdTrait')
->and('AnotherTrait')
->with($factory->traitUseAdaptation('foo')->as('bar'))
->with($factory->traitUseAdaptation('AnotherTrait', 'baz')->as('test'))
->with($factory->traitUseAdaptation('AnotherTrait', 'func')->insteadof('SecondTrait')))
->addStmt($factory->method('someMethod')
->makePublic()
->makeAbstract() // ->makeFinal()
->setReturnType('bool') // ->makeReturnByRef()
->addParam($factory->param('someParam')->setType('SomeClass'))
->setDocComment('/**
* This method does something.
*
* @param SomeClass And takes a parameter
*/')
)
->addStmt($factory->method('anotherMethod')
->makeProtected() // ->makePublic() [default], ->makePrivate()
->addParam($factory->param('someParam')->setDefault('test'))
// it is possible to add manually created nodes
->addStmt(new Node\Expr\Print_(new Node\Expr\Variable('someParam')))
)
// properties will be correctly reordered above the methods
->addStmt($factory->property('someProperty')->makeProtected())
->addStmt($factory->property('anotherProperty')->makePrivate()->setDefault(array(1, 2, 3)))
)
->getNode()
;
$stmts = array($node);
$prettyPrinter = new PrettyPrinter\Standard();
echo $prettyPrinter->prettyPrintFile($stmts);
```
This will produce the following output with the standard pretty printer:
```php
<?php
namespace Name\Space;
use Some\Other\Thingy as SomeClass;
use function strlen;
use const PHP_VERSION;
abstract class SomeOtherClass extends SomeClass implements A\Few, \Interfaces
{
use FirstTrait;
use SecondTrait, ThirdTrait, AnotherTrait {
foo as bar;
AnotherTrait::baz as test;
AnotherTrait::func insteadof SecondTrait;
}
protected $someProperty;
private $anotherProperty = array(1, 2, 3);
/**
* This method does something.
*
* @param SomeClass And takes a parameter
*/
public abstract function someMethod(SomeClass $someParam) : bool;
protected function anotherMethod($someParam = 'test')
{
print $someParam;
}
}
```
Additional helper methods
-------------------------
The `BuilderFactory` also provides a number of additional helper methods, which directly return
nodes. The following methods are currently available:
* `val($value)`: Creates an AST node for a literal value like `42` or `[1, 2, 3]`.
* `var($name)`: Creates variable node.
* `args(array $args)`: Creates an array of function/method arguments, including the required `Arg`
wrappers. Also converts literals to AST nodes.
* `funcCall($name, array $args = [])`: Create a function call node. Converts `$name` to a `Name`
node and normalizes arguments.
* `methodCall(Expr $var, $name, array $args = [])`: Create a method call node. Converts `$name` to
an `Identifier` node and normalizes arguments.
* `staticCall($class, $name, array $args = [])`: Create a static method call node. Converts
`$class` to a `Name` node, `$name` to an `Identifier` node and normalizes arguments.
* `new($class, array $args = [])`: Create a "new" (object creation) node. Converts `$class` to a
`Name` node.
* `constFetch($name)`: Create a constant fetch node. Converts `$name` to a `Name` node.
* `classConstFetch($class, $name)`: Create a class constant fetch node. Converts `$class` to a
`Name` node and `$name` to an `Identifier` node.
* `propertyFetch($var, $name)`: Creates a property fetch node. Converts `$name` to an `Identifier`
node.
* `concat(...$exprs)`: Create a tree of `BinaryOp\Concat` nodes for the given expressions.
These methods may be expanded on an as-needed basis. Please open an issue or PR if a common
operation is missing.

View file

@ -0,0 +1,115 @@
Constant expression evaluation
==============================
Initializers for constants, properties, parameters, etc. have limited support for expressions. For
example:
```php
<?php
class Test {
const SECONDS_IN_HOUR = 60 * 60;
const SECONDS_IN_DAY = 24 * self::SECONDS_IN_HOUR;
}
```
PHP-Parser supports evaluation of such constant expressions through the `ConstExprEvaluator` class:
```php
<?php
use PhpParser\{ConstExprEvaluator, ConstExprEvaluationException};
$evalutator = new ConstExprEvaluator();
try {
$value = $evalutator->evaluateSilently($someExpr);
} catch (ConstExprEvaluationException $e) {
// Either the expression contains unsupported expression types,
// or an error occurred during evaluation
}
```
Error handling
--------------
The constant evaluator provides two methods, `evaluateDirectly()` and `evaluateSilently()`, which
differ in error behavior. `evaluateDirectly()` will evaluate the expression as PHP would, including
any generated warnings or Errors. `evaluateSilently()` will instead convert warnings and Errors into
a `ConstExprEvaluationException`. For example:
```php
<?php
use PhpParser\{ConstExprEvaluator, ConstExprEvaluationException};
use PhpParser\Node\{Expr, Scalar};
$evaluator = new ConstExprEvaluator();
// 10 / 0
$expr = new Expr\BinaryOp\Div(new Scalar\LNumber(10), new Scalar\LNumber(0));
var_dump($evaluator->evaluateDirectly($expr)); // float(INF)
// Warning: Division by zero
try {
$evaluator->evaluateSilently($expr);
} catch (ConstExprEvaluationException $e) {
var_dump($e->getPrevious()->getMessage()); // Division by zero
}
```
For the purposes of static analysis, you will likely want to use `evaluateSilently()` and leave
erroring expressions unevaluated.
Unsupported expressions and evaluator fallback
----------------------------------------------
The constant expression evaluator supports all expression types that are permitted in constant
expressions, apart from the following:
* `Scalar\MagicConst\*`
* `Expr\ConstFetch` (only null/false/true are handled)
* `Expr\ClassConstFetch`
Handling these expression types requires non-local information, such as which global constants are
defined. By default, the evaluator will throw a `ConstExprEvaluationException` when it encounters
an unsupported expression type.
It is possible to override this behavior and support resolution for these expression types by
specifying an evaluation fallback function:
```php
<?php
use PhpParser\{ConstExprEvaluator, ConstExprEvaluationException};
use PhpParser\Node\Expr;
$evalutator = new ConstExprEvaluator(function(Expr $expr) {
if ($expr instanceof Expr\ConstFetch) {
return fetchConstantSomehow($expr);
}
if ($expr instanceof Expr\ClassConstFetch) {
return fetchClassConstantSomehow($expr);
}
// etc.
throw new ConstExprEvaluationException(
"Expression of type {$expr->getType()} cannot be evaluated");
});
try {
$evalutator->evaluateSilently($someExpr);
} catch (ConstExprEvaluationException $e) {
// Handle exception
}
```
Implementers are advised to ensure that evaluation of indirect constant references cannot lead to
infinite recursion. For example, the following code could lead to infinite recursion if constant
lookup is implemented naively.
```php
<?php
class Test {
const A = self::B;
const B = self::A;
}
```

View file

@ -0,0 +1,75 @@
Error handling
==============
Errors during parsing or analysis are represented using the `PhpParser\Error` exception class. In addition to an error
message, an error can also store additional information about the location the error occurred at.
How much location information is available depends on the origin of the error and how many lexer attributes have been
enabled. At a minimum the start line of the error is usually available.
Column information
------------------
In order to receive information about not only the line, but also the column span an error occurred at, the file
position attributes in the lexer need to be enabled:
```php
$lexer = new PhpParser\Lexer(array(
'usedAttributes' => array('comments', 'startLine', 'endLine', 'startFilePos', 'endFilePos'),
));
$parser = (new PhpParser\ParserFactory)->create(PhpParser\ParserFactory::PREFER_PHP7, $lexer);
try {
$stmts = $parser->parse($code);
// ...
} catch (PhpParser\Error $e) {
// ...
}
```
Before using column information, its availability needs to be checked with `$e->hasColumnInfo()`, as the precise
location of an error cannot always be determined. The methods for retrieving column information also have to be passed
the source code of the parsed file. An example for printing an error:
```php
if ($e->hasColumnInfo()) {
echo $e->getRawMessage() . ' from ' . $e->getStartLine() . ':' . $e->getStartColumn($code)
. ' to ' . $e->getEndLine() . ':' . $e->getEndColumn($code);
// or:
echo $e->getMessageWithColumnInfo();
} else {
echo $e->getMessage();
}
```
Both line numbers and column numbers are 1-based. EOF errors will be located at the position one past the end of the
file.
Error recovery
--------------
The error behavior of the parser (and other components) is controlled by an `ErrorHandler`. Whenever an error is
encountered, `ErrorHandler::handleError()` is invoked. The default error handling strategy is `ErrorHandler\Throwing`,
which will immediately throw when an error is encountered.
To instead collect all encountered errors into an array, while trying to continue parsing the rest of the source code,
an instance of `ErrorHandler\Collecting` can be passed to the `Parser::parse()` method. A usage example:
```php
$parser = (new PhpParser\ParserFactory)->create(PhpParser\ParserFactory::ONLY_PHP7);
$errorHandler = new PhpParser\ErrorHandler\Collecting;
$stmts = $parser->parse($code, $errorHandler);
if ($errorHandler->hasErrors()) {
foreach ($errorHandler->getErrors() as $error) {
// $error is an ordinary PhpParser\Error
}
}
if (null !== $stmts) {
// $stmts is a best-effort partial AST
}
```
The `NameResolver` visitor also accepts an `ErrorHandler` as a constructor argument.

View file

@ -0,0 +1,68 @@
Frequently Asked Questions
==========================
* [How can the parent of a node be obtained?](#how-can-the-parent-of-a-node-be-obtained)
* [How can the next/previous sibling of a node be obtained?](#how-can-the-nextprevious-sibling-of-a-node-be-obtained)
How can the parent of a node be obtained?
-----
The AST does not store parent nodes by default. However, it is easy to add a custom parent node
attribute using a custom node visitor:
```php
use PhpParser\Node;
use PhpParser\NodeVisitorAbstract;
class ParentConnector extends NodeVisitorAbstract {
private $stack;
public function beforeTraverse(array $nodes) {
$this->stack = [];
}
public function enterNode(Node $node) {
if (!empty($this->stack)) {
$node->setAttribute('parent', $this->stack[count($this->stack)-1]);
}
$this->stack[] = $node;
}
public function leaveNode(Node $node) {
array_pop($this->stack);
}
}
```
After running this visitor, the parent node can be obtained through `$node->getAttribute('parent')`.
How can the next/previous sibling of a node be obtained?
-----
Again, siblings are not stored by default, but the visitor from the previous entry can be easily
extended to store the previous / next node with a common parent as well:
```php
use PhpParser\Node;
use PhpParser\NodeVisitorAbstract;
class NodeConnector extends NodeVisitorAbstract {
private $stack;
private $prev;
public function beforeTraverse(array $nodes) {
$this->stack = [];
$this->prev = null;
}
public function enterNode(Node $node) {
if (!empty($this->stack)) {
$node->setAttribute('parent', $this->stack[count($this->stack)-1]);
}
if ($this->prev && $this->prev->getAttribute('parent') == $node->getAttribute('parent')) {
$node->setAttribute('prev', $this->prev);
$this->prev->setAttribute('next', $node);
}
$this->stack[] = $node;
}
public function leaveNode(Node $node) {
$this->prev = $node;
array_pop($this->stack);
}
}
```

View file

@ -0,0 +1,131 @@
JSON representation
===================
Nodes (and comments) implement the `JsonSerializable` interface. As such, it is possible to JSON
encode the AST directly using `json_encode()`:
```php
<?php
use PhpParser\ParserFactory;
$code = <<<'CODE'
<?php
/** @param string $msg */
function printLine($msg) {
echo $msg, "\n";
}
CODE;
$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
try {
$stmts = $parser->parse($code);
echo json_encode($stmts, JSON_PRETTY_PRINT), "\n";
} catch (PhpParser\Error $e) {
echo 'Parse Error: ', $e->getMessage();
}
```
This will result in the following output (which includes attributes):
```json
[
{
"nodeType": "Stmt_Function",
"byRef": false,
"name": {
"nodeType": "Identifier",
"name": "printLine",
"attributes": {
"startLine": 4,
"endLine": 4
}
},
"params": [
{
"nodeType": "Param",
"type": null,
"byRef": false,
"variadic": false,
"var": {
"nodeType": "Expr_Variable",
"name": "msg",
"attributes": {
"startLine": 4,
"endLine": 4
}
},
"default": null,
"attributes": {
"startLine": 4,
"endLine": 4
}
}
],
"returnType": null,
"stmts": [
{
"nodeType": "Stmt_Echo",
"exprs": [
{
"nodeType": "Expr_Variable",
"name": "msg",
"attributes": {
"startLine": 5,
"endLine": 5
}
},
{
"nodeType": "Scalar_String",
"value": "\n",
"attributes": {
"startLine": 5,
"endLine": 5,
"kind": 2
}
}
],
"attributes": {
"startLine": 5,
"endLine": 5
}
}
],
"attributes": {
"startLine": 4,
"comments": [
{
"nodeType": "Comment_Doc",
"text": "\/** @param string $msg *\/",
"line": 3,
"filePos": 9,
"tokenPos": 2
}
],
"endLine": 6
}
}
]
```
The JSON representation may be converted back into an AST using the `JsonDecoder`:
```php
<?php
$jsonDecoder = new PhpParser\JsonDecoder();
$ast = $jsonDecoder->decode($json);
```
Note that not all ASTs can be represented using JSON. In particular:
* JSON only supports UTF-8 strings.
* JSON does not support non-finite floating-point numbers. This can occur if the original source
code contains non-representable floating-pointing literals such as `1e1000`.
If the node tree is not representable in JSON, the initial `json_encode()` call will fail.
From the command line, a JSON dump can be obtained using `vendor/bin/php-parse -j file.php`.

View file

@ -0,0 +1,159 @@
Lexer component documentation
=============================
The lexer is responsible for providing tokens to the parser. The project comes with two lexers: `PhpParser\Lexer` and
`PhpParser\Lexer\Emulative`. The latter is an extension of the former, which adds the ability to emulate tokens of
newer PHP versions and thus allows parsing of new code on older versions.
This documentation discusses options available for the default lexers and explains how lexers can be extended.
Lexer options
-------------
The two default lexers accept an `$options` array in the constructor. Currently only the `'usedAttributes'` option is
supported, which allows you to specify which attributes will be added to the AST nodes. The attributes can then be
accessed using `$node->getAttribute()`, `$node->setAttribute()`, `$node->hasAttribute()` and `$node->getAttributes()`
methods. A sample options array:
```php
$lexer = new PhpParser\Lexer(array(
'usedAttributes' => array(
'comments', 'startLine', 'endLine'
)
));
```
The attributes used in this example match the default behavior of the lexer. The following attributes are supported:
* `comments`: Array of `PhpParser\Comment` or `PhpParser\Comment\Doc` instances, representing all comments that occurred
between the previous non-discarded token and the current one. Use of this attribute is required for the
`$node->getComments()` and `$node->getDocComment()` methods to work. The attribute is also needed if you wish the pretty
printer to retain comments present in the original code.
* `startLine`: Line in which the node starts. This attribute is required for the `$node->getLine()` to work. It is also
required if syntax errors should contain line number information.
* `endLine`: Line in which the node ends. Required for `$node->getEndLine()`.
* `startTokenPos`: Offset into the token array of the first token in the node. Required for `$node->getStartTokenPos()`.
* `endTokenPos`: Offset into the token array of the last token in the node. Required for `$node->getEndTokenPos()`.
* `startFilePos`: Offset into the code string of the first character that is part of the node. Required for `$node->getStartFilePos()`.
* `endFilePos`: Offset into the code string of the last character that is part of the node. Required for `$node->getEndFilePos()`.
### Using token positions
> **Note:** The example in this section is outdated in that this information is directly available in the AST: While
> `$property->isPublic()` does not distinguish between `public` and `var`, directly checking `$property->flags` for
> the `$property->flags & Class_::VISIBILITY_MODIFIER_MASK) === 0` allows making this distinction without resorting to
> tokens. However the general idea behind the example still applies in other cases.
The token offset information is useful if you wish to examine the exact formatting used for a node. For example the AST
does not distinguish whether a property was declared using `public` or using `var`, but you can retrieve this
information based on the token position:
```php
function isDeclaredUsingVar(array $tokens, PhpParser\Node\Stmt\Property $prop) {
$i = $prop->getAttribute('startTokenPos');
return $tokens[$i][0] === T_VAR;
}
```
In order to make use of this function, you will have to provide the tokens from the lexer to your node visitor using
code similar to the following:
```php
class MyNodeVisitor extends PhpParser\NodeVisitorAbstract {
private $tokens;
public function setTokens(array $tokens) {
$this->tokens = $tokens;
}
public function leaveNode(PhpParser\Node $node) {
if ($node instanceof PhpParser\Node\Stmt\Property) {
var_dump(isDeclaredUsingVar($this->tokens, $node));
}
}
}
$lexer = new PhpParser\Lexer(array(
'usedAttributes' => array(
'comments', 'startLine', 'endLine', 'startTokenPos', 'endTokenPos'
)
));
$parser = (new PhpParser\ParserFactory)->create(PhpParser\ParserFactory::ONLY_PHP7, $lexer);
$visitor = new MyNodeVisitor();
$traverser = new PhpParser\NodeTraverser();
$traverser->addVisitor($visitor);
try {
$stmts = $parser->parse($code);
$visitor->setTokens($lexer->getTokens());
$stmts = $traverser->traverse($stmts);
} catch (PhpParser\Error $e) {
echo 'Parse Error: ', $e->getMessage();
}
```
The same approach can also be used to perform specific modifications in the code, without changing the formatting in
other places (which is the case when using the pretty printer).
Lexer extension
---------------
A lexer has to define the following public interface:
```php
function startLexing(string $code, ErrorHandler $errorHandler = null): void;
function getTokens(): array;
function handleHaltCompiler(): string;
function getNextToken(string &$value = null, array &$startAttributes = null, array &$endAttributes = null): int;
```
The `startLexing()` method is invoked whenever the `parse()` method of the parser is called and is passed the source
code that is to be lexed (including the opening tag). It can be used to reset state or preprocess the source code or tokens. The
passed `ErrorHandler` should be used to report lexing errors.
The `getTokens()` method returns the current token array, in the usual `token_get_all()` format. This method is not
used by the parser (which uses `getNextToken()`), but is useful in combination with the token position attributes.
The `handleHaltCompiler()` method is called whenever a `T_HALT_COMPILER` token is encountered. It has to return the
remaining string after the construct (not including `();`).
The `getNextToken()` method returns the ID of the next token (as defined by the `Parser::T_*` constants). If no more
tokens are available it must return `0`, which is the ID of the `EOF` token. Furthermore the string content of the
token should be written into the by-reference `$value` parameter (which will then be available as `$n` in the parser).
### Attribute handling
The other two by-ref variables `$startAttributes` and `$endAttributes` define which attributes will eventually be
assigned to the generated nodes: The parser will take the `$startAttributes` from the first token which is part of the
node and the `$endAttributes` from the last token that is part of the node.
E.g. if the tokens `T_FUNCTION T_STRING ... '{' ... '}'` constitute a node, then the `$startAttributes` from the
`T_FUNCTION` token will be taken and the `$endAttributes` from the `'}'` token.
An application of custom attributes is storing the exact original formatting of literals: While the parser does retain
some information about the formatting of integers (like decimal vs. hexadecimal) or strings (like used quote type), it
does not preserve the exact original formatting (e.g. leading zeros for integers or escape sequences in strings). This
can be remedied by storing the original value in an attribute:
```php
use PhpParser\Lexer;
use PhpParser\Parser\Tokens;
class KeepOriginalValueLexer extends Lexer // or Lexer\Emulative
{
public function getNextToken(&$value = null, &$startAttributes = null, &$endAttributes = null) {
$tokenId = parent::getNextToken($value, $startAttributes, $endAttributes);
if ($tokenId == Tokens::T_CONSTANT_ENCAPSED_STRING // non-interpolated string
|| $tokenId == Tokens::T_ENCAPSED_AND_WHITESPACE // interpolated string
|| $tokenId == Tokens::T_LNUMBER // integer
|| $tokenId == Tokens::T_DNUMBER // floating point number
) {
// could also use $startAttributes, doesn't really matter here
$endAttributes['originalValue'] = $value;
}
return $tokenId;
}
}
```

View file

@ -0,0 +1,87 @@
Name resolution
===============
Since the introduction of namespaces in PHP 5.3, literal names in PHP code are subject to a
relatively complex name resolution process, which is based on the current namespace, the current
import table state, as well the type of the referenced symbol. PHP-Parser implements name
resolution and related functionality, both as reusable logic (NameContext), as well as a node
visitor (NameResolver) based on it.
The NameResolver visitor
------------------------
The `NameResolver` visitor can (and for nearly all uses of the AST, is) be applied to resolve names
to their fully-qualified form, to the degree that this is possible.
```php
$nameResolver = new PhpParser\NodeVisitor\NameResolver;
$nodeTraverser = new PhpParser\NodeTraverser;
$nodeTraverser->addVisitor($nameResolver);
// Resolve names
$stmts = $nodeTraverser->traverse($stmts);
```
In the default configuration, the name resolver will perform three actions:
* Declarations of functions, classes, interfaces, traits and global constants will have a
`namespacedName` property added, which contains the function/class/etc name including the
namespace prefix. For historic reasons this is a **property** rather than an attribute.
* Names will be replaced by fully qualified resolved names, which are instances of
`Node\Name\FullyQualified`.
* Unqualified function and constant names inside a namespace cannot be statically resolved. Inside
a namespace `Foo`, a call to `strlen()` may either refer to the namespaced `\Foo\strlen()`, or
the global `\strlen()`. Because PHP-Parser does not have the necessary context to decide this,
such names are left unresolved. Additionally a `namespacedName` **attribute** is added to the
name node.
The name resolver accepts an option array as the second argument, with the following default values:
```php
$nameResolver = new PhpParser\NodeVisitor\NameResolver(null, [
'preserveOriginalNames' => false,
'replaceNodes' => true,
]);
```
If the `preserveOriginalNames` option is enabled, then the resolved (fully qualified) name will have
an `originalName` attribute, which contains the unresolved name.
If the `replaceNodes` option is disabled, then names will no longer be resolved in-place. Instead a
`resolvedName` attribute will be added to each name, which contains the resolved (fully qualified)
name. Once again, if an unqualified function or constant name cannot be resolved, then the
`resolvedName` attribute will not be present, and instead a `namespacedName` attribute is added.
The `replaceNodes` attribute is useful if you wish to perform modifications on the AST, as you
probably do not wish the resoluting code to have fully resolved names as a side-effect.
The NameContext
---------------
The actual name resolution logic is implemented in the `NameContext` class, which has the following
public API:
```php
class NameContext {
public function __construct(ErrorHandler $errorHandler);
public function startNamespace(Name $namespace = null);
public function addAlias(Name $name, string $aliasName, int $type, array $errorAttrs = []);
public function getNamespace();
public function getResolvedName(Name $name, int $type);
public function getResolvedClassName(Name $name) : Name;
public function getPossibleNames(string $name, int $type) : array;
public function getShortName(string $name, int $type) : Name;
}
```
The `$type` parameters accept on of the `Stmt\Use_::TYPE_*` constants, which represent the three
basic symbol types in PHP (functions, constants and everything else).
Next to name resolution, the `NameContext` also supports the reverse operation of finding a short
representation of a name given the current name resolution environment.
The name context is intended to be used for name resolution operations outside the AST itself, such
as class names inside doc comments. A visitor running in parallel with the name resolver can access
the name context using `$nameResolver->getNameContext()`. Alternatively a visitor can use an
independent context and explicitly feed `Namespace` and `Use` nodes to it.

View file

@ -0,0 +1,65 @@
Performance
===========
Parsing is computationally expensive task, to which the PHP language is not very well suited.
Nonetheless, there are a few things you can do to improve the performance of this library, which are
described in the following.
Xdebug
------
Running PHP with XDebug adds a lot of overhead, especially for code that performs many method calls.
Just by loading XDebug (without enabling profiling or other more intrusive XDebug features), you
can expect that code using PHP-Parser will be approximately *five times slower*.
As such, you should make sure that XDebug is not loaded when using this library. Note that setting
the `xdebug.default_enable=0` ini option does *not* disable XDebug. The *only* way to disable
XDebug is to not load the extension in the first place.
If you are building a command-line utility for use by developers (who often have XDebug enabled),
you may want to consider automatically restarting PHP with XDebug unloaded. The
[composer/xdebug-handler](https://github.com/composer/xdebug-handler) package can be used to do
this.
If you do run with XDebug, you may need to increase the `xdebug.max_nesting_level` option to a
higher level, such as 3000. While the parser itself is recursion free, most other code working on
the AST uses recursion and will generate an error if the value of this option is too low.
Assertions
----------
Assertions should be disabled in a production context by setting `zend.assertions=-1` (or
`zend.assertions=0` if set at runtime). The library currently doesn't make heavy use of assertions,
but they are used in an increasing number of places.
Object reuse
------------
Many objects in this project are designed for reuse. For example, one `Parser` object can be used to
parse multiple files.
When possible, objects should be reused rather than being newly instantiated for every use. Some
objects have expensive initialization procedures, which will be unnecessarily repeated if the object
is not reused. (Currently two objects with particularly expensive setup are lexers and pretty
printers, though the details might change between versions of this library.)
Garbage collection
------------------
A limitation in PHP's cyclic garbage collector may lead to major performance degradation when the
active working set exceeds 10000 objects (or arrays). Especially when parsing very large files this
limit is significantly exceeded and PHP will spend the majority of time performing unnecessary
garbage collection attempts.
Without GC, parsing time is roughly linear in the input size. With GC, this degenerates to quadratic
runtime for large files. While the specifics may differ, as a rough guideline you may expect a 2.5x
GC overhead for 500KB files and a 5x overhead for 1MB files.
Because this a limitation in PHP's implementation, there is no easy way to work around this. If
possible, you should avoid parsing very large files, as they will impact overall execution time
disproportionally (and are usually generated anyway).
Of course, you can also try to (temporarily) disable GC. By design the AST generated by PHP-Parser
is cycle-free, so the AST itself will never cause leaks with GC disabled. However, other code
(including for example the parser object itself) may hold cycles, so disabling of GC should be
approached with care.

View file

@ -0,0 +1,96 @@
Pretty printing
===============
Pretty printing is the process of converting a syntax tree back to PHP code. In its basic mode of
operation the pretty printer provided by this library will print the AST using a certain predefined
code style and will discard (nearly) all formatting of the original code. Because programmers tend
to be rather picky about their code formatting, this mode of operation is not very suitable for
refactoring code, but can be used for automatically generated code, which is usually only read for
debugging purposes.
Basic usage
-----------
```php
$stmts = $parser->parse($code);
// MODIFY $stmts here
$prettyPrinter = new PhpParser\PrettyPrinter\Standard;
$newCode = $prettyPrinter->prettyPrintFile($stmts);
```
The pretty printer has three basic printing methods: `prettyPrint()`, `prettyPrintFile()` and
`prettyPrintExpr()`. The one that is most commonly useful is `prettyPrintFile()`, which takes an
array of statements and produces a full PHP file, including opening `<?php`.
`prettyPrint()` also takes a statement array, but produces code which is valid inside an already
open `<?php` context. Lastly, `prettyPrintExpr()` takes an `Expr` node and prints only a single
expression.
Customizing the formatting
--------------------------
Apart from an `shortArraySyntax` option, the default pretty printer does not provide any
functionality to customize the formatting of the generated code. The pretty printer does respect a
number of `kind` attributes used by some notes (e.g., whether an integer should be printed as
decimal, hexadecimal, etc), but there are no options to control brace placement or similar.
If you want to make minor changes to the formatting, the easiest way is to extend the pretty printer
and override the methods responsible for the node types you are interested in.
If you want to have more fine-grained formatting control, the recommended method is to combine the
default pretty printer with an existing library for code reformatting, such as
[PHP-CS-Fixer](https://github.com/FriendsOfPHP/PHP-CS-Fixer).
Formatting-preserving pretty printing
-------------------------------------
> **Note:** This functionality is **experimental** and not yet complete.
For automated code refactoring, migration and similar, you will usually only want to modify a small
portion of the code and leave the remainder alone. The basic pretty printer is not suitable for
this, because it will also reformat parts of the code which have not been modified.
Since PHP-Parser 4.0, an experimental formatting-preserving pretty-printing mode is available, which
attempts to preserve the formatting of code (those AST nodes that have not changed) and only reformat
code which has been modified or newly inserted.
Use of the formatting-preservation functionality requires some additional preparatory steps:
```php
use PhpParser\{Lexer, NodeTraverser, NodeVisitor, Parser, PrettyPrinter};
$lexer = new Lexer\Emulative([
'usedAttributes' => [
'comments',
'startLine', 'endLine',
'startTokenPos', 'endTokenPos',
],
]);
$parser = new Parser\Php7($lexer);
$traverser = new NodeTraverser();
$traverser->addVisitor(new NodeVisitor\CloningVisitor());
$printer = new PrettyPrinter\Standard();
$oldStmts = $parser->parse($code);
$oldTokens = $lexer->getTokens();
$newStmts = $traverser->traverse($oldStmts);
// MODIFY $newStmts HERE
$newCode = $printer->printFormatPreserving($newStmts, $oldStmts, $oldTokens);
```
If you make use of the name resolution functionality, you will likely want to disable the
`replaceNodes` option. This will add resolved names as attributes, instead of directlying modifying
the AST and causing spurious changes to the pretty printed code. For more information, see the
[name resolution documentation](Name_resolution.markdown).
This functionality is experimental and not yet fully implemented. It should not provide incorrect
code, but it may sometimes reformat more code than necessary. Open issues are tracked in
[issue #344](https://github.com/nikic/PHP-Parser/issues/344). If you encounter problems while using
this functionality, please open an issue, so we know what to prioritize.

View file

@ -0,0 +1,337 @@
Walking the AST
===============
The most common way to work with the AST is by using a node traverser and one or more node visitors.
As a basic example, the following code changes all literal integers in the AST into strings (e.g.,
`42` becomes `'42'`.)
```php
use PhpParser\{Node, NodeTraverser, NodeVisitorAbstract};
$traverser = new NodeTraverser;
$traverser->addVisitor(new class extends NodeVisitorAbstract {
public function leaveNode(Node $node) {
if ($node instanceof Node\Scalar\LNumber) {
return new Node\Scalar\String_((string) $node->value);
}
}
});
$stmts = ...;
$modifiedStmts = $traverser->traverse($stmts);
```
Node visitors
-------------
Each node visitor implements an interface with following four methods:
```php
interface NodeVisitor {
public function beforeTraverse(array $nodes);
public function enterNode(Node $node);
public function leaveNode(Node $node);
public function afterTraverse(array $nodes);
}
```
The `beforeTraverse()` and `afterTraverse()` methods are called before and after the traversal
respectively, and are passed the entire AST. They can be used to perform any necessary state
setup or cleanup.
The `enterNode()` method is called when a node is first encountered, before its children are
processed ("preorder"). The `leaveNode()` method is called after all children have been visited
("postorder").
For example, if we have the following excerpt of an AST
```
Expr_FuncCall(
name: Name(
parts: array(
0: printLine
)
)
args: array(
0: Arg(
value: Scalar_String(
value: Hello World!!!
)
byRef: false
unpack: false
)
)
)
```
then the enter/leave methods will be called in the following order:
```
enterNode(Expr_FuncCall)
enterNode(Name)
leaveNode(Name)
enterNode(Arg)
enterNode(Scalar_String)
leaveNode(Scalar_String)
leaveNode(Arg)
leaveNode(Expr_FuncCall)
```
A common pattern is that `enterNode` is used to collect some information and then `leaveNode`
performs modifications based on that. At the time when `leaveNode` is called, all the code inside
the node will have already been visited and necessary information collected.
As you usually do not want to implement all four methods, it is recommended that you extend
`NodeVisitorAbstract` instead of implementing the interface directly. The abstract class provides
empty default implementations.
Modifying the AST
-----------------
There are a number of ways in which the AST can be modified from inside a node visitor. The first
and simplest is to simply change AST properties inside the visitor:
```php
public function leaveNode(Node $node) {
if ($node instanceof Node\Scalar\LNumber) {
// increment all integer literals
$node->value++;
}
}
```
The second is to replace a node entirely by returning a new node:
```php
public function leaveNode(Node $node) {
if ($node instanceof Node\Expr\BinaryOp\BooleanAnd) {
// Convert all $a && $b expressions into !($a && $b)
return new Node\Expr\BooleanNot($node);
}
}
```
Doing this is supported both inside enterNode and leaveNode. However, you have to be mindful about
where you perform the replacement: If a node is replaced in enterNode, then the recursive traversal
will also consider the children of the new node. If you aren't careful, this can lead to infinite
recursion. For example, let's take the previous code sample and use enterNode instead:
```php
public function enterNode(Node $node) {
if ($node instanceof Node\Expr\BinaryOp\BooleanAnd) {
// Convert all $a && $b expressions into !($a && $b)
return new Node\Expr\BooleanNot($node);
}
}
```
Now `$a && $b` will be replaced by `!($a && $b)`. Then the traverser will go into the first (and
only) child of `!($a && $b)`, which is `$a && $b`. The transformation applies again and we end up
with `!!($a && $b)`. This will continue until PHP hits the memory limit.
Finally, two special replacement types are supported only by leaveNode. The first is removal of a
node:
```php
public function leaveNode(Node $node) {
if ($node instanceof Node\Stmt\Return_) {
// Remove all return statements
return NodeTraverser::REMOVE_NODE;
}
}
```
Node removal only works if the parent structure is an array. This means that usually it only makes
sense to remove nodes of type `Node\Stmt`, as they always occur inside statement lists (and a few
more node types like `Arg` or `Expr\ArrayItem`, which are also always part of lists).
On the other hand, removing a `Node\Expr` does not make sense: If you have `$a * $b`, there is no
meaningful way in which the `$a` part could be removed. If you want to remove an expression, you
generally want to remove it together with a surrounding expression statement:
```php
public function leaveNode(Node $node) {
if ($node instanceof Node\Stmt\Expression
&& $node->expr instanceof Node\Expr\FuncCall
&& $node->expr->name instanceof Node\Name
&& $node->expr->name->toString() === 'var_dump'
) {
return NodeTraverser::REMOVE_NODE;
}
}
```
This example will remove all calls to `var_dump()` which occur as expression statements. This means
that `var_dump($a);` will be removed, but `if (var_dump($a))` will not be removed (and there is no
obvious way in which it can be removed).
Next to removing nodes, it is also possible to replace one node with multiple nodes. Again, this
only works inside leaveNode and only if the parent structure is an array.
```php
public function leaveNode(Node $node) {
if ($node instanceof Node\Stmt\Return_ && $node->expr !== null) {
// Convert "return foo();" into "$retval = foo(); return $retval;"
$var = new Node\Expr\Variable('retval');
return [
new Node\Stmt\Expression(new Node\Expr\Assign($var, $node->expr)),
new Node\Stmt\Return_($var),
];
}
}
```
Short-circuiting traversal
--------------------------
An AST can easily contain thousands of nodes, and traversing over all of them may be slow,
especially if you have more than one visitor. In some cases, it is possible to avoid a full
traversal.
If you are looking for all class declarations in a file (and assuming you're not interested in
anonymous classes), you know that once you've seen a class declaration, there is no point in also
checking all it's child nodes, because PHP does not allow nesting classes. In this case, you can
instruct the traverser to not recurse into the class node:
```
private $classes = [];
public function enterNode(Node $node) {
if ($node instanceof Node\Stmt\Class_) {
$this->classes[] = $node;
return NodeTraverser::DONT_TRAVERSE_CHILDREN;
}
}
```
Of course, this option is only available in enterNode, because it's already too late by the time
leaveNode is reached.
If you are only looking for one specific node, it is also possible to abort the traversal entirely
after finding it. For example, if you are looking for the node of a class with a certain name (and
discounting exotic cases like conditionally defining a class two times), you can stop traversal
once you found it:
```
private $class = null;
public function enterNode(Node $node) {
if ($node instanceof Node\Stmt\Class_ &&
$node->namespacedName->toString() === 'Foo\Bar\Baz'
) {
$this->class = $node;
return NodeTraverser::STOP_TRAVERSAL;
}
}
```
This works both in enterNode and leaveNode. Note that this particular case can also be more easily
handled using a NodeFinder, which will be introduced below.
Multiple visitors
-----------------
A single traverser can be used with multiple visitors:
```php
$traverser = new NodeTraverser;
$traverser->addVisitor($visitorA);
$traverser->addVisitor($visitorB);
$stmts = $traverser->traverse($stmts);
```
It is important to understand that if a traverser is run with multiple visitors, the visitors will
be interleaved. Given the following AST excerpt
```
Stmt_Return(
expr: Expr_Variable(
name: foobar
)
)
```
the following method calls will be performed:
```
$visitorA->enterNode(Stmt_Return)
$visitorB->enterNode(Stmt_Return)
$visitorA->enterNode(Expr_Variable)
$visitorB->enterNode(Expr_Variable)
$visitorA->leaveNode(Expr_Variable)
$visitorB->leaveNode(Expr_Variable)
$visitorA->leaveNode(Stmt_Return)
$visitorB->leaveNode(Stmt_Return)
```
That is, when visiting a node, enterNode and leaveNode will always be called for all visitors.
Running multiple visitors in parallel improves performance, as the AST only has to be traversed
once. However, it is not always possible to write visitors in a way that allows interleaved
execution. In this case, you can always fall back to performing multiple traversals:
```php
$traverserA = new NodeTraverser;
$traverserA->addVisitor($visitorA);
$traverserB = new NodeTraverser;
$traverserB->addVisitor($visitorB);
$stmts = $traverserA->traverser($stmts);
$stmts = $traverserB->traverser($stmts);
```
When using multiple visitors, it is important to understand how they interact with the various
special enterNode/leaveNode return values:
* If *any* visitor returns `DONT_TRAVERSE_CHILDREN`, the children will be skipped for *all*
visitors.
* If *any* visitor returns `DONT_TRAVERSE_CURRENT_AND_CHILDREN`, the children will be skipped for *all*
visitors, and all *subsequent* visitors will not visit the current node.
* If *any* visitor returns `STOP_TRAVERSAL`, traversal is stopped for *all* visitors.
* If a visitor returns a replacement node, subsequent visitors will be passed the replacement node,
not the original one.
* If a visitor returns `REMOVE_NODE`, subsequent visitors will not see this node.
* If a visitor returns an array of replacement nodes, subsequent visitors will see neither the node
that was replaced, nor the replacement nodes.
Simple node finding
-------------------
While the node visitor mechanism is very flexible, creating a node visitor can be overly cumbersome
for minor tasks. For this reason a `NodeFinder` is provided, which can find AST nodes that either
satisfy a certain callback, or which are instanced of a certain node type. A couple of examples are
shown in the following:
```php
use PhpParser\{Node, NodeFinder};
$nodeFinder = new NodeFinder;
// Find all class nodes.
$classes = $nodeFinder->findInstanceOf($stmts, Node\Stmt\Class_::class);
// Find all classes that extend another class
$extendingClasses = $nodeFinder->find($stmts, function(Node $node) {
return $node instanceof Node\Stmt\Class_
&& $node->extends !== null;
});
// Find first class occuring in the AST. Returns null if no class exists.
$class = $nodeFinder->findFirstInstanceOf($stmts, Node\Stmt\Class_::class);
// Find first class that has name $name
$class = $nodeFinder->findFirst($stmts, function(Node $node) use ($name) {
return $node instanceof Node\Stmt\Class_
&& $node->resolvedName->toString() === $name;
});
```
Internally, the `NodeFinder` also uses a node traverser. It only simplifies the interface for a
common use case.
Parent and sibling references
-----------------------------
The node visitor mechanism is somewhat rigid, in that it prescribes an order in which nodes should
be accessed: From parents to children. However, it can often be convenient to operate in the
reverse direction: When working on a node, you might want to check if the parent node satisfies a
certain property.
PHP-Parser does not add parent (or sibling) references to nodes by itself, but you can easily
emulate this with a visitor. See the [FAQ](FAQ.markdown) for more information.