Python type checker / source code analyzer
Pyntch is a PYthoN Type CHecker. It detects possible runtime errors before the code is actually run (take a look at a sample output). Pyntch examines source code statically and infers all possible types of variables, attributes, function arguments, and return values of each function or method. It then detects possible exceptions caused by type mismatches, missing attributes, or other exceptions raised from each function. Unlike other Python code checkers (such as Pychecker or Pyflakes), Pyntch does not check style issues.
Pyntch can detect the following kinds of errors in source code:
- AttributeError from accessing obj.attr (where obj does not have attribute attr).
- TypeError from subscripting a[1] (where a is not a sequence).
- TypeError from calling func(1) (where func is not a function, method, or class).
- TypeError from calling sorted(x) (where x is not an iterable object).
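For reference, the Python 3 snippet below triggers each of these error categories at runtime; these are the failures Pyntch aims to report statically, before the program runs. The helper error_of and the sample values are hypothetical, written only for this illustration.

```python
# Each case fails at runtime with the kind of exception Pyntch tries to predict statically.
def error_of(thunk):
    """Run thunk() and return the name of the exception it raises, or None."""
    try:
        thunk()
    except Exception as exc:
        return type(exc).__name__
    return None

obj = object()   # has no attribute "attr"
a = 42           # not a sequence
func = "text"    # not a function, method, or class
x = 3.14         # not an iterable object

cases = {
    "obj.attr": lambda: obj.attr,
    "a[1]": lambda: a[1],
    "func(1)": lambda: func(1),
    "sorted(x)": lambda: sorted(x),
}

for expr, thunk in cases.items():
    print(expr, "->", error_of(thunk))
```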
Performance: analyzed 67,000 lines of code in 28 minutes (Core 2 Duo, 2.4 GHz).
Download: http://www.unixuser.org/~euske/python/pyntch/pyntch-dist-20090907.tar.gz (56 KB)
Discussion (for questions and comments, post here): http://groups.google.com/group/pyntch-users/
View the source: http://code.google.com/p/pyntch/source/browse/trunk/pyntch
Have you ever run into a TypeError caused by passing arguments of the wrong type, say, a string object to a numeric function? Or by trying to call a method that does not exist because an object of the wrong class was passed in? One of the great advantages of scripting languages such as Python is their dynamism. You can define functions, variables and data structures whenever you want without writing detailed type definitions. However, this feature comes at a cost: it is sometimes difficult to find potential errors caused by type mismatches before actually running the program.
In a language like Python, there is always a risk of uncaught runtime exceptions that the programmer did not foresee when writing the code, and these can make the program die suddenly. This behavior is particularly undesirable for mission-critical applications, so we want to catch such errors in advance. Unfortunately, as a program grows, it becomes harder to track these errors, and even harder to prevent them by inferring which types or values can be passed to or returned from each function.
Pyntch aims to reduce this burden by inferring what types can be assigned to variables, members and function arguments, what types can be returned from each function at any point of execution, and what exceptions might be raised. This is done by examining the code without executing it. The goal of Pyntch is to analyze every possible execution path and every possible combination of data.
Sounds impossible? Well, I can show you that it is at least partially possible, using a technique called "typeflow analysis." For the details, see the "How it works" section. However, there is also a drawback. Because the purpose of Pyntch is to catch as many obscure errors as possible before the code is actually used in production, it favors coverage of the analysis at the expense of accuracy. Pyntch sometimes reports many false positives, which need to be examined further by human programmers.
To install, run setup.py:

  # python setup.py install
The basic use of Pyntch is pretty simple and straightforward. Take this sample code:
  $ cat -n sample1.py
       1  def f(x,y):
       2      return x+y
       3  print f(3, 4)
       4  print f(3, 'a')
To check this code, simply run tchecker.py on the source file:
  $ tchecker.py sample1.py
  === sample1 ===
  [sample1]
  ### sample1.py(1)                                [A]
  # called at sample1.py(3)                       [B]
  # called at sample1.py(4)                       [B]
  def f(x=<int>, y=<int>|<str>):                  [C]
    return <int>                                  [D]
    raises <TypeError: not supported operand Add(<int>, <str>)> at sample1.py(2)   [E]
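The output shows several things:
[A] The function f starts at line 1 in sample1.py.
[B] f is called at lines 3 and 4.
[C] The argument x is int, and y is either int or str.
[D] The return value is int.
[E] f might raise a TypeError exception at line 2 by attempting to add an integer and a string.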
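To see how lines [C] to [E] fit together, here is a rough Python 3 sketch: take the argument types collected from every call site, pick a sample value for each type, and try x+y on every combination. This is not how Pyntch works internally (it reasons about types without executing anything); SAMPLES and check_add are hypothetical names used only for illustration.

```python
import itertools

# A representative value for each type that appeared in the report (hypothetical table).
SAMPLES = {int: 3, str: "a"}

def check_add(x_types, y_types):
    """Return (return_types, failing_pairs) for x+y over all type combinations."""
    return_types, failing = set(), []
    for tx, ty in itertools.product(x_types, y_types):
        try:
            result = SAMPLES[tx] + SAMPLES[ty]
        except TypeError:
            failing.append((tx.__name__, ty.__name__))
        else:
            return_types.add(type(result).__name__)
    return return_types, failing

# x was only ever passed an int; y was passed an int (line 3) and a str (line 4).
ret, errors = check_add({int}, {int, str})
print("return", ret)                  # return {'int'}
print("raises TypeError for", errors)
```

The successful combination gives the return type in [D], and the failing one corresponds to the TypeError in [E].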
Take a look at another example:
  $ cat -n sample2.py
       1  import sys
       2  fp = file(sys.argv[1])
       3  while 1:
       4      line = fp.readln()
       5      if not line: break
       6      print line
  $ tchecker.py sample2.py
  === sample2 ===
  [sample2]
  fp = <file>                                     [A]
  line = ?                                        [B]
  sys = <Module sys (/usr/local/lib/python2.5/site-packages/pyntch/stub/sys.pyi)>   [C]
  raises <AttributeError: attribute not found: <file>.readln> at sample2.py(4)   [D]
  raises <AttributeError: attribute not found: @sample2.fp.readln> at sample2.py(4)   [E]
[A] fp contains a file object.
[B] The type of line is not determined (due to the exceptions).
[C] sys points to the sys module.
[D] A file object does not have a readln attribute. (It should be readline.)
[E] Accessing fp.readln fails for every type of object that fp could contain.
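sample2.py is Python 2 code (file() and the print statement). In Python 3, open() in text mode returns an io.TextIOWrapper, so the same mistake can be expressed as a check over the set of types that fp may hold. The helper missing_attribute is hypothetical; Pyntch performs the equivalent check on inferred types without running the program.

```python
import io

def missing_attribute(possible_types, attr):
    """Return the names of the types lacking attr; an empty list means the access is safe."""
    return [t.__name__ for t in possible_types if not hasattr(t, attr)]

# Every type fp could hold; here only the Python 3 text-file type.
fp_types = [io.TextIOWrapper]

print(missing_attribute(fp_types, "readln"))    # ['TextIOWrapper']
print(missing_attribute(fp_types, "readline"))  # []
```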
Pyntch can take module names instead of actual file names as input. Pyntch searches the Python search path specified by the PYTHONPATH environment variable, as well as the stub path (explained below). If you want Pyntch to look in different locations, use the -p option to alter the module search path:
$ tchecker.py -p /path/to/your/modules mypackage.mymodule
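The lookup works like ordinary Python module resolution: a dotted name such as mypackage.mymodule becomes a relative path that is tried in each search directory in turn. The sketch below is hypothetical, not Pyntch's code, and it assumes that directories given with -p are searched before PYTHONPATH; the document does not state the actual order. The exists parameter is only there so the function can be tried without real files.

```python
import os

def build_search_path(extra_dirs, environ=os.environ):
    """Hypothetical: directories from -p first, then the PYTHONPATH entries."""
    env_dirs = [d for d in environ.get("PYTHONPATH", "").split(os.pathsep) if d]
    return list(extra_dirs) + env_dirs

def resolve_module(name, search_path, exists=os.path.isfile):
    """Map a dotted module name to a source file, or return None if it is not found."""
    rel = os.path.join(*name.split("."))
    for directory in search_path:
        # A module is either name.py or a package directory with __init__.py.
        for candidate in (rel + ".py", os.path.join(rel, "__init__.py")):
            path = os.path.join(directory, candidate)
            if exists(path):
                return path
    return None

search_path = build_search_path(["/path/to/your/modules"])
print(resolve_module("mypackage.mymodule", search_path))
```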
Due to the nature of source-level analysis, Pyntch cannot analyze a program that uses external modules whose behavior is defined only in opaque binaries. In that case, a user can instruct Pyntch to use an alternative "stub" module, which is written in Python and defines only the return type of each function. Python stub modules are similar to C headers, but a Python stub is real Python code that does nothing but return the type of object that the "real" function would return. For example, if a Python function returns either an integer or a string (depending on its input), its stub function looks like this:

  def f(x):
      return 0
      return ''

Although this looks meaningless, it is valid Python code, and since Pyntch ignores execution order (see the Limitations section), Pyntch recognizes this function as one returning an integer and/or a string.
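Because only the set of return statements matters, such a stub can be summarized without running it. The following sketch uses Python 3's standard ast module to collect the types of literal return values in a function, whether or not each return is reachable. It handles only literal constants and is an illustration of the idea, not how Pyntch actually reads stubs; literal_return_types is a hypothetical name.

```python
import ast

STUB = '''
def f(x):
    return 0
    return ''
'''

def literal_return_types(source, func_name):
    """Collect the types of literal return values in func_name, ignoring statement order."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == func_name:
            return {
                type(ret.value.value).__name__
                for ret in ast.walk(node)
                if isinstance(ret, ast.Return) and isinstance(ret.value, ast.Constant)
            }
    return set()

print(sorted(literal_return_types(STUB, "f")))  # ['int', 'str']
```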
- Python stub files have names ending in ".pyi".
- They are usually placed in the default Python search path.
- When a Python stub and its real Python module both exist, the stub module is checked.
- Stub modules for several built-in modules such as sys or os.path are included in the current Pyntch distribution. They are normally placed in the stub directory inside the Pyntch package (e.g. /usr/local/lib/python2.5/site-packages/pyntch/stub) and are used by default instead of the built-in Python modules.
- Pyntch cannot correctly analyze a built-in function that returns different types of values depending on its parameters (notably struct.unpack).
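The struct.unpack case is hard because the result types are determined by the format string, a runtime value. The short Python 3 snippet below shows the same function returning tuples with different element types; a stub that only lists return values cannot express that dependency. element_types is a hypothetical helper (in Python 2, the "4s" field would be a str rather than bytes).

```python
import struct

def element_types(fmt, data):
    """Return the type names of the values struct.unpack produces for fmt."""
    return [type(v).__name__ for v in struct.unpack(fmt, data)]

print(element_types("<i", struct.pack("<i", 7)))                # ['int']
print(element_types("<d", struct.pack("<d", 1.5)))              # ['float']
print(element_types("<4si", struct.pack("<4si", b"abcd", 1)))   # ['bytes', 'int']
```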
Since Pyntch can produce a lot of information about the code, a user might be overwhelmed by the amount or complexity of its output. Pyntch offers several ways to control the output so that it shows the desired information.
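  -C key=value      Change an ErrorConfig setting.
  -c config_file    Read ErrorConfig settings from config_file.
  -p python_path    Add python_path to the module search path. Multiple -p options are allowed.
  -P stub_path      Add stub_path to the stub search path. Multiple -P options are allowed.
  -a                Show all results; equivalent to show_all=True.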
The following settings are defined in the ErrorConfig class (config.py):
  raise_uncertain (boolean)
  ignore_none (boolean)
  show_all (boolean)
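A -C key=value option can be thought of as setting one boolean field on a configuration object. The sketch below is a hypothetical stand-in for ErrorConfig, not the class in config.py: the three field names come from the list above, but the defaults (all False) and the accepted spellings "True"/"False" are assumptions.

```python
from dataclasses import dataclass, fields

@dataclass
class ErrorConfig:
    """Hypothetical stand-in for Pyntch's ErrorConfig; the defaults are assumptions."""
    raise_uncertain: bool = False
    ignore_none: bool = False
    show_all: bool = False

def apply_option(config, option):
    """Apply one "key=value" string (as given to -C) to config; booleans only."""
    key, sep, value = option.partition("=")
    known = {f.name for f in fields(config)}
    if not sep or key not in known:
        raise ValueError("unknown setting: %r" % option)
    if value not in ("True", "False"):
        raise ValueError("expected True or False: %r" % option)
    setattr(config, key, value == "True")
    return config

cfg = apply_option(ErrorConfig(), "show_all=True")  # what -a is described as doing
print(cfg.show_all)  # True
```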
One of the major drawbacks of typeflow analysis is its inability to take execution order into account (which is also true of dataflow analysis). The sequence of statements is simply ignored and all possible orders are considered. This is like taking every permutation of the statements in a program and combining them into one. This sometimes makes the result less accurate in exchange for more comprehensive checking. For example, consider the following two consecutive statements:
  x = 1
  x = 'a'
After these two statements execute, variable x clearly always holds a string object, not an integer object. However, because it is unaware of execution order, Pyntch reports that this variable might have two possible types: an integer and a string. Although we expect this kind of error not to affect the overall usefulness of the report much, we provide a way to suppress this type of output. Also, Pyntch cannot detect UnboundLocalError, which depends on whether a local variable is read before it is assigned.
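The difference between the two views can be made concrete with Python 3's ast module. This toy sketch handles only assignments of literal constants to plain names, and both function names are hypothetical: assigned_types takes the union over every assignment (the order-insensitive view that produces Pyntch's int-and-str report), while final_types lets the last assignment in straight-line code win.

```python
import ast

SOURCE = "x = 1\nx = 'a'\n"

def assigned_types(source):
    """Order-insensitive: union of literal types over every assignment to each name."""
    types = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Constant):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    types.setdefault(target.id, set()).add(type(node.value.value).__name__)
    return types

def final_types(source):
    """Order-sensitive (straight-line code only): the last assignment wins."""
    types = {}
    for stmt in ast.parse(source).body:
        if isinstance(stmt, ast.Assign) and isinstance(stmt.value, ast.Constant):
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    types[target.id] = {type(stmt.value.value).__name__}
    return types

print(sorted(assigned_types(SOURCE)["x"]))  # ['int', 'str']
print(sorted(final_types(SOURCE)["x"]))     # ['str']
```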
Another limitation is that Pyntch assumes the scope of each namespace is statically defined, i.e. all the names (variables, functions, classes and attributes) are written down in the source code. Therefore a program that defines or alters a namespace dynamically during execution cannot be analyzed correctly. Basically, the code has to meet the following conditions:
- It does not use the globals() or locals() function, nor refer to or alter the __dict__ member.
- It does not use getattr or setattr.
- It does not use the eval, compile or exec functions.
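A rough pre-check for these conditions can be written with Python 3's ast module. dynamic_features is a hypothetical helper, not part of Pyntch; it only catches direct calls by name and .__dict__ accesses, so an aliased call such as g = getattr slips through. Note that in Python 3 exec is a function, so it shows up as an ordinary call.

```python
import ast

DYNAMIC_CALLS = {"globals", "locals", "getattr", "setattr", "eval", "compile", "exec"}

def dynamic_features(source):
    """Return sorted (line, name) pairs for constructs that alter namespaces at runtime."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in DYNAMIC_CALLS):
            found.append((node.lineno, node.func.id))
        elif isinstance(node, ast.Attribute) and node.attr == "__dict__":
            found.append((node.lineno, "__dict__"))
    return sorted(found)

code = "class C: pass\nc = C()\nsetattr(c, 'n', 1)\nc.__dict__['m'] = 2\nexec('y = 3')\n"
print(dynamic_features(code))  # [(3, 'setattr'), (4, '__dict__'), (5, 'exec')]
```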
(This section is still way under construction.)
The basic mechanism of Pyntch is based on the idea of "typeflow analysis." This is similar to dataflow analysis, which computes the maximal set of possible data stored at each location (either a variable or a continuation) in a program. First, Pyntch constructs a big connected graph that represents the entire Python program. Every expression or statement is converted into a "node", which is an abstract place where certain types of data are stored or passed. Then it works out what types of data flow from one node to another.
Let us consider a trivial example:
  a = 'xyz'
  b = 2
  c = a*b
Given the above statements, Pyntch constructs the graph shown in Fig. 1. A square-shaped node is a "leaf" node, which represents a single type of Python object. A round node is a "compound" node, a place where one or more types of objects can potentially be stored. The data stored at the top two leaf nodes, a string and an integer object respectively, flow down to the lower nodes, and each node passes the data along the arrows. Both objects are "mixed" at the multiplication node, which produces a string object (because in Python multiplying a string by an integer gives a repeated string). Eventually, the object goes into variable c, which is the node at the bottom. In this way, the possible type of each variable can be inferred.
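The propagation described for Fig. 1 can be imitated with a small fixpoint loop. This is a toy model for illustration only: the class TypeflowGraph, its method names, and mul_rule are hypothetical, not Pyntch's internals. Leaf nodes start with one type, edges copy type sets downstream, and an operator node applies a typing rule to every pair of incoming types until nothing changes.

```python
from collections import defaultdict

class TypeflowGraph:
    """Toy typeflow graph: each node holds a set of type names."""

    def __init__(self):
        self.types = defaultdict(set)   # node -> set of type names
        self.edges = defaultdict(list)  # node -> nodes its types flow into
        self.ops = {}                   # operator node -> (left, right, rule)

    def leaf(self, node, type_name):
        self.types[node].add(type_name)

    def flow(self, src, dst):
        self.edges[src].append(dst)

    def binop(self, node, left, right, rule):
        self.ops[node] = (left, right, rule)

    def solve(self):
        """Propagate types along edges and through operators until a fixpoint."""
        changed = True
        while changed:
            changed = False
            for src, dsts in list(self.edges.items()):
                for dst in dsts:
                    if not self.types[src] <= self.types[dst]:
                        self.types[dst] |= self.types[src]
                        changed = True
            for node, (left, right, rule) in self.ops.items():
                result = {rule(l, r) for l in self.types[left] for r in self.types[right]}
                result.discard(None)  # None marks a combination that would raise TypeError
                if not result <= self.types[node]:
                    self.types[node] |= result
                    changed = True
        return self.types

def mul_rule(left, right):
    """Hypothetical result-type rule for `*`, covering only int and str."""
    if {left, right} == {"int", "str"}:
        return "str"   # 'xyz' * 2 == 'xyzxyz'
    if left == right == "int":
        return "int"
    return None        # e.g. str * str raises TypeError

g = TypeflowGraph()
g.leaf("'xyz'", "str")
g.leaf("2", "int")
g.flow("'xyz'", "a")
g.flow("2", "b")
g.binop("a*b", "a", "b", mul_rule)
g.flow("a*b", "c")
print(sorted(g.solve()["c"]))  # ['str']
```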
Now take an example that involves a function:
  def foo(x): return x+1
  def bar(y): return y*2
  f = foo
  z = f(1)
  f = bar
  z = f(2)
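Under the order-insensitive view described in the Limitations section, f can hold either foo or bar, so each call site may reach both functions. The toy sketch below makes that explicit with hand-written return rules; all names are hypothetical, and it illustrates the general typeflow idea rather than Pyntch's documented handling of this example.

```python
# Order-insensitive view of:  f = foo; z = f(1); f = bar; z = f(2)
assignments_to_f = ["foo", "bar"]            # every function ever assigned to f
call_arg_types = ["int", "int"]              # argument types at f(1) and f(2)

def foo_ret(arg):   # foo(x): return x+1, which only works for int here
    return {"int"} if arg == "int" else set()

def bar_ret(arg):   # bar(y): return y*2, which keeps the type of int or str
    return {arg} if arg in ("int", "str") else set()

RETURN_RULES = {"foo": foo_ret, "bar": bar_ret}

param_types = {"foo": set(), "bar": set()}
z_types = set()
for arg in call_arg_types:
    for callee in assignments_to_f:          # each call may reach either function
        param_types[callee].add(arg)
        z_types |= RETURN_RULES[callee](arg)

print(param_types)   # {'foo': {'int'}, 'bar': {'int'}}
print(z_types)       # {'int'}
```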
Copyright (c) 2008-2009 Yusuke Shinyama <yusuke at cs dot nyu dot edu>
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.