Encoded Source Files

PyScripter

Encoded Python Source Files

PyScripter fully supports PEP 263.  The editor uses Unicode strings internally.  When saved, Python files can be encoded as either UTF-8 or ANSI.

UTF-8 encoded source files

You can select this encoding from the File Formats submenu of the Edit menu.  From that menu you can also choose whether UTF-8 encoded source files include the UTF-8 BOM signature, which is detected by the Python interpreter.  This signature is also detected by PyScripter when it loads a file, and by other Windows editors.  Although it is not necessary, you are advised to include an encoding comment such as

# -*- coding: utf-8 -*-

as the first or second line of the Python script.  The advantage of using UTF-8 encoded files is that they run without modification on other computers with a different default encoding.  When using UTF-8 encoding you should specify all strings that are not plain ASCII as Python unicode strings by adding the prefix 'u'.
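The BOM signature mentioned above is the three-byte sequence EF BB BF at the very start of the file.  The following is a minimal sketch of how such a signature can be detected and stripped using only the standard codecs module; the helper names has_utf8_bom and decode_utf8_source are hypothetical and are not part of PyScripter.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the bytes start with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def decode_utf8_source(data: bytes) -> str:
    """Decode UTF-8 source bytes, dropping a leading BOM if present."""
    # The 'utf-8-sig' codec removes the BOM when it is there and
    # otherwise behaves like plain 'utf-8'.
    return data.decode("utf-8-sig")


# Hypothetical sample file contents, with and without the signature.
with_bom = codecs.BOM_UTF8 + "# -*- coding: utf-8 -*-\nname = 'café'\n".encode("utf-8")
without_bom = "name = 'café'\n".encode("utf-8")

print(has_utf8_bom(with_bom), has_utf8_bom(without_bom))
```

Decoding with 'utf-8-sig' rather than 'utf-8' avoids a stray U+FEFF character at the start of the decoded text.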

ANSI encoded files

If the UTF-8 flag of the File Formats submenu of the Edit menu is not selected, the file is treated as ANSI encoded.  To define a specific source code encoding, a magic comment must be placed in the source file as either the first or second line, e.g.:

          #!/usr/bin/python
          # -*- coding: <encoding name> -*-

More precisely, the first or second line must match the regular expression "coding[:=]\s*([-\w.]+)".  The first group of this expression is then interpreted as the encoding name.  If the encoding is unknown to Python, an error is raised during compilation.  There must not be any Python statement on the line that contains the encoding declaration.  If such a comment is not present, the default system encoding is assumed.  PyScripter detects such comments when it loads Python source files and decodes them to Unicode using the appropriate encoding.
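The rule above can be sketched in a few lines of Python.  This is only an illustration of the regular expression quoted in the text, not PyScripter's or the interpreter's actual implementation; the function names find_declared_encoding and is_known_encoding are hypothetical, and restricting the search to comment lines is an assumption based on the requirement that no statement share the line.

```python
import codecs
import re

# Regular expression quoted in the text above.
CODING_RE = re.compile(r"coding[:=]\s*([-\w.]+)")


def find_declared_encoding(source_text):
    """Return the encoding named in the first two lines, or None."""
    for line in source_text.splitlines()[:2]:
        # Assumption: only comment lines can carry the declaration.
        if not line.lstrip().startswith("#"):
            continue
        match = CODING_RE.search(line)
        if match:
            return match.group(1)
    return None


def is_known_encoding(name):
    """Return True if Python has a codec registered under this name."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


sample = "#!/usr/bin/python\n# -*- coding: latin-1 -*-\nprint('hello')\n"
print(find_declared_encoding(sample))
```

Because the expression only looks for "coding" followed by ":" or "=", editor-specific forms such as "# vim: set fileencoding=utf-8 :" are matched as well.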

The default Python encoding is controlled by a Python file called "site.py", which is located in the Python lib directory (see function "setencoding" in site.py).  The default encoding when Python is installed is ASCII, which does not support non-ASCII characters (characters with a value greater than 127).  If you are planning to use non-ASCII strings in Python without using the UTF-8 encoding, you will need to modify site.py and enable support for a locale-aware default string encoding.
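To check which default encoding a given interpreter actually uses, it can be queried at run time.  The sketch below uses the standard sys.getdefaultencoding() call; the helper name describe_default_encoding is hypothetical.  Note that the description above applies to Python 2; on Python 3 the call reports 'utf-8' and site.py no longer contains a setencoding function.

```python
import sys


def describe_default_encoding():
    """Return (encoding name, whether it can encode a non-ASCII character)."""
    encoding = sys.getdefaultencoding()
    try:
        # U+00E9 has a character value greater than 127.
        "\u00e9".encode(encoding)
        supports_non_ascii = True
    except UnicodeEncodeError:
        supports_non_ascii = False
    return encoding, supports_non_ascii


print(describe_default_encoding())
```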

IDE encoding options for new files

PyScripter provides two IDE options controlling the encoding and line breaks of new files:

·     Default line breaks for new files
·     Default encoding for new files

IDE option for detecting UTF-8 encoding when opening files

Another IDE option (Detect UTF-8 when opening files) controls whether PyScripter attempts to detect UTF-8 encoding when opening files that have no BOM.  This detection is done by analyzing the first 4000 characters of the file and is imperfect.  It only applies to non-Python files, since UTF-8 encoded Python files are required to have either the BOM or an encoding comment.
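The kind of heuristic described above can be illustrated as follows.  This is a sketch under stated assumptions, not PyScripter's actual algorithm: it assumes that the detection amounts to checking whether the first 4000 bytes contain non-ASCII data that decodes cleanly as UTF-8, and the name looks_like_utf8 is hypothetical.

```python
import codecs

SAMPLE_SIZE = 4000  # size of the examined prefix, following the text above


def looks_like_utf8(data: bytes, sample_size: int = SAMPLE_SIZE) -> bool:
    """Guess whether BOM-less bytes are UTF-8 by inspecting a prefix."""
    sample = data[:sample_size]
    # Pure ASCII gives no evidence for UTF-8 over any ANSI code page.
    if all(byte < 0x80 for byte in sample):
        return False
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False tolerates a multi-byte sequence that was cut off
        # where the sample ends.
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return False
    return True


print(looks_like_utf8("naïve café".encode("utf-8")))
print(looks_like_utf8("naïve café".encode("latin-1")))
```

The imperfection is easy to see: an ANSI file whose non-ASCII bytes happen to form valid UTF-8 sequences is misclassified, and a file whose first non-ASCII byte lies beyond the sample is not recognised.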