Trent Mick

who knew path joining differed so between Python, Ruby, Node, Perl

In Python I do a lot of path manipulations for build systems, various command-line utilities and Komodo support modules. Typically this is with Python’s os.path module. One thing I’ve come to expect of path joining, os.path.join, is this (apparently rare) detail:

If any component is an absolute path, all previous path components
will be discarded.

I say “apparently rare” because, in Python:

$ python
>>> import os.path
>>> os.path.join("/Users/trentm", "/var/log")
'/var/log'

in Ruby:

$ irb
>> File.join("/Users/trentm", "/var/log")
=> "/Users/trentm/var/log"

in Node.js:

$ node
node> var path = require('path')
node> path.join("/Users/trentm", "/var/log")
'/Users/trentm/var/log'

in Perl:

$ cat pathjoin.pl 
use File::Spec;
print File::Spec->join('/Users/trentm', '/var/log'), "\n";
$ perl pathjoin.pl 
/Users/trentm//var/log

Conclusions? Certainly none of these libraries is going to change its behaviour here, with the possible exception of Node, which is young and changing very quickly. I’d say the double ‘/’ in Perl’s File::Spec is poor, though it doesn’t give an invalid path. You could certainly argue that Ruby’s and Node’s interpretation is less subtle (often a good thing). I like Python’s interpretation: os.path.join is kind of like running cd for each given path in sequence to get the resultant directory. It means I don’t need to guard against an absolute input path being joined onto a current working directory.
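A minimal Python sketch of the difference. It uses `posixpath` (the POSIX flavour of `os.path`) so the results are the same whatever OS you run it on. `concat_join` is a hypothetical helper, not part of any library, that approximates the Ruby/Node result by treating every later component as relative.

```python
import posixpath

# Python's rule: an absolute component discards everything before it,
# much like running `cd` for each component in turn.
print(posixpath.join("/Users/trentm", "/var/log"))  # /var/log
print(posixpath.join("/a", "b", "/c", "d"))         # /c/d

# Why no guard is needed: a relative input lands under the base directory,
# while an absolute input simply replaces it.
base = "/home/me/project"
print(posixpath.join(base, "build/out"))  # /home/me/project/build/out
print(posixpath.join(base, "/tmp/out"))   # /tmp/out

def concat_join(first, *rest):
    """Hypothetical helper: join Ruby/Node-style by stripping leading
    slashes from every component after the first."""
    return posixpath.join(first, *(part.lstrip("/") for part in rest))

print(concat_join("/Users/trentm", "/var/log"))  # /Users/trentm/var/log
```

Unlike Node's `path.join`, this sketch doesn't normalize `..` or `.` segments; wrap the result in `posixpath.normpath` if you want that too.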

I’d be curious to know what is typical in other languages, if there are any takers reading this post. If blog commenting isn’t your thing, you can tweet “@trentmick”.


Komodo 6.0 Beta 2: HTML 5, CSS 3, Python 3, DB Explorer, ...

We (ActiveState) released Komodo 6.0 beta 2 yesterday and we want your feedback. Highlights:

  • HTML 5 autocomplete.
  • CSS 3 autocomplete.
  • Full Python 3 support (debugging, syntax checking, autocomplete, code browsing).
  • A new Database Explorer tool for quickly exploring SQL databases (SQLite out of the box and extensions for MySQL and Oracle, with plans for Postgres).
  • A new project system called “Places” that adds a file system browser (local and remote).
  • New publishing support for syncing a directory with a remote machine.
  • Additions to Komodo’s Hyperlinks for quickly navigating to file references.
  • Added support for PHP, Perl, Ruby and JavaScript regular expression debugging with Komodo’s excellent Rx tool.

See the Komodo 6.0 Features post for a full outline.

Komodo IDE: http://www.activestate.com/komodo-ide/downloads
Komodo Edit: http://www.activestate.com/komodo-edit/downloads

Full post here on the ActiveState blog.


eol.py 0.7.4 -- Python 3 support

Where?

What's new?

  • Python 3 support (not heavily tested yet)
  • Starter test suite

Full changelog: http://github.com/trentm/eol/tree/master/CHANGES.md#files

What is 'eol'?

eol is both a command-line script eol and a Python module eol for working with end-of-line chars in text files.

Command line usage

# list EOL-style of files
$ eol *
configure: Unix (LF)
build.bat: Windows (CRLF)
snafu.txt: Mixed, predominantly Unix (LF)

# find files with a given EOL-style
$ eol -f CRLF -x .svn -r ~/src/python
/Users/trentm/src/python/Doc/make.bat
/Users/trentm/src/python/Lib/email/test/data/msg_26.txt
/Users/trentm/src/python/Lib/encodings/cp720.py
...

# convert EOL-style of files
$ eol -c LF foo.c 
converted `foo.c' to LF EOLs

Module usage

>>> import eol
>>> eol.eol_info_from_path("configure")
('\n', '\n')         # (<detected-eols>, <suggested-eols>)
>>> eol.eol_info_from_path("build.bat")
('\r\n', '\r\n')
>>> eol.eol_info_from_path("snafu.txt")
(<class 'eol.MIXED'>, '\n')

See the README for full usage information.
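To give a feel for what EOL-style detection and conversion involve, here is a rough Python sketch. It is not the eol module's implementation; `eol_style` and `convert_eols` are hypothetical names. It counts CRLF, bare LF and bare CR sequences, reports a single style or "MIXED", and suggests the predominant style, much like the "Mixed, predominantly Unix (LF)" output shown for snafu.txt.

```python
EOLS = {"LF": b"\n", "CRLF": b"\r\n", "CR": b"\r"}

def eol_style(data):
    """Return (style, predominant) for `data` (bytes).

    `style` is "LF", "CRLF", "CR", "MIXED", or None if there are no EOLs;
    `predominant` is the most common style (a sensible conversion target).
    """
    crlf = data.count(b"\r\n")
    counts = {
        "LF": data.count(b"\n") - crlf,  # LFs that are not part of a CRLF
        "CRLF": crlf,
        "CR": data.count(b"\r") - crlf,  # CRs that are not part of a CRLF
    }
    present = [name for name in counts if counts[name] > 0]
    if not present:
        return None, None
    predominant = max(present, key=counts.get)
    style = present[0] if len(present) == 1 else "MIXED"
    return style, predominant

def convert_eols(data, style):
    """Normalize every EOL in `data` to `style` ("LF", "CRLF" or "CR")."""
    unix = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unix.replace(b"\n", EOLS[style])

print(eol_style(b"one\ntwo\nthree\r\n"))   # ('MIXED', 'LF')
print(convert_eols(b"a\r\nb\rc\n", "LF"))  # b'a\nb\nc\n'
```

Working on bytes sidesteps any text-decoding questions; a real tool would also want to skip binary files.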


quick hack how to move a part of a Mercurial (hg) repo to git

My quick-and-dirty hack to move a (small) part of a Mercurial (hg) repo to Git. In my case this was for moving my single file “testlib.py” from here on Bitbucket to here on GitHub.

  1. Dump the log of that part of the hg repo to a file.

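    export WORKDIR=$HOME/tmp/migrate
    mkdir -p $WORKDIR
    cd HGREPO/foo
    # oldest changeset first (-r 0:tip) and git-style diffs (--git), as apply.py expects
    hg log -pv --git -r 0:tip . > $WORKDIR/full.patch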
    
  2. Create the starter git repo:

    cd $WORKDIR
    git init foo
    
  3. Break up the log into a number of “changesetNNN.patch” files with this Python code:

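    # parse.py: split full.patch into changesetNNNNN.patch files, one per changeset
    import io

    changeset = []  # lines of the changeset currently being collected
    i = 0           # number for the next changesetNNNNN.patch file

    def write_changeset():
        global changeset, i
        if not changeset:
            return
        # newline='' keeps the original line endings (e.g. CRLF) in the diffs intact
        with io.open("changeset%05d.patch" % i, 'w', encoding="utf-8", newline='') as f:
            f.write(u''.join(changeset))
        i += 1
        changeset = []

    # Assumes the hg log output is UTF-8.
    with io.open("full.patch", encoding="utf-8", newline='') as f:
        for line in f:
            if line.startswith("changeset:"):
                write_changeset()
            changeset.append(line)
    write_changeset()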
    

    and then run:

    python parse.py
    
  4. Apply and commit each patch with this Python script:

    # apply.py: apply and commit each changeset*.patch file, in order, to a git repo
    # Usage: python apply.py TARGET-REPO-BASE-DIR
    import io
    import sys
    import subprocess
    from glob import glob
    from os.path import abspath

    def apply_patch(target_repo_base_dir, patch_path):
        with io.open(patch_path, encoding="utf-8", newline='') as f:
            content = f.read()
        # `hg log -pv` output: header fields, the description, two blank lines, the diff.
        header, diff = content.split('\n\n\n', 1)
        assert diff.startswith("diff --git a"), "expected a git-style diff (hg log --git)"
        fields = {}
        lines = header.splitlines(False)
        for i, line in enumerate(lines):
            key, value = line.split(':', 1)
            if key == "description":
                # Everything after the "description:" line is the commit message.
                fields[key] = '\n'.join(lines[i+1:])
                break
            fields[key] = value.strip()

        # Do any path renamings here. For example, I wanted to move from
        # "testlib/testlib.py" in the old repo to "lib/testlib.py" in the new.
        diff = diff.replace('a/testlib/testlib.py', 'a/lib/testlib.py')
        diff = diff.replace('b/testlib/testlib.py', 'b/lib/testlib.py')

        # Absolute paths, because git runs with the target repo as its cwd.
        diff_path = abspath(patch_path + ".diff")
        msg_path = abspath(patch_path + ".msg")
        with io.open(diff_path, 'w', encoding="utf-8", newline='') as f:
            f.write(diff)
        with io.open(msg_path, 'w', encoding="utf-8") as f:
            f.write(fields["description"])
        cwd = target_repo_base_dir
        subprocess.check_call(['git', 'apply', '--whitespace=nowarn', diff_path], cwd=cwd)
        subprocess.check_call(['git', 'add', 'lib/testlib.py'], cwd=cwd)
        subprocess.check_call(['git', 'commit', '--date', fields["date"],
            '-F', msg_path], cwd=cwd)

    if __name__ == "__main__":
        target_repo_base_dir = sys.argv[1]
        for patch_path in sorted(glob("changeset*.patch")):
            print("-- " + patch_path)
            apply_patch(target_repo_base_dir, patch_path)
    

    Then run:

    python apply.py foo    # apply all changeset*.patch files to "foo" git repo
    

Now you can push this Git repo to GitHub or wherever.

To improve on

  • You currently need to manually create the dir structure first.
  • This doesn’t currently use the parsed “user” field from the hg commit log for the “git commit --author=AUTHOR” option. Mainly this is because I didn’t need that, but also because the “user” value in the hg log isn’t the configured full user name and email, but just the short username. Maybe that was just me.
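For the “user” item above, one possible approach (a sketch, not something the scripts above do) is to map short hg usernames to full git identities and pass them to `git commit --author`. The `AUTHORS` table and the `git_commit_args` helper are hypothetical; you would call `subprocess.check_call(git_commit_args(fields, ...), cwd=cwd)` in place of the `git commit` call in apply.py, passing the parsed `fields` and the absolute path of the `.msg` file.

```python
# Hypothetical table mapping short hg usernames to git "Name <email>" identities.
AUTHORS = {
    "someuser": "Some User <someuser@example.com>",
}

def git_commit_args(fields, msg_path, authors=AUTHORS):
    """Build a `git commit` command line from fields parsed out of `hg log -v`."""
    args = ['git', 'commit', '--date', fields["date"], '-F', msg_path]
    user = fields.get("user", "")
    if "<" in user and user.endswith(">"):
        # Already a full "Name <email>" identity.
        args += ['--author', user]
    elif user in authors:
        # Expand a short username using the table.
        args += ['--author', authors[user]]
    # Otherwise leave --author off, so git uses its configured identity.
    return args
```

Leaving `--author` off for unknown users keeps the script working unchanged when no mapping is available.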

On a bit of confusion

A bit of confusion crept into “ActivePython 2.7 released - and what 2.7 means for Python’s future”, in particular into the following paragraph. Before:

While the Python community has declared a moratorium on major 2.x releases in an effort to facilitate other Python implementations to catch up and, thus, accelerate the adoption of Python 3.x, ActiveState will continue supporting 2.7.x and adding new modules and updating revisions to existing ones as they become available.

After:

As you may have read, the Python community has declared a temporary moratorium (suspension) on the Python language syntax in an effort to facilitate other Python implementations to catch up to Python 3.x; the moratorium does not mean that Python core development has stopped or even slowed down.

On the contrary, new modules continue to be added, bugs fixed, and performance tweaked; and, as always, ActiveState will continue supporting 2.7.x with builds, extra modules and PyPM as they become available.

Jesse Noller accurately noted that the former paragraph was confusing. In particular, it could be read as saying that the Python community isn’t going to support Python 2.7, which is just not true. Python 2.7 will be supported for longer than the typical two years that a Python 2.x release is supported.

It is easy to conflate the mostly unrelated Python language moratorium and the plan that Python 2.7 will be the last 2.x release. The two are somewhat related in that the hope is that both ultimately lead to a smoother adoption of Python 3. The issue (from ActiveState’s Product Manager’s point of view) is that an enterprise customer can be swayed away from considering Python when reading stuff like the following from “Python moratorium and the future of 2.x”:

On November 9, Python BDFL ("Benevolent Dictator For Life") Guido van Rossum froze the Python language’s syntax and grammar in their current form for at least the upcoming Python 2.7 and 3.2 releases, and possibly for longer still. This move is intended to slow things down, giving the larger Python community a chance to catch up with the latest Python 3.x releases.

It is very easy for a less-technical person to interpret “moratorium on Python language syntax” as a stop on all core Python development. This distinction was at the core of a heated debate I had recently with our Product Manager.

The intention of the paragraph in the ActiveState blog post is to state that the language moratorium isn’t something that should dissuade businesses from considering Python.

That said: Mea culpa. I had the chance to catch this the first time and didn’t.


django-markdown-deux Django app

  • Project page: http://github.com/trentm/django-markdown-deux
  • on PyPI: http://pypi.python.org/pypi/django-markdown-deux/

django-markdown-deux is a small Django app that provides template tags for rendering Markdown with the python-markdown2 library. MIT license.

What's with the "deux" in the name?

The obvious name for this project is django-markdown2. However, there already is one, and name confusion doesn't help anybody. Plus, I took French immersion in school for 12 years: might as well put it to use.

Quick Usage

markdown template filter

{% load markdown_deux_tags %}
...
{{ myvar|markdown:"STYLE" }}      {# convert `myvar` to HTML using the "STYLE" style #}
{{ myvar|markdown }}              {# same as `{{ myvar|markdown:"default"}}` #}

markdown template block tag

{% load markdown_deux_tags %}
...
{% markdown STYLE %}        {# can omit "STYLE" to use the "default" style #}
This is some **cool**
[Markdown](http://daringfireball.net/projects/markdown/)
text here.
{% endmarkdown %}

See more usage info, available settings, installation notes, etc. on the GitHub project page. (I'm posting this to Planet Mozilla because Benjamin is, or was, using python-markdown2 and I've heard Mozilla is using more Django these days.)


Donating blood in downtown Vancouver

I like to try to donate blood regularly (you're allowed to donate every 56 days). It is an easy way to help the system. But really I'm just in it for the free cookies at the end. There is no other way I can justify having an Oreo at 10am. Here is how to do it in downtown Vancouver (where I work):

Some tips: You'll need ID; a driver's licence will do. Don't go at lunchtime, when it gets much busier.
