Friday, April 16, 2010

Celebrating the first year of IPython logging

It was a year ago today that I first started logging my IPython sessions explicitly, using Pierre Raybaut's idea. All you need to do is make the changes/additions described in this piece of documentation (Logging to a file) in its soon-to-change LP repository. (This applies to IPython 0.10 and below.)
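
The exact recipe lives in that documentation; here is only a rough sketch of the idea in 0.10-era terms. The %logstart magic and the ipapi handle are standard for that version, but the function name and the file layout below are my own assumptions, not the official recipe:

# sketch for ~/.ipython/ipy_user_conf.py -- the idea, not the official recipe
import os, time
import IPython.ipapi

ip = IPython.ipapi.get()

def start_daily_log():
    # one log file per day, e.g. ~/.ipython/2009-04-16.py
    logfile = os.path.join(os.path.expanduser('~/.ipython'),
                           time.strftime('%Y-%m-%d') + '.py')
    # 'append' keeps adding to the same file across sessions on the same day
    ip.magic('logstart %s append' % logfile)

start_daily_log()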

You get one time-stamped log file per day (in ~/ipython/ or wherever your IPython home directory is set) that looks like this:

#!/usr/bin/env python
# 2009-04-16.py
# IPython automatic logging file
# 13:15
# =================================
d = loadtxt(file, skiprows=30)
plot([d[i][8:] for i in range(12)])
# =================================
# 14:08
# =================================
boxplot(d[:][8:])

As of writing this entry I count almost 300 separate logs; combining them into one file using this little script yields about 37.5K lines (including plenty of duplicate entries, time-stamps, empty comments, and copy-pasted code that I never actually typed in).
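
As a minimal stand-in sketch for such a combining script (assuming the daily YYYY-MM-DD.py naming from the sample above; the directory is my assumption, adjust it to your IPython home):

import glob, os

# Collect the daily logs in chronological order and concatenate them.
logdir = os.path.expanduser('~/.ipython')
logs = sorted(glob.glob(os.path.join(logdir, '20??-??-??.py')))
with open(os.path.join(logdir, 'combined.py'), 'w') as out:
    for name in logs:
        with open(name) as f:
            out.write(f.read())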

Besides serving as a rough measure for myself, this combined file has another good use, triggered by this question: how do you exit IPython properly? IPython's internal history file forgets what was in a session if you accidentally or intentionally kill your IPython session without issuing an Exit at the exit :) That is where the new combined history file comes to our help.

First we append all the time-stamped logs into one file and rename it to "history" so that IPython can load it at start-up. Then, in iplib.py, comment out the readline.set_history_length(1000) line to lift the 1000-line limit on the history file. Now I can access all my previous coding history from within IPython again, no matter how I end my sessions (provided that I stitch my logs together periodically).
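
For reference, the call in question comes from Python's standard readline module; its documentation says a negative length means an unlimited history file, so instead of commenting the line out you could also change it to:

import readline

# Negative values imply an unlimited history file size (per the readline docs).
readline.set_history_length(-1)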

Lazy coding at its best!

It would be great if IPython could handle history lines more smartly and read multi-line blocks back properly. Who knows, maybe an IPython super user has a solution for that laziness as well.

By the way, does anyone know how to remove duplicate lines from a file without actually sorting it?

7 comments:

  1. Duplicate line numbers, quick & dirty, considering 'older' lines as duplicates:

    D = dict()
    dupes = list()

    # Map each line to its most recent line number; earlier copies are dupes.
    for n, line in enumerate(open('pattern.txt', 'r')):
        if line in D:  # membership test avoids treating line number 0 as false
            dupes.append(D[line])
        D[line] = n

  2. a = open("history", "r").readlines()
    b = set(a)

    does the same as yours. However, with neither your solution nor the set could I write the result out preserving the order I see on screen in IPython.

    h = open("new", "w")
    h.writelines(D) # or h.writelines(b)

    Both display and write the same lines, but in a different order than the original listing. It would be nice to preserve the order of the original file.

  3. It's actually easy: iterate over the file, keep a set of the lines you've seen so far and write only those you haven't:

    seen = set()
    with open('input.txt') as input:
        with open('output.txt', 'w') as out:
            for line in input:
                if line not in seen:
                    seen.add(line)
                    out.write(line)

  4. Thanks RL. This is the solution that I have been looking for. My history file has shrunk down to ~14K lines from ~38K, keeping its original order with the duplicates removed.

  5. Output unique lines sans reorder:

    $ perl -ni.orig -e 'print unless $s{$_}++' filename

    I like Python, but Perl is handy to have around...

    Thanks for the info!

  6. ...forgot to say, but to process all the individual files at once into a single output file, try:

    $ cat *.log | perl -ne 'print unless $s{$_}++' > filename

  7. GÖKHAN, you still updating your blog?

    - Ed from over the pond
