Python and rata.digitraffic API #1

Having finally had some time to sit down and learn more Python, I tried to find a mini project that would be small enough for me to actually do myself (not just copy stuff from the internet) but also useful, at least to me. So I decided to learn how to parse JSON and present the data with Python.


My personal life changed quite a bit during the summer, as we moved from Helsinki to my home town. I still work in the Helsinki region, and that means a longer commute. There are great web services and apps to check whether the trains are delayed or cancelled ( to name one ). But why settle for an existing one when you can do it yourself 🙂 So I navigated to the Digitraffic site to learn what the data looks like and how it behaves.

I don’t have the first version saved anymore, but basically I wrote one rather huge py script with all the logic in very, very nested loops. But it worked, and it gave me an opportunity to understand the data better and also how to actually (not?) code with Python. Gradually I started to move the logic away from the main script into a class in order to organise it better. It’s still very much a work in progress, but I do get the trains and the arrival times nicely sorted on my screen. The next phase will be to move to a GUI (tkinter perhaps?) and perhaps have it running on a Raspberry Pi with a screen bolted to the wall.

Getting the JSON and parsing it

As I have never really coded professionally, I haven’t put much thought into coding conventions (PEP 8 in this case). I acknowledge that the code here is not PEP 8 compliant, but I will slowly get there, and I welcome all suggestions for cleaning up the code.

The data I’m after is all the traffic at a specific station at the time of the request, restricted to 15 trains in each direction. So it returns up to 15 arriving and 15 departing trains (the same train can be both) as a JSON file.

I had never used urllib2 (or its predecessor) before, so it took some googling and Stack Overflow searches before I was able to fetch the data. I’m passing the station code as an argument when running the script, and it is concatenated into the URL given to urllib2.Request. The request is opened and finally simplejson loads the data into an object called data (I still have a lot to learn regarding coding conventions, as noted earlier). That is then dumped into a ‘station.json’ file. I’m sure the code could be improved, and it will be, but this is the situation now.

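import urllib2

import simplejson as json  # the standard library json module works the same way here


def get_json(station):
    try:
        # The URL prefix is missing from this listing; only the query parameters survive.
        req = urllib2.Request("" + station + "&arrived_trains=0&arriving_trains=15&departed_trains=0&departing_trains=15")
        opener = urllib2.build_opener()
        fi = opener.open(req)
    except Exception as e:
        print (e)
        return  # nothing to parse if the request failed
    data = json.loads(fi.read())

    # Save the parsed response so the parser can read it from disk
    with open('./juna2/station.json', 'w') as f:
        json.dump(data, f)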
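
For readers on Python 3, where urllib2 no longer exists, the same fetch-and-save step could look roughly like the sketch below. It is not the code used in this project: `BASE_URL` is a hypothetical placeholder (the real endpoint is not shown in this post), and the `out_path` and `opener` parameters are additions made only so the function can be tried without touching the network.

```python
import json
import urllib.request

# Hypothetical placeholder: the real endpoint is not shown in this post.
BASE_URL = "https://example.invalid/live-trains?station="
QUERY = "&arrived_trains=0&arriving_trains=15&departed_trains=0&departing_trains=15"


def build_url(station, base_url=BASE_URL):
    """Return the request URL for one station code."""
    return base_url + station + QUERY


def get_json(station, out_path="station.json", opener=None):
    """Fetch the trains for a station, save them to out_path and return them."""
    opener = opener or urllib.request.build_opener()
    with opener.open(build_url(station)) as response:
        data = json.loads(response.read().decode("utf-8"))
    with open(out_path, "w") as f:
        json.dump(data, f)
    return data
```

Keeping the URL construction in its own small function makes it easy to check the query string without sending a request.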

The current version prints the data to the terminal, so some formatting is needed; hence all those tabs in the print calls. I originally thought of saving multiple files into the directory, hence the first for loop. Then again, it’s easier not to hard-code the actual file name (should I decide to change it), so I’ll stick with this for now.

So the files found are fed to the ‘JunaParser’ (juna is Finnish for train), where first a list of all the trains in the file is queried and then the arrival time is queried for each of those train numbers. Finally a dictionary is created with the arrival time as key and the train as value. The reason is that I know how to sort a dictionary by its keys, and it seems a lot easier this way.

import collections
import os
import sys

# JunaParser is my own class; its import is not shown in this listing.

if __name__ == '__main__':
    try:
        if len(sys.argv) < 3:
            print ("Please give the folder to be processed AND trainstation! Exiting...")
            sys.exit(1)
        json_path = str(sys.argv[1])
        station = str(sys.argv[2])
    except Exception as e:
        print (e)

    print ("Train number \t From \t Arrival time \t Est arrival time \t To \t Departure time \t Est departure time")
    trains = {}  # arrival time -> train number
    for fn in os.listdir(json_path):
        parser = JunaParser(os.path.join(json_path, fn))
        for item in parser.get_trains():
            trains[parser.get_arrival_time(item, station)] = item
        # Sort by arrival time (the dictionary key) before printing
        od = collections.OrderedDict(sorted(trains.items()))
        for key, value in od.items():
            print (str(value) + " \t\t " + str(parser.get_start_station(value))
                   + " \t " + key + " \t " + str(parser.get_exp_arrival_time(value, station))
                   + " \t\t " + str(parser.get_end_station(value))
                   + " \t " + str(parser.get_departure_time(value, station))
                   + " \t " + str(parser.get_exp_departure_time(value, station)))
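
Tabs only line up when every value happens to have a similar length. One possible alternative, sketched here rather than taken from the script, is to pad each column to a fixed width with `str.format`. The column widths and the names `HEADER`, `ROW_FORMAT` and `format_row` are illustrative assumptions.

```python
# Column titles and widths are illustrative; adjust them to the real data.
HEADER = ("Train", "From", "Arrival", "Est arrival", "To", "Departure", "Est departure")
ROW_FORMAT = "{:<8}{:<8}{:<10}{:<13}{:<8}{:<11}{:<13}"


def format_row(fields):
    """Left-align each field in its fixed-width column."""
    return ROW_FORMAT.format(*[str(field) for field in fields]).rstrip()


print(format_row(HEADER))
# Made-up sample row, only to show the alignment
print(format_row((1, "AAA", "05:30", "05:32", "BBB", "05:35", "05:37")))
```

Every row then starts each column at the same character position, regardless of how long the individual values are.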

Finally the rest of the data is queried based on the sorted dictionary of trains, and the end result is a nice list of trains ordered by arrival time.
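
One thing to keep in mind with a dictionary keyed on arrival time is that two trains arriving at the same minute share a key, so the later one overwrites the earlier. The sketch below compares the dictionary approach with sorting a list of (time, train) pairs. The sample values are made up, and it assumes the times are ISO 8601 strings in a single timezone, which sort chronologically as plain text; the post does not show the actual time format.

```python
# Made-up sample values: (arrival time, train number)
arrivals = [
    ("2017-09-01T06:10:00.000Z", 3),
    ("2017-09-01T05:55:00.000Z", 1),
    ("2017-09-01T06:10:00.000Z", 2),
]

# Dictionary approach: one train per arrival time, sorted by key
by_time = dict(arrivals)
sorted_dict = sorted(by_time.items())

# List approach: every train is kept, ties are ordered by train number
sorted_pairs = sorted(arrivals)

print(len(sorted_dict), len(sorted_pairs))
```

With the list approach no train is dropped when arrival times collide, and the rest of the printing loop can stay the same.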

I will go through the actual JunaParser class in the next post, but you can already see what information is extracted from the data.




