I am using your library with pandas, and performance is not great: it takes 1-2 seconds to process a full year.
The reasons for this are:
- operations are performed sequentially, although they could be partially vectorised.
- everything is decoded, even the fields you don't need.
The way I see things:
- use pandas.read_fwf for the mandatory sections, which are fixed-width.
- use the apply method for the remaining part of the string (additional fields + remarks).
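The two steps above can be sketched roughly as follows. The record layout here (station, date, temperature in tenths of a degree, then `KEY=VALUE` extras) is invented purely for illustration and is not the format your library actually parses; `COLSPECS`, `NAMES` and `parse_extras` are hypothetical names.

```python
import io

import pandas as pd

# Hypothetical fixed-width layout, NOT the library's real format:
# station (5 chars), date (8), temperature in tenths of a degree (5), rest.
SAMPLE = (
    "7250320230101+0123WND=5;VIS=10\n"
    "7250320230102-0045WND=12\n"
)
COLSPECS = [(0, 5), (5, 13), (13, 18), (18, None)]
NAMES = ["station", "date", "temp_tenths", "rest"]


def parse_extras(rest):
    """Parse the variable-length tail of one record into a dict."""
    return dict(item.split("=", 1) for item in rest.split(";") if item)


# Step 1: read the fixed-width part in one call, keeping everything as text.
df = pd.read_fwf(
    io.StringIO(SAMPLE),
    colspecs=COLSPECS,
    names=NAMES,
    header=None,
    dtype=str,
    keep_default_na=False,  # an empty tail stays "" instead of NaN
)

# Step 2: convert whole columns at once instead of record by record.
df["date"] = pd.to_datetime(df["date"], format="%Y%m%d")
df["temp_c"] = pd.to_numeric(df["temp_tenths"]) / 10

# Step 3: fall back to apply only for the variable-length tail.
df["extras"] = df["rest"].apply(parse_extras)
```

Only the last step runs Python code per row; the fixed-width columns are converted column-wise.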
Usually, you know what information you are trying to get (and probably not every field that is present).
The idea would be to let the caller provide a list of desired fields. Based on that list, the library would perform only the necessary decoding and return a pandas DataFrame (or a list of records).
That would increase speed a lot.
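As a minimal sketch of what such an interface could look like, assuming the same invented fixed-width layout as a stand-in for the real format: `FIELD_DECODERS` and `decode` are hypothetical names, not part of the library.

```python
import pandas as pd

# Hypothetical registry: output field -> (start, stop, column-wise converter).
FIELD_DECODERS = {
    "station": (0, 5, lambda s: s),
    "date": (5, 13, lambda s: pd.to_datetime(s, format="%Y%m%d")),
    "temp_c": (13, 18, lambda s: pd.to_numeric(s) / 10),
}


def decode(lines, fields):
    """Decode only the requested fields and return them as a DataFrame."""
    raw = pd.Series(list(lines))
    columns = {}
    for name in fields:
        start, stop, convert = FIELD_DECODERS[name]
        # Slice and convert the whole column at once; fields that were
        # not requested are never touched.
        columns[name] = convert(raw.str.slice(start, stop))
    return pd.DataFrame(columns)


lines = [
    "7250320230101+0123WND=5;VIS=10",
    "7250320230102-0045WND=12",
]
result = decode(lines, ["station", "temp_c"])
```

Here `result` has only the `station` and `temp_c` columns, and the date parsing never runs. A list of records can be obtained with `result.to_dict("records")`.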
Would you be interested in this kind of change to your library?
Thanks,
Vincent