Performance improvement #20

I am using your library with pandas, and performance is not great: it takes 1-2 seconds to process a full year.
I see two reasons for this:

- operations are performed sequentially, although they could be partially vectorised.
- everything is decoded, even when only some of the fields are needed.


Here is how I see it:

- use pandas.read_fwf for the mandatory sections.
- use the apply method for the remaining part of the string (additional fields + remarks).
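
To make the two steps above concrete, here is a rough sketch of what I have in mind. The column positions, the field names, the `REM` marker and the `decode_tail` helper are all made up for illustration; they are assumptions, not the library's actual layout or API.

```python
import io

import pandas as pd

# Hypothetical fixed-width layout for the mandatory section.
# Positions and names are illustrative assumptions only.
MANDATORY_COLSPECS = [(0, 4), (4, 12), (12, 16)]
MANDATORY_NAMES = ["station", "date", "temp"]
MANDATORY_WIDTH = 16


def decode_tail(rest: str) -> str:
    """Hypothetical per-row decoder: return the text after a 'REM' marker."""
    _, _, remarks = rest.partition("REM")
    return remarks.strip()


def parse(text: str) -> pd.DataFrame:
    # Vectorised step: read the fixed-width mandatory section in one call.
    fixed = pd.read_fwf(
        io.StringIO(text),
        colspecs=MANDATORY_COLSPECS,
        names=MANDATORY_NAMES,
        header=None,
        dtype=str,  # keep raw strings, convert explicitly below
    )
    fixed["temp"] = pd.to_numeric(fixed["temp"])

    # Row-wise step: only the variable-length tail goes through apply.
    tails = pd.Series([line[MANDATORY_WIDTH:] for line in text.splitlines()])
    fixed["remarks"] = tails.apply(decode_tail)
    return fixed


SAMPLE = (
    "A00120200101+012ADD AA1 REM hello\n"
    "B00220200102-005ADD AA2\n"
)
df = parse(SAMPLE)
```

The point is that only the variable-length tail pays the per-row Python cost, while the mandatory section is handled in a single call.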

Usually, you know which information you are trying to get, and it is probably not every field that is present.
The idea would be to let the caller provide a list of desired fields. Based on that list, we could perform only the necessary decoding and return a pandas DataFrame (or a list of records).

That should speed things up considerably, since fields that are not requested would never be decoded.
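
As a sketch of the field-list idea, one option would be a table of per-field decoders where only the requested ones are run. The `DECODERS` table, the `decode` function and the column positions below are hypothetical and only meant to show the shape of such an interface, not how the library works today.

```python
import pandas as pd

# Hypothetical decoders, one per field. Each takes a Series of raw
# record strings and returns a decoded Series. Positions are assumptions.
DECODERS = {
    "station": lambda raw: raw.str[0:4],
    "date": lambda raw: pd.to_datetime(raw.str[4:12], format="%Y%m%d"),
    "temp": lambda raw: pd.to_numeric(raw.str[12:16]),
}


def decode(lines, fields):
    """Decode only the requested fields and return them as a DataFrame."""
    unknown = set(fields) - set(DECODERS)
    if unknown:
        raise KeyError(f"unknown fields: {sorted(unknown)}")
    raw = pd.Series(list(lines))
    # Decoders for fields that were not requested are never called.
    return pd.DataFrame({name: DECODERS[name](raw) for name in fields})


LINES = ["A00120200101+012", "B00220200102-005"]
subset = decode(LINES, ["station", "temp"])
```

If a list of records is preferred over a DataFrame, `subset.to_dict("records")` would give that from the same result.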

Would you be interested in this kind of change to your library?

Thanks,
Vincent
