
Performance improvement #20

@vtoupet


I am using your library with Pandas. Performance is not great: it takes 1-2 seconds to process a full year of data.
The reasons for this are:

  • operations are performed sequentially, even though they could be partially vectorised.
  • everything is decoded, even when you don't need all of it.

The way I see it:

  • use pandas.read_fwf for the mandatory sections.
  • use the apply method for the remaining part of the string (additional fields + remarks).
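Below is a minimal sketch of that split, under loudly hypothetical assumptions: the column layout (`COLSPECS`, `MANDATORY_WIDTH`), the field names and the `KEY:VALUE` token format of the remainder are invented for illustration and are not the library's API or the real record format. `pd.read_fwf` slices every fixed-width mandatory section in one call, and `Series.apply` is used only on the variable-length tail of each line.

```python
import io

import pandas as pd

# HYPOTHETICAL, simplified layout for illustration only: 0-based, half-open
# (start, end) offsets. Real positions must come from the format specification.
COLSPECS = [(0, 6), (6, 14), (14, 18), (18, 23)]
NAMES = ["station", "date", "time", "air_temp"]
MANDATORY_WIDTH = 23  # hypothetical length of the mandatory section


def decode_remainder(rest: str) -> dict:
    """Decode the variable-length tail (hypothetical 'KEY:VALUE' tokens)."""
    return dict(token.split(":", 1) for token in rest.split())


def parse(lines: list[str]) -> pd.DataFrame:
    # Vectorised step: one read_fwf call slices the mandatory section of every line.
    df = pd.read_fwf(
        io.StringIO("\n".join(lines)),
        colspecs=COLSPECS,
        names=NAMES,
        header=None,
        dtype=str,  # keep leading zeros and explicit signs as-is
    )
    # Per-row step: apply() runs only on the text after the mandatory section.
    remainder = pd.Series(lines).str.slice(MANDATORY_WIDTH)
    df["extra"] = remainder.apply(decode_remainder)
    return df


# Two made-up records in the hypothetical layout above.
lines = [
    "010010202401010000-0012 WS:35 VIS:9999",
    "010010202401010100+0005 WS:40",
]
df = parse(lines)
```

Note that `read_fwf` skips blank lines by default, so this sketch assumes the input has none; otherwise the two index alignments would drift apart.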

Usually, you know what information you are trying to get (and it is probably not every field that is present).
The idea would be to accept a list of desired fields. Based on that list, we could perform only the necessary decoding and return a Pandas DataFrame (or a list of records).

That would speed things up considerably.
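A possible shape for the field-selection idea, again only a sketch: the `parse(lines, fields)` signature, the `MANDATORY` layout and the `KEY:VALUE` remainder format are all hypothetical. Requested mandatory fields shrink the `colspecs` passed to `read_fwf`, and the per-row decoding of the remainder is skipped entirely unless an additional field is actually requested.

```python
import io

import pandas as pd

# HYPOTHETICAL layout: field name -> 0-based, half-open (start, end) offsets.
MANDATORY = {"station": (0, 6), "date": (6, 14), "time": (14, 18), "air_temp": (18, 23)}
MANDATORY_WIDTH = 23  # hypothetical length of the mandatory section


def _lookup(rest: str, key: str):
    """Return the value of one hypothetical 'KEY:VALUE' token, or None if absent."""
    prefix = key + ":"
    for token in rest.split():
        if token.startswith(prefix):
            return token[len(prefix):]
    return None


def parse(lines: list[str], fields: list[str]) -> pd.DataFrame:
    """Decode only the requested fields (hypothetical API)."""
    wanted = [f for f in fields if f in MANDATORY]
    extra = [f for f in fields if f not in MANDATORY]
    if wanted:
        # Only the requested fixed-width columns are sliced out.
        df = pd.read_fwf(
            io.StringIO("\n".join(lines)),
            colspecs=[MANDATORY[f] for f in wanted],
            names=wanted,
            header=None,
            dtype=str,
        )
    else:
        df = pd.DataFrame(index=range(len(lines)))
    if extra:
        # The remainder is only touched when an additional field is requested.
        remainder = pd.Series(lines).str.slice(MANDATORY_WIDTH)
        for key in extra:
            df[key] = remainder.apply(_lookup, args=(key,))
    return df


lines = [
    "010010202401010000-0012 WS:35 VIS:9999",
    "010010202401010100+0005 WS:40",
]
df = parse(lines, ["date", "WS", "VIS"])
```

If a list of records is preferred over a DataFrame, `df.to_dict("records")` converts the result.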

Would you be interested in such an evolution of your library?

Thanks,
Vincent
