Want to ensure financial documents can’t be parsed by automated systems

  • GBU_28@lemm.ee
    11 days ago

    What? If the document is accessible and human-readable, it’s parsable by OCR.

    • cannedtuna@lemmy.world
      10 days ago

      I don’t know what to tell you, dude. A certified or digitally signed document can’t/won’t be read by OCR. A digitally signed document legally certifies that the document has not been modified. PDF editors such as Bluebeam or Adobe will not (or cannot) process a certified or digitally signed document.

      I’m not sure whether that limitation comes from the process by which the document is certified or is a feature of software conforming to legal requirements. I’m not going to research this for OP; I’m just providing the simplest, most accurate answer I can.

      Maybe current AI is better at processing document text? I’m not sure. But you’d think this would be a shared concern among groups wanting to protect documents for the same reason, so the encryption would keep pace.

      If it’s only legality stopping a company from providing the feature, you’d think most companies would want to stay out of legal hot water and would therefore disallow OCR processing. In that case, sure, there could be software that doesn’t conform, but for most practical purposes I don’t think you’d have to worry too much.

      • BCsven@lemmy.ca
        10 days ago

        Lots of software can manipulate PDFs. Open the PDF in LibreOffice Draw, change pages, then print to PDF or export as PDF. A system that skims content is purposely going to bypass any signing restriction.
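
A minimal sketch of why a signature does not stop content from being read: signing produces a tag that lets a verifier detect modification, but the signed bytes themselves stay in the clear. This uses Python's standard-library HMAC as a simplified stand-in; real PDF signatures instead embed a public-key (CMS/PKCS#7) signature over the file's byte ranges. The key, `sign`, `verify`, and the sample invoice text are hypothetical, for illustration only.

```python
import hashlib
import hmac

# Hypothetical shared secret. Real PDF signing uses a certificate's
# private key (public-key crypto); HMAC is a simplified stand-in here.
SIGNING_KEY = b"hypothetical-demo-key"


def sign(document: bytes) -> str:
    """Return a tag that lets a verifier detect later modification."""
    return hmac.new(SIGNING_KEY, document, hashlib.sha256).hexdigest()


def verify(document: bytes, tag: str) -> bool:
    """True only if the document bytes are unchanged since signing."""
    return hmac.compare_digest(sign(document), tag)


original = b"Invoice total: 1,250.00 USD"
tag = sign(original)

# Signing hides nothing: any parser can still read the plain bytes.
print(original.decode())
print(verify(original, tag))  # the untouched copy checks out

# An edited or re-exported copy fails verification but remains readable.
edited = original.replace(b"1,250.00", b"9,999.00")
print(edited.decode())
print(verify(edited, tag))  # modification is detected, not prevented
```

Signing here answers "has this changed?", not "who may read this?". Keeping content away from parsers would require encryption, and even then anyone who can open and view the document can still capture what is displayed.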

      • GBU_28@lemm.ee
        10 days ago

        Many alternative OCR tools now simply screenshot the page. This problem has already been cracked.