Skip to content

Conversation

@naugtur
Copy link
Member

@naugtur naugtur commented Dec 17, 2025

Description

With compartment-mapper introducing support for dynamic imports, the old regexp-based ways in which we evaded import censorship in SES are no longer viable and AST based transform can be the only meaning-preserving one.

Open questions

  • what about import ( - with whitespace? We could add complexity to the transform or assume they're rare enough and let them hit censorship.
    • will implement as split by regexp unless somebody stops me
  • Should it be opt-in or opt-out?
    • opt-out by unanimous opinion from people present in Endo meeting
    • implement
  • Does a step in the process between compartment-mapper and SES exist where the supported dynamic import call has been changed to its implementation and the only occurences of import( that are left need not be preserved? Could we use a simple regexp-replace there for a cheaper way to get the same result?
    • not a welcome option to create a potential for invalid programs in ModuleSource constructor
  • Do we even support eval('var a = import("somemodule")') or intend to ?

Security Considerations

  • Could the new transform be targeted to unlock malicious meaning in an otherwise harmless code? (it's meaning-preserving and AST based, so it seems unlikely)

Scaling Considerations

AST transform is slower than regexp, but for cases where import( doesn't exist in a string, it could be faster than a regexp search o the entire string.

Documentation Considerations

  • needs documenting after we decide opt-in or out
  • naming things is hard

Testing Considerations

  • added a testcase.
  • once we figure out opt-in/out a more end2end test in compartment-mapper would be nice.
  • more template string tests could help, the transform for them is more complex than I'd like, with room for off-by-one errors.

Compatibility Considerations

with the transform being meaning-preserving, this should not be breaking for anyone. Also, considering opt-in

if (options && options.preventHtmlCommentRegression) {
// Workaround for generator creating `-->` from `-- >` in code generated from the AST
result.code = result.code.replace(/-->/g, '-- >');
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kriskowal @mhofman is this controversial at all?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a test case covering that this won't affect the contents of strings in the output?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you mean here. if a string contains an html coment sequence, it'll get disconnected. So there should be nothing left in strings to affect. Is that what you mean? Or what kind of fixture would the test need?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code like const htmlCommentClose = "-->"; (in which the sequence appears as string contents) must not be affected.

Copy link
Member Author

@naugtur naugtur Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OH, it would be affected.
It's not meaning-preserving, that's why I made it opt-in.
We've been doing that same transform in LavaMoat for a long time and it didn't bite us (this particular one at least) so I thought we could put it in at the much better time.
Maybe a scarier name is gonna make this feature more palatable.

Anyway, I'll extract it into a separate PR to proceed with the part we most care about.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I've replaced this with a potentially cheaper trick that introduces an empty comment inbetween a--/**/>b

Now the only controversy is the new regexp evasion. @gibson042 check it out :)

@@ -0,0 +1,171 @@
const evadeRegexp = /import\s*\(|<!--|-->/g;
Copy link
Member Author

@naugtur naugtur Dec 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note this now provides multiple evasions in one go

@naugtur naugtur marked this pull request as ready for review December 18, 2025 14:08
Comment on lines +185 to +193
const multilineimport =\`␊
console.log(${a})␊
await im${""}port('some-module');␊
console.log(${b})␊
\`;␊
const taggedtemplate = String.dedent\`␊
await import('some-module');␊
\`;␊
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is there this difference between tagged vs. untagged template literal contents?

And as an aside, I think this test would be better if the input and expected output appeared in the same file (i.e., not using t.snapshot). It's valuable for high volume, but this use seems to come down on the wrong side of the tradeoff.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the tag function would see an extra split and empty string value in it. Where without a tag the effects of having an extra hole filled with an empty string are unobservable.

Copy link
Member

@gibson042 gibson042 Dec 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it's more code to get right, but if we're making this change, we should be thorough about it, e.g.:

        const taggedtemplate = makeEvadedTemplateApplier(String.dedent)`
            await ${IMPORT}('some-module');
        `;

where IMPORT is a unique global sentinel such as Symbol('import') and makeEvadedTemplateApplier is a global function like

const makeEvadedTemplateApplier =
  fn =>
  (T0, ...A0) => {
    const argCount = A0.length;
    let mapped = weakmapGet(evadedTemplateMappings, T0);
    if (!mapped) {
      const raw0 = T0.raw;
      const T1 = [];
      const raw1 = [];
      const importIndexes = [];
      for (let i = 0, j = 0; i <= argCount; ++i, ++j) {
        T1[j] = T0[i];
        raw1[j] = raw0[i];
        while (i < argCount && A0[i] === IMPORT) {
          importIndexes[importIndexes.length] = i++;
          T1[j] += `import${T0[i]}`;
          raw1[j] += `import${raw0[i]}`;
        }
      }
      defineProperty(T1, 'raw', {
        ...{ writable: false, enumerable: false, configurable: false },
        value: freeze(raw1),
      );
      mapped = freeze({ T1: freeze(T1), importIndexes });
      weakmapSet(evadedTemplateMappings, T0, mapped);
    }
    const { T1, importIndexes } = mapped;
    let i = 0;
    let j = 0;
    let nextImportIndex = j < importIndexes.length && importIndexes[j];
    const next = () => {
      while (i === nextImportIndex) {
        i += 1;
        j += 1;
        nextImportIndex = j < importIndexes.length && importIndexes[j];
      }
      if (i < argCount) return { done: false, value: A0[i++] };
      return { done: true, value: undefined };
    };
    const A1 = { [iteratorSymbol]: () => A1, next };
    return fn(T1, ...A1);
  };

(which AFAIK is unobservable, until/unless https://github.com/tc39/proposal-array-is-template-object advances to Stage 4)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would have to put all that runtime in a single line of code, and it’s a lot. Might have to do some hard trade-off calculus. I’m in favor of making monotonic progress until we hit a cost cliff.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know to what extent minification is on the table for injected helpers like the above, but if we are comfortable with it then the cost would be just under 500 bytes, and only necessary when the source text being transformed has one or more tagged template literals with static contents that would be affected.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's about the potential issues with sourcemaps or having to find a place to put it. Also, the level of complexity I'm not comfortable with introducing.
It's a complex fix for a very unlikely problem. And if a module contains a tagged template string with an import statement in it, it's likely problematic in many other ways.

HTML in tagged template strings is somewhat common, but as a virtual dom component, where HTML comments don't make sense.
I fail to find an example of tagged template strings containing hardcoded javascript with imports in it.

I'd like to avoid shipping a tagged template string transform for now.

@naugtur
Copy link
Member Author

naugtur commented Jan 12, 2026

I'm considering also ading a transform that changes a RegExp /import (.*)/ into /im[p]ort (.*)/ as we've just encountered one in some fairly popular software.

Copy link
Member

@boneskull boneskull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems reasonable but I think @gibson042's comment about --> needs addressing

@naugtur naugtur force-pushed the naugtur/more-evasions branch from 8609251 to 6939389 Compare January 21, 2026 12:50
@naugtur naugtur requested review from gibson042 and mhofman January 21, 2026 12:56
* @param {import('@babel/traverse').NodePath} p
* @returns {void}
*/
export const evadeRegexpLiteral = p => {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gibson042 @mhofman this is a new addition to this PR

it's meaning preserving unless the regexp is listing characters to match, including an open brace like this: /[import(]+/ - which as regular expressions go seems unlikely enough. Do we want to avoid that with one extra test to see if a ( exists between matching [] ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants