
Can't read 100 MB from R2: FUSE, streaming from Worker, signed URLs, and the S3 API all fail after 10-15 MB #137

@Victornovikov

Description


I checked multiple ways of reading from R2:

| Method | Initial Speed | Failure Point | Error Type |
| --- | --- | --- | --- |
| S3 SDK | ~0.1 MB/s | Never completes | Exit code -1 |
| Presigned URL | 13.46 MB/s | ~14 MB | Socket closed |
| Worker Stream | 22-24 MB/s | ~10 MB | ECONNRESET |
| FUSE (tigrisfs) | N/A | ~5-10 MB | Context deadline exceeded |

Container network egress to R2 seems to be unreliable.
All four methods, including the officially documented FUSE approach, fail with the same pattern: initial success followed by connection death after ~10-15 MB.

Not sure what I am doing wrong.

Workaround Attempts

We also tried:

  • Increasing timeouts (2min → 5min) - Still fails
  • Adding heartbeats - Connection still dies
  • Using streaming with progress logging - Confirmed data stops flowing...

More details:
## Environment

  • Container Instance Type: standard-4
  • enableInternet: true
  • File Size Being Transferred: ~30-100 MB (audio file for transcription)
  • Use Case: Audio transcription pipeline using FFmpeg + Whisper API

Failure Pattern (Consistent Across All Methods)

  1. Transfer starts successfully at good speed (13-24 MB/s)
  2. After ~10-15MB transferred, connection stalls
  3. After 2-3 minutes of stalling, connection dies
  4. Error indicates timeout or connection reset
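
To pin down step 2 faster than the 2-3 minute platform kill, a per-chunk watchdog can abort the moment data stops flowing. This is a hypothetical helper of ours, not part of the original code; it wraps any async-iterable body (such as the SDK's `Body` stream) and fails fast after `stallMs` of silence:

```js
// Hypothetical diagnostic helper (not from the issue): yields chunks from any
// async-iterable source, but throws if no chunk arrives within `stallMs`.
async function* withStallTimeout(source, stallMs) {
  const it = source[Symbol.asyncIterator]();
  while (true) {
    let timer;
    const timeout = new Promise((_, reject) => {
      timer = setTimeout(
        () => reject(new Error(`stream stalled for ${stallMs}ms`)),
        stallMs
      );
    });
    // Whichever settles first wins; a delivered chunk cancels the timer.
    const result = await Promise.race([it.next(), timeout]);
    clearTimeout(timer);
    if (result.done) return;
    yield result.value;
  }
}
```

Used as `for await (const chunk of withStallTimeout(Body, 15000)) { ... }`, this turns a silent three-minute hang into an immediate, loggable error.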

Method 1: Container fetches from R2 using AWS S3 SDK

Approach: Container uses @aws-sdk/client-s3 to fetch directly from R2.

Container Code (clients.js)

```js
const { S3Client } = require('@aws-sdk/client-s3');

const s3 = new S3Client({
  endpoint: process.env.R2_ENDPOINT,
  credentials: {
    accessKeyId: process.env.AWS_ACCESS_KEY_ID,
    secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY,
  },
  region: 'auto',
});
```
### Container Code (transcribe.js)

```js
const { GetObjectCommand } = require('@aws-sdk/client-s3');

const { Body } = await s3.send(new GetObjectCommand({
  Bucket: BUCKET,
  Key: r2_key,
}));

const chunks = [];
let downloadedBytes = 0;
for await (const chunk of Body) {
  chunks.push(chunk);
  downloadedBytes += chunk.length;
}
const audioBuffer = Buffer.concat(chunks);
```
### Result

  • Speed: ~0.1 MB/s (extremely slow)
  • Error: Container exit code -1 (platform kill)
  • Log: Download stalls, container terminated by platform
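
One mitigation worth trying against this failure mode is splitting the transfer into independent ranged GETs, so a single stalled connection only costs one 8 MB chunk instead of the whole file. This is a hypothetical workaround sketch of ours, not code from the report; `s3`, the bucket, and the key are assumed to be configured as in the snippets above:

```js
// Hypothetical workaround (not from the issue): download the object as a
// series of independent Range requests and reassemble the pieces.
function byteRanges(size, chunkSize) {
  const ranges = [];
  for (let start = 0; start < size; start += chunkSize) {
    ranges.push(`bytes=${start}-${Math.min(start + chunkSize, size) - 1}`);
  }
  return ranges;
}

async function downloadInRanges(s3, bucket, key, chunkSize = 8 * 1024 * 1024) {
  const { GetObjectCommand, HeadObjectCommand } = require('@aws-sdk/client-s3');
  // HEAD the object first to learn its total size.
  const head = await s3.send(new HeadObjectCommand({ Bucket: bucket, Key: key }));
  const parts = [];
  for (const range of byteRanges(head.ContentLength, chunkSize)) {
    const { Body } = await s3.send(
      new GetObjectCommand({ Bucket: bucket, Key: key, Range: range })
    );
    const chunks = [];
    for await (const c of Body) chunks.push(c);
    parts.push(Buffer.concat(chunks));
  }
  return Buffer.concat(parts);
}
```

If individual ranged requests also stall, that points at the connection rather than the transfer size; if they succeed, a retry per chunk makes the download robust.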

Method 2: Container fetches via Presigned URL

Approach: Worker generates presigned URL, container uses fetch() to download.

Worker Code (audio.ts)

```ts
import { getSignedUrl } from '@aws-sdk/s3-request-presigner';
import { S3Client, GetObjectCommand } from '@aws-sdk/client-s3';

const s3Client = new S3Client({
  endpoint: `https://${env.CLOUDFLARE_ACCOUNT_ID}.r2.cloudflarestorage.com`,
  credentials: {
    accessKeyId: env.R2_ACCESS_KEY_ID,
    secretAccessKey: env.R2_SECRET_ACCESS_KEY,
  },
  region: 'auto',
});

const presignedUrl = await getSignedUrl(
  s3Client,
  new GetObjectCommand({
    Bucket: 'llm-mindmap-audio',
    Key: r2_key,
  }),
  { expiresIn: 3600 }
);

// Pass to container
const response = await container.fetch('http://container/transcribe', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ r2_key, presigned_url: presignedUrl }),
});
```
### Container Code (transcribe.js)

```js
const controller = new AbortController();
const timeout = setTimeout(() => controller.abort(), 300000); // 5 min

const response = await fetch(presigned_url, { signal: controller.signal });
const reader = response.body.getReader();
const chunks = [];
let downloadedBytes = 0;

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  chunks.push(value);
  downloadedBytes += value.length;
}
```
### Result

  • Initial Speed: 13.46 MB/s (good!)
  • Failure Point: ~14MB transferred
  • Error: SocketError: other side closed
  • Log:
    bytesRead: 14876987
    remoteAddress: '172.64.66.1'
    code: 'UND_ERR_SOCKET'
    [cause]: SocketError: other side closed
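
A related mitigation for the presigned-URL path is resuming with an HTTP `Range` header after the socket drops, instead of restarting from byte zero. This is a hypothetical sketch of ours, untested against R2; it assumes the presigned GET accepts a `Range` request header, which is typical when the header is not part of the signature:

```js
// Hypothetical resume loop (not from the issue): on disconnect, re-request
// only the bytes not yet received.
async function fetchWithResume(url, maxAttempts = 5) {
  const chunks = [];
  let offset = 0;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const res = await fetch(url, {
        headers: offset > 0 ? { Range: `bytes=${offset}-` } : {},
      });
      if (!res.ok) throw new Error(`HTTP ${res.status}`);
      for await (const chunk of res.body) {
        chunks.push(chunk);
        offset += chunk.length;
      }
      return Buffer.concat(chunks); // completed
    } catch (err) {
      console.warn(`attempt ${attempt} died at ${offset} bytes: ${err.message}`);
    }
  }
  throw new Error(`gave up after ${maxAttempts} attempts at ${offset} bytes`);
}
```

If each attempt consistently adds another ~14 MB before dying, that at least confirms a per-connection limit rather than a per-object one.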
---

Method 3: Worker streams R2 body to Container

Approach: Worker downloads from R2 using internal binding (fast), then streams the body to container.

Worker Code (audio.ts)

```ts
// Worker downloads from R2 using internal binding (fast, no egress)
const audioObject = await env.AUDIO.get(r2_key);
if (!audioObject) {
  throw new Error(`Audio file not found in R2: ${r2_key}`);
}

// Stream the R2 body directly to the container
const transcribeResponse = await container.fetch('http://container/transcribe', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/octet-stream',
    'X-R2-Key': r2_key,
    'Content-Length': audioObject.size.toString(),
  },
  body: audioObject.body, // ReadableStream
});
```
### Container Code (transcribe.js)

```js
if (req.headers['content-type'] === 'application/octet-stream') {
  const expectedSize = parseInt(req.headers['content-length'] || '0', 10);
  const chunks = [];
  let receivedBytes = 0;

  await new Promise((resolve, reject) => {
    req.on('data', (chunk) => {
      chunks.push(chunk);
      receivedBytes += chunk.length;
    });
    req.on('end', resolve);
    req.on('error', reject);
  });

  audioBuffer = Buffer.concat(chunks);
}
```
### Result

  • Initial Speed: 22-24 MB/s (excellent!)
  • Failure Point: ~10MB transferred
  • Error: ECONNRESET: aborted
  • Log:
    [transcribe] worker_stream progress { bytes: 10485760, speed_mbps: '23.92' }
    ... 3 minutes later ...
    [transcribe] error at stage 'download_progress': Error: aborted
    code: 'ECONNRESET'
---

Method 4: FUSE Mount with tigrisfs

Approach: Container mounts R2 as filesystem using tigrisfs, reads file directly.

```sh
# tigrisfs reads AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from env
tigrisfs --endpoint "${R2_ENDPOINT}" -f "${R2_BUCKET_NAME}" /mnt/r2 &

# Wait for mount
for i in $(seq 1 30); do
  if mountpoint -q /mnt/r2 2>/dev/null; then
    echo "R2 mounted successfully"
    break
  fi
  sleep 1
done

exec node server.js
```
### Container Code (transcribe.js)

```js
const R2_MOUNT = '/mnt/r2';
const inputPath = path.join(R2_MOUNT, r2_key);

if (!fs.existsSync(inputPath)) {
  return res.status(404).json({ error: `File not found: ${r2_key}` });
}

const stats = fs.statSync(inputPath); // This works!
console.log(`Size: ${stats.size} bytes`);

// FFmpeg reads from the FUSE mount
const probe = spawnSync('ffprobe', [..., inputPath]); // Times out
```
### Result

  • Mount: Successful
  • File Metadata: Readable (size: 30,926,305 bytes)
  • Failure Point: Reading file content at ~5-10MB
  • Error: context deadline exceeded (Client.Timeout or context cancellation while reading body)
  • Log:
    [transcribe] Starting: .../file.mp3, size: 30926305 bytes
    Error reading 5242880 +5242880 of .../file.mp3 (attempt 1):
    context deadline exceeded (Client.Timeout or context cancellation while reading body)
    Error reading 10485760 +5242880 of .../file.mp3 (attempt 1):
    context deadline exceeded
