pilot_plan
Phase 3-4 Complete: Final Idea Selection
Surviving Ideas After Novelty Check
✅ Tier 1: High Novelty - PILOT APPROVED
Idea 2: Cross-Modal Budget Allocation (CMBA)
- Novelty Score: 8/10
- Risk: LOW
- Compute: 24h full / 12h pilot
- Status: ✅ APPROVED FOR PILOT
- Unique angle: First systematic ablation of allocation ratios (vision:projector:language)
- Expected outcome: Identify optimal ratio (e.g., 2:4:1), show projector needs 2-4x higher rank
⚠️ Tier 2: Medium Novelty - PAPER VALIDATION ONLY
Idea 5: Modality-Specific Learning Rate Scaling (MSLR)
- Novelty Score: 7/10 (estimated)
- Risk: LOW
- Compute: 24h full
- Status: ⚠️ VALIDATE ON PAPER, NO PILOT
- Findings from search:
- Adaptive Learning Rate Mitigates LoRA Fatal Flaws - Studies LR for LoRA but single-modality
- Learning Rate Scaling across LoRA Ranks - LR scaling for different ranks, not modalities
- Interleaved Instruction Tuning with Modality-Specialized Adaptations - Modality-specialized adapters, but doesn't combine with adaptive rank
- Gap: No work combines modality-specific LR + adaptive rank allocation
- Why no pilot: Lower priority than CMBA, can validate theoretically first
Idea 4: Hierarchical Rank Allocation (HRA)
- Novelty Score: 6/10
- Risk: MEDIUM (overlap with HyDRA)
- Compute: 32h full
- Status: ⚠️ NEEDS REFINEMENT, NO PILOT
- Overlap: HyDRA already does hierarchical + dynamic for mobile VLMs
- Differentiation needed: Must clearly distinguish "modality-level → layer-level" from HyDRA's "layer → head" hierarchy
- Why no pilot: Too similar to existing work without clearer framing
❌ Tier 3: Eliminated
Idea 1: MA-LoRA - Too similar to MokA/MARS
Idea 3: ZRP - SR-LoRA already covers this
Ideas 6, 7, 8 - Exceed compute budget
Idea 9 - High risk + high compute
Idea 10: Dynamic Rank Adjustment - DyLoRA, HyDRA, L1RA already do this
Phase 5: Pilot Experiment Plan
Decision: Run 1 Pilot (CMBA Only)
Rationale:
- CMBA has strongest novelty signal (8/10) and clear gap
- Other ideas either have novelty concerns (HRA) or can be validated on paper (MSLR)
- Better to run 1 high-quality pilot than 3 mediocre ones
CMBA Pilot Design
Goal: Test if cross-modal projector needs disproportionately high rank
Setup:
- Model: LLaVA-7B (or smallest available VLM)
- Dataset: VQAv2 validation set (subset: 5k samples for speed)
- Baseline: Fixed rank-8 uniform across all components
- Total parameter budget: Fixed at rank-8 equivalent (~2M parameters)
Allocation Ratios to Test:
- Uniform (1:1:1) - vision:projector:language = 8:8:8 (baseline)
- Projector-heavy (1:2:1) - 6:12:6
- Projector-heavy (1:4:1) - 4:16:4
- Vision-heavy (2:1:1) - 12:6:6
- Language-heavy (1:1:2) - 6:6:12
- Balanced-high-projector (2:4:1) - 8:16:4
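A quick sanity check worth running before training: because each component has different layer dimensions and layer counts, scaling ranks by ratio does not automatically preserve the rank-8-equivalent parameter budget. The sketch below counts LoRA parameters per allocation; the dimensions and layer counts are placeholder assumptions, not LLaVA-7B's actual configuration.

```python
# Sketch: check how far each allocation drifts from the rank-8-uniform
# parameter budget. COMPONENTS uses HYPOTHETICAL dims/layer counts.

COMPONENTS = {
    # name: (d_in, d_out, num_adapted_layers) -- placeholder values
    "vision":    (1024, 1024, 24),
    "projector": (1024, 4096, 2),
    "language":  (4096, 4096, 32),
}

def lora_params(ranks: dict) -> int:
    """Total LoRA parameters: rank * (d_in + d_out) per adapted layer."""
    return sum(
        ranks[name] * (d_in + d_out) * n_layers
        for name, (d_in, d_out, n_layers) in COMPONENTS.items()
    )

baseline = lora_params({"vision": 8, "projector": 8, "language": 8})

for label, ranks in {
    "uniform 8:8:8":          {"vision": 8,  "projector": 8,  "language": 8},
    "projector-heavy 6:12:6": {"vision": 6,  "projector": 12, "language": 6},
    "projector-heavy 4:16:4": {"vision": 4,  "projector": 16, "language": 4},
    "vision-heavy 12:6:6":    {"vision": 12, "projector": 6,  "language": 6},
    "language-heavy 6:6:12":  {"vision": 6,  "projector": 6,  "language": 12},
}.items():
    total = lora_params(ranks)
    print(f"{label}: {total / 1e6:.2f}M params ({total / baseline:.0%} of baseline)")
```

Under any realistic dimensions the per-rank cost differs across components, so the pilot may need to rescale ranks slightly per component to hold total parameters fixed rather than relying on the ratio alone.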
Metrics:
- Primary: VQA accuracy
- Secondary: Cross-modal alignment quality (image-text retrieval if time permits)
- Efficiency: Training time, memory usage
Pilot Scale:
- Single seed (for speed)
- 2 epochs (enough to see trends)
- Estimated time: 2h per ratio × 6 ratios = 12 GPU-hours
- Budget check: total budget = PILOT_MAX_HOURS (2h per idea) × MAX_PILOT_IDEAS (3) = 6h
⚠️ BUDGET ISSUE: 12h total exceeds the 6h budget. Need to reduce scale.
Revised Pilot Scale:
- Test only 4 key ratios: Uniform (1:1:1), Projector-heavy (1:4:1), Vision-heavy (2:1:1), Language-heavy (1:1:2)
- 1.5h per ratio × 4 ratios = 6 GPU-hours
- Within budget: 6h ≤ PILOT_MAX_HOURS (2h) × 3 = 6h ✅
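The budget arithmetic above is simple enough to encode as a guard in the pilot launcher, so the scale mistake caught earlier can't recur silently. PILOT_MAX_HOURS and MAX_PILOT_IDEAS are the plan's stated limits; hours_per_ratio is the revised estimate.

```python
# Sketch of the GPU-hour budget check from the revised pilot scale.

PILOT_MAX_HOURS = 2        # per-idea limit, in GPU-hours
MAX_PILOT_IDEAS = 3
budget = PILOT_MAX_HOURS * MAX_PILOT_IDEAS   # 6 GPU-hours total

ratios = ["1:1:1", "1:4:1", "2:1:1", "1:1:2"]  # revised 4-ratio set
hours_per_ratio = 1.5
total = hours_per_ratio * len(ratios)

assert total <= budget, f"over budget: {total}h > {budget}h"
print(f"{total}h of {budget}h budget used")
```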
Success Criteria:
- POSITIVE: Projector-heavy (1:4:1) improves VQA accuracy by >1 percentage point over uniform
- WEAK POSITIVE: Projector-heavy improves over uniform, but by <1 percentage point
- NEGATIVE: No clear pattern, or uniform is best
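The three outcomes can be stated as a small decision rule over the measured accuracies, which keeps the analysis step mechanical. A minimal sketch, assuming accuracies are reported as percentages keyed by ratio label (the example numbers are illustrative, not results):

```python
# Sketch of the success-criteria decision rule for the CMBA pilot.

def classify(acc: dict, threshold: float = 1.0) -> str:
    """Compare projector-heavy (1:4:1) against the uniform (1:1:1) baseline.

    POSITIVE:      projector-heavy beats uniform by more than `threshold` points
    WEAK POSITIVE: projector-heavy beats uniform, but by less than `threshold`
    NEGATIVE:      no improvement over uniform
    """
    delta = acc["1:4:1"] - acc["1:1:1"]
    if delta > threshold:
        return "POSITIVE"
    if delta > 0:
        return "WEAK POSITIVE"
    return "NEGATIVE"

# Hypothetical pilot numbers (illustrative only):
print(classify({"1:1:1": 62.0, "1:4:1": 63.5, "2:1:1": 61.8, "1:1:2": 62.1}))
# -> POSITIVE (delta = 1.5 points)
```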
Expected Timeline
Phase 5 (Pilot):
- Setup: 30 min
- Run 4 experiments: 6 GPU-hours
- Analysis: 30 min
- Total: ~7 hours wall-clock sequential (in parallel on 4 GPUs: 0.5h setup + 1.5h run + 0.5h analysis ≈ 2.5 hours)
Phase 6 (Report):
- Write IDEA_REPORT.md: 30 min
- Total: 30 min
Overall: ~3 hours wall-clock if the 4 runs execute in parallel on 4 GPUs
Contingency Plan
If no GPU available:
- Skip Phase 5 (pilots)
- Mark all ideas as "needs manual pilot validation"
- Proceed directly to Phase 6 with paper-only validation
- Flag CMBA as "highest priority for future pilot"
If pilot shows NEGATIVE result:
- Document the negative finding (still publishable as "uniform allocation is optimal")
- Fall back to Idea 5 (MSLR) for next iteration
- Update hypothesis: maybe projector doesn't need high rank, but modality-specific LR matters
Next Step
Proceed to Phase 5: Run CMBA pilot experiment (4 allocation ratios, 6 GPU-hours)