pilot_plan
Phase 3-4 Complete: Final Idea Selection
Surviving Ideas After Novelty Check
✅ Tier 1: High Novelty - PILOT APPROVED
Idea 2: Cross-Modal Budget Allocation (CMBA)
- Novelty Score: 8/10
- Risk: LOW
- Compute: 24h full / 12h pilot
- Status: ✅ APPROVED FOR PILOT
- Unique angle: First systematic ablation of allocation ratios (vision:projector:language)
- Expected outcome: Identify optimal ratio (e.g., 2:4:1), show projector needs 2-4x higher rank
⚠️ Tier 2: Medium Novelty - PAPER VALIDATION ONLY
Idea 5: Modality-Specific Learning Rate Scaling (MSLR)
- Novelty Score: 7/10 (estimated)
- Risk: LOW
- Compute: 24h full
- Status: ⚠️ VALIDATE ON PAPER, NO PILOT
- Findings from search:
- Adaptive Learning Rate Mitigates LoRA Fatal Flaws - Studies LR for LoRA but single-modality
- Learning Rate Scaling across LoRA Ranks - LR scaling for different ranks, not modalities
- Interleaved Instruction Tuning with Modality-Specialized Adaptations - Modality-specialized adapters, but doesn't combine with adaptive rank
- Gap: No work combines modality-specific LR + adaptive rank allocation
- Why no pilot: Lower priority than CMBA, can validate theoretically first
Idea 4: Hierarchical Rank Allocation (HRA)
- Novelty Score: 6/10
- Risk: MEDIUM (overlap with HyDRA)
- Compute: 32h full
- Status: ⚠️ NEEDS REFINEMENT, NO PILOT
- Overlap: HyDRA already does hierarchical + dynamic for mobile VLMs
- Differentiation needed: Must clearly distinguish "modality-level → layer-level" from HyDRA's "layer → head" hierarchy
- Why no pilot: Too similar to existing work without clearer framing
❌ Tier 3: Eliminated
Idea 1: MA-LoRA - Too similar to MokA/MARS
Idea 3: ZRP - SR-LoRA already covers this
Ideas 6, 7, 8 - Exceed compute budget
Idea 9 - High risk + high compute
Idea 10: Dynamic Rank Adjustment - DyLoRA, HyDRA, L1RA already do this
Phase 5: Pilot Experiment Plan
Decision: Run 1 Pilot (CMBA Only)
Rationale:
- CMBA has strongest novelty signal (8/10) and clear gap
- Other ideas either have novelty concerns (HRA) or can be validated on paper (MSLR)
- Better to run 1 high-quality pilot than 3 mediocre ones
CMBA Pilot Design
Goal: Test if cross-modal projector needs disproportionately high rank
Setup:
- Model: LLaVA-7B (or smallest available VLM)
- Dataset: VQAv2 validation set (subset: 5k samples for speed)
- Baseline: Fixed rank-8 uniform across all components
- Total parameter budget: Fixed at rank-8 equivalent (~2M parameters)
Allocation Ratios to Test:
- Uniform (1:1:1) - vision:projector:language = 8:8:8 (baseline)
- Projector-heavy (1:2:1) - 6:12:6
- Projector-heavy (1:4:1) - 4:16:4
- Vision-heavy (2:1:1) - 12:6:6
- Language-heavy (1:1:2) - 6:6:12
- Balanced-high-projector (2:4:1) - 8:16:4
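A quick sanity check worth running before training: because each component has different layer dimensions and layer counts, scaling ranks by ratio does not automatically preserve the rank-8-equivalent parameter budget. The sketch below counts LoRA parameters per allocation; the dimensions and layer counts are placeholder assumptions, not LLaVA-7B's actual configuration.

```python
# Sketch: check how far each allocation drifts from the rank-8-uniform
# parameter budget. COMPONENTS uses HYPOTHETICAL dims/layer counts.

COMPONENTS = {
    # name: (d_in, d_out, num_adapted_layers) -- placeholder values
    "vision":    (1024, 1024, 24),
    "projector": (1024, 4096, 2),
    "language":  (4096, 4096, 32),
}

def lora_params(ranks: dict) -> int:
    """Total LoRA parameters: rank * (d_in + d_out) per adapted layer."""
    return sum(
        ranks[name] * (d_in + d_out) * n_layers
        for name, (d_in, d_out, n_layers) in COMPONENTS.items()
    )

baseline = lora_params({"vision": 8, "projector": 8, "language": 8})

for label, ranks in {
    "uniform 8:8:8":          {"vision": 8,  "projector": 8,  "language": 8},
    "projector-heavy 6:12:6": {"vision": 6,  "projector": 12, "language": 6},
    "projector-heavy 4:16:4": {"vision": 4,  "projector": 16, "language": 4},
    "vision-heavy 12:6:6":    {"vision": 12, "projector": 6,  "language": 6},
    "language-heavy 6:6:12":  {"vision": 6,  "projector": 6,  "language": 12},
}.items():
    total = lora_params(ranks)
    print(f"{label}: {total / 1e6:.2f}M params ({total / baseline:.0%} of baseline)")
```

Under any realistic dimensions the per-rank cost differs across components, so the pilot may need to rescale ranks slightly per component to hold total parameters fixed rather than relying on the ratio alone.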
Metrics:
- Primary: VQA accuracy
- Secondary: Cross-modal alignment quality (image-text retrieval if time permits)
- Efficiency: Training time, memory usage
Pilot Scale:
- Single seed (for speed)
- 2 epochs (enough to see trends)
- Estimated time: 2h per ratio × 6 ratios = 12 GPU-hours
- Budget check: total budget = PILOT_MAX_HOURS (2h per idea) × MAX_PILOT_IDEAS (3) = 6h
⚠️ BUDGET ISSUE: 12h total exceeds the 6h budget. Need to reduce scale.
Revised Pilot Scale:
- Test only 4 key ratios: Uniform (1:1:1), Projector-heavy (1:4:1), Vision-heavy (2:1:1), Language-heavy (1:1:2)
- 1.5h per ratio × 4 ratios = 6 GPU-hours
- Within budget: 6h ≤ PILOT_MAX_HOURS (2h) × 3 = 6h ✅
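The budget arithmetic above is simple enough to encode as a guard in the pilot launcher, so the scale mistake caught earlier can't recur silently. PILOT_MAX_HOURS and MAX_PILOT_IDEAS are the plan's stated limits; hours_per_ratio is the revised estimate.

```python
# Sketch of the GPU-hour budget check from the revised pilot scale.

PILOT_MAX_HOURS = 2        # per-idea limit, in GPU-hours
MAX_PILOT_IDEAS = 3
budget = PILOT_MAX_HOURS * MAX_PILOT_IDEAS   # 6 GPU-hours total

ratios = ["1:1:1", "1:4:1", "2:1:1", "1:1:2"]  # revised 4-ratio set
hours_per_ratio = 1.5
total = hours_per_ratio * len(ratios)

assert total <= budget, f"over budget: {total}h > {budget}h"
print(f"{total}h of {budget}h budget used")
```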
Success Criteria:
- POSITIVE: Projector-heavy (1:4:1) improves VQA accuracy by >1 percentage point over uniform
- WEAK POSITIVE: Projector-heavy improves over uniform, but by <1 percentage point
- NEGATIVE: No clear pattern, or uniform is best
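The three outcomes can be stated as a small decision rule over the measured accuracies, which keeps the analysis step mechanical. A minimal sketch, assuming accuracies are reported as percentages keyed by ratio label (the example numbers are illustrative, not results):

```python
# Sketch of the success-criteria decision rule for the CMBA pilot.

def classify(acc: dict, threshold: float = 1.0) -> str:
    """Compare projector-heavy (1:4:1) against the uniform (1:1:1) baseline.

    POSITIVE:      projector-heavy beats uniform by more than `threshold` points
    WEAK POSITIVE: projector-heavy beats uniform, but by less than `threshold`
    NEGATIVE:      no improvement over uniform
    """
    delta = acc["1:4:1"] - acc["1:1:1"]
    if delta > threshold:
        return "POSITIVE"
    if delta > 0:
        return "WEAK POSITIVE"
    return "NEGATIVE"

# Hypothetical pilot numbers (illustrative only):
print(classify({"1:1:1": 62.0, "1:4:1": 63.5, "2:1:1": 61.8, "1:1:2": 62.1}))
# -> POSITIVE (delta = 1.5 points)
```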
Expected Timeline
Phase 5 (Pilot):
- Setup: 30 min
- Run 4 experiments: 6 GPU-hours
- Analysis: 30 min
- Total: ~7 hours wall-clock sequential (in parallel on 4 GPUs: 0.5h setup + 1.5h run + 0.5h analysis ≈ 2.5 hours)
Phase 6 (Report):
- Write IDEA_REPORT.md: 30 min
- Total: 30 min
Overall: ~3 hours wall-clock if the 4 runs execute in parallel on 4 GPUs
Contingency Plan
If no GPU available:
- Skip Phase 5 (pilots)
- Mark all ideas as "needs manual pilot validation"
- Proceed directly to Phase 6 with paper-only validation
- Flag CMBA as "highest priority for future pilot"
If pilot shows NEGATIVE result:
- Document the negative finding (still publishable as "uniform allocation is optimal")
- Fall back to Idea 5 (MSLR) for next iteration
- Update hypothesis: maybe projector doesn't need high rank, but modality-specific LR matters
Next Step
Proceed to Phase 5: Run CMBA pilot experiment (4 allocation ratios, 6 GPU-hours)