Phase 3: First-Pass Filtering Analysis
Novelty Quick-Check Results
⚠️ PARTIAL OVERLAP - Needs Differentiation
Idea 1: Modality-Aware Adaptive LoRA (MA-LoRA)
- Closest work:
- Multimodal Low-Rank Adaptation (MokA) - Already does modality-aware parameter allocation
- Hierarchical and Dynamic Rank Adaptation for Mobile VLM - Dynamic rank for multimodal
- MARS: Multimodal Adaptive Rank Search - Adaptive rank search for multimodal
- Differentiation needed: These papers already combine adaptive rank + multimodal. Need stronger angle.
- Status: ⚠️ NEEDS REFINEMENT
Idea 2: Cross-Modal Budget Allocation (CMBA)
- Closest work:
- Towards Efficient Visual-Language Alignment of Q-Former - Uses AdaLoRA on Q-Former
- Cross-Modal Low-rank Adaptation - Cross-modal LoRA
- Gap: No systematic study of budget ratios across modalities; existing work applies a single uniform allocation method to all modules (see the budget-split sketch below).
- Status: ✅ NOVEL - diagnostic angle is unique
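To make the diagnostic concrete, here is a minimal sketch of how a budget ratio could be turned into per-layer LoRA ranks. The group names, layer counts, hidden dimensions, and ratio grid are illustrative assumptions, not settled experimental choices.

```python
# Hypothetical sketch: turn a total LoRA parameter budget and a
# vision:projector:language ratio into a per-layer rank for each group.
# Layer counts and hidden dims below are illustrative assumptions.

GROUPS = {
    # name: (number of adapted layers, in_dim + out_dim of the adapted projection)
    "vision":    (24, 1024 + 1024),
    "projector": (2,  1024 + 4096),
    "language":  (32, 4096 + 4096),
}

def ranks_for_ratio(total_budget: int, ratio: dict[str, float]) -> dict[str, int]:
    """Split total_budget LoRA parameters across groups; return rank per layer."""
    norm = sum(ratio.values())
    ranks = {}
    for name, (n_layers, dims) in GROUPS.items():
        group_budget = total_budget * ratio[name] / norm
        # A rank-r LoRA pair on a d_in x d_out projection costs ~ r * (d_in + d_out) params.
        ranks[name] = max(1, round(group_budget / (n_layers * dims)))
    return ranks

# One pilot run per ratio point in a small ablation grid.
for v, p, l in [(1, 1, 1), (2, 1, 1), (1, 2, 1), (1, 1, 2), (4, 1, 1), (1, 1, 4)]:
    ratio = {"vision": v, "projector": p, "language": l}
    print((v, p, l), ranks_for_ratio(total_budget=2_000_000, ratio=ratio))
```

Six ratio points at ~2h each lines up with the 12h CMBA pilot estimate in the Phase 4 plan.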
Idea 3: Zero-Shot Rank Predictor (ZRP)
- Closest work:
- Model Prior-Guided Rank Allocation (SR-LoRA) - Uses stable rank (intrinsic dimensionality)
- Geometric Adaptive Ranks - Geometry-based rank selection
- Gap: SR-LoRA uses stable rank but still requires training. True zero-shot prediction from pre-trained statistics alone is unexplored (see the sketch below).
- Status: ⚠️ PARTIAL OVERLAP - SR-LoRA is very close
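For reference, a zero-shot predictor in the intended sense might look like the sketch below: it reads only frozen pre-trained weight statistics (here, stable rank) and maps them to a LoRA rank with no training step. The mapping heuristic and the rank bounds are assumptions for illustration.

```python
import torch

def stable_rank(weight: torch.Tensor) -> float:
    """Stable rank = ||W||_F^2 / ||W||_2^2, computed from the frozen pre-trained weight."""
    w = weight.float()
    fro_sq = w.pow(2).sum()
    spectral = torch.linalg.matrix_norm(w, ord=2)  # largest singular value
    return (fro_sq / spectral.pow(2)).item()

def predict_lora_rank(weight: torch.Tensor, r_min: int = 2, r_max: int = 32) -> int:
    """Assumed heuristic: scale the LoRA rank with the layer's normalized stable rank."""
    frac = stable_rank(weight) / min(weight.shape)  # fraction of the maximum possible rank
    return max(r_min, min(r_max, round(r_min + frac * (r_max - r_min))))

# Example on random stand-in weights; real usage would iterate over the frozen VLM's layers.
for shape in [(1024, 1024), (1024, 4096), (4096, 4096)]:
    print(shape, predict_lora_rank(torch.randn(shape)))
```

The differentiation against SR-LoRA would hinge entirely on this step needing no fine-tuning signal at all.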
✅ NOVEL - Clear Differentiation
Idea 4: Hierarchical Rank Allocation (HRA)
- No direct match found. Existing work allocates ranks layer-wise or modality-wise, but not hierarchically (coarse→fine); see the allocator sketch below.
- Status: ✅ NOVEL
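A rough sketch of a coarse-to-fine allocator under my reading of the idea; the group shares and per-layer importance scores are placeholders (in a real run they might come from gradient or SVD statistics).

```python
# Hypothetical two-stage allocator: stage 1 (coarse) splits the total rank
# budget across modality groups, stage 2 (fine) distributes each group's
# share over its layers in proportion to a per-layer importance score.

def hierarchical_ranks(total_rank_budget: int,
                       group_shares: dict[str, float],
                       layer_scores: dict[str, list[float]]) -> dict[str, list[int]]:
    norm = sum(group_shares.values())
    allocation = {}
    for group, share in group_shares.items():
        group_budget = total_rank_budget * share / norm                      # coarse stage
        scores = layer_scores[group]
        total_score = sum(scores)
        allocation[group] = [max(1, round(group_budget * s / total_score))   # fine stage
                             for s in scores]
    return allocation

# Placeholder inputs; real scores would come from the base model, not constants.
print(hierarchical_ranks(
    total_rank_budget=256,
    group_shares={"vision": 1.0, "projector": 0.5, "language": 2.0},
    layer_scores={"vision": [1.0] * 12, "projector": [1.0] * 2, "language": [1.5] * 24},
))
```

The efficiency claim against AdaLoRA would presumably come from doing this allocation once (or a few times) up front rather than re-estimating importance continuously during training.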
Idea 5: Modality-Specific Learning Rate Scaling (MSLR)
- No work combines adaptive rank allocation with modality-specific learning rates (see the optimizer sketch below).
- Status: ✅ NOVEL
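Implementation-wise this is a small change: per-modality learning rates map directly onto optimizer parameter groups. The module names, base LR, and scaling factors below are illustrative; the scaling factors themselves would be the experimental variable.

```python
import torch
from torch import nn

# Toy stand-in for a VLM's LoRA modules grouped by modality (names are illustrative).
lora_modules = nn.ModuleDict({
    "vision":    nn.Linear(1024, 1024, bias=False),
    "projector": nn.Linear(1024, 4096, bias=False),
    "language":  nn.Linear(4096, 4096, bias=False),
})

base_lr = 1e-4
lr_scale = {"vision": 0.5, "projector": 2.0, "language": 1.0}  # assumed sweep point

optimizer = torch.optim.AdamW(
    [{"params": m.parameters(), "lr": base_lr * lr_scale[name]}
     for name, m in lora_modules.items()],
    weight_decay=0.0,
)

print([g["lr"] for g in optimizer.param_groups])
```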
Idea 10: Dynamic Rank Adjustment During Training
- Existing work uses a fixed rank or post-hoc pruning, not gradual rank decay during training (see the schedule sketch below).
- Status: ✅ NOVEL
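As a concrete example of "gradual decay during training", the sketch below uses an assumed cosine schedule for the active rank; in practice the adapter would be allocated at the initial rank and lower-importance rank components masked out as the schedule shrinks.

```python
import math

def rank_schedule(step: int, total_steps: int, r_init: int = 32, r_final: int = 8) -> int:
    """Assumed cosine decay of the active LoRA rank from r_init down to r_final."""
    progress = min(step / max(total_steps, 1), 1.0)
    r = r_final + 0.5 * (r_init - r_final) * (1.0 + math.cos(math.pi * progress))
    return max(r_final, round(r))

# Active rank at a few checkpoints of a 1000-step run.
print([rank_schedule(s, total_steps=1000) for s in (0, 250, 500, 750, 1000)])
```

The three-schedule pilot in the Phase 4 plan would compare shapes like this (e.g., cosine vs. linear vs. step decay).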
❌ HIGH RISK / HIGH COMPUTE
Idea 6: Task-Conditioned Rank Allocation (TCRA)
- Interesting but requires 72 GPU-hours for 3 tasks. Too expensive for pilot.
- Status: ❌ ELIMINATE (compute budget)
Idea 7: Gradient-Free Rank Search via Evolutionary Algorithm
- 80 GPU-hours for search. Exceeds MAX_TOTAL_GPU_HOURS=8.
- Status: ❌ ELIMINATE (compute budget)
Idea 8: Cross-Architecture Rank Transfer
- 96 GPU-hours across 3 architectures. Too expensive.
- Status: ❌ ELIMINATE (compute budget)
Idea 9: Information Bottleneck-Guided Rank Allocation
- High risk (theory may not hold) + 48 GPU-hours.
- Status: ❌ ELIMINATE (risk + compute)
Feasibility Check
| Idea | Compute (GPU-hours) | Data | Implementation effort | Verdict |
|---|---|---|---|---|
| 1. MA-LoRA | 40h | VQAv2 ✅ | Medium | ⚠️ Needs differentiation |
| 2. CMBA | 24h | VQAv2 ✅ | Easy | ✅ PASS |
| 3. ZRP | 60h | VQAv2 ✅ | Hard | ⚠️ Overlaps SR-LoRA |
| 4. HRA | 32h | VQAv2 ✅ | Medium | ✅ PASS |
| 5. MSLR | 24h | VQAv2 ✅ | Easy | ✅ PASS |
| 6. TCRA | 72h | 3 datasets | Hard | ❌ Too expensive |
| 7. Evolutionary | 80h | 2 datasets | Medium | ❌ Too expensive |
| 8. Transfer | 96h | 3 models | Hard | ❌ Too expensive |
| 9. IB-Guided | 48h | VQAv2 ✅ | Very Hard | ❌ High risk + cost |
| 10. Dynamic | 24h | VQAv2 ✅ | Easy | ✅ PASS |
Impact Estimation
High Impact (clear "so what"):
- Idea 2 (CMBA): Would reveal where the parameter budget should go in multimodal models. Actionable for practitioners.
- Idea 4 (HRA): Projected to be ~2x faster than AdaLoRA while maintaining performance. Clear efficiency win if it holds.
- Idea 10 (Dynamic): Projected 40-50% parameter savings at the same accuracy. Strong practical value.
Medium Impact:
- Idea 5 (MSLR): Faster convergence is useful but not groundbreaking.
Unclear Impact:
- Idea 1 (MA-LoRA): Too similar to existing work (MokA, MARS). Needs stronger differentiation.
- Idea 3 (ZRP): SR-LoRA already does this partially. Incremental improvement at best.
Surviving Ideas (6 after feasibility → 4 after impact)
✅ Top Tier (pilot these)
- Idea 2: Cross-Modal Budget Allocation (CMBA) - LOW risk, HIGH impact, NOVEL
- Idea 4: Hierarchical Rank Allocation (HRA) - MEDIUM risk, HIGH impact, NOVEL
- Idea 10: Dynamic Rank Adjustment (Dynamic) - LOW risk, HIGH impact, NOVEL
⚠️ Second Tier (validate on paper, pilot if budget allows)
- Idea 5: Modality-Specific Learning Rate Scaling (MSLR) - LOW risk, MEDIUM impact, NOVEL
❌ Eliminated (6 ideas)
- Idea 1: Too similar to MokA/MARS - needs stronger angle
- Idea 3: SR-LoRA already covers this - incremental at best
- Ideas 6, 7, 8: Exceed compute budget (72-96 GPU-hours each)
- Idea 9: High risk + high compute (48h)
Recommendation for Phase 4
Pilot these 3 ideas in parallel (full-run estimates total 24+32+24 = 80h, but the pilots are scaled down):
- CMBA - 6 ratio ablations × 2h = 12h pilot
- HRA - 3 allocation strategies × 3h = 9h pilot
- Dynamic - 3 rank schedules × 2h = 6h pilot
Total pilot budget: ~27 GPU-hours. Note: this averages ~9h per idea, and the CMBA pilot (12h) exceeds the MAX_TOTAL_GPU_HOURS=8 per-idea cap, so some ratio points may need to be cut to fit.
If pilots show positive signal, proceed to deep validation (Phase 4).
Key Findings from Literature
Recent relevant work (2024-2025):
- MokA: Modality-aware LoRA with shared/specific parameters
- MARS: Adaptive rank search for multimodal
- SR-LoRA: Stable rank-guided allocation
- Hierarchical Dynamic Rank: Dynamic rank for mobile VLMs
- Q-Former PEFT: AdaLoRA on cross-modal projector
Structural gap confirmed: none of the surveyed work systematically studies budget allocation ratios across modalities (vision:projector:language). This is Idea 2's unique angle.