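Restructure the project and update README.md

1. Restructure the directory layout:
   - Create the src/visualization module to hold visualization-related functionality
   - Move script/visualize_csv_comparison.py to src/visualization/comparison.py
   - Add src/visualization/__init__.py to export the main functions
   - Organize the script directory, grouping script files by function

2. Update README.md:
   - Add a section on CSV file comparison visualization
   - Document usage through the Python API and the command line
   - Describe the features and give usage examples

3. Update module references:
   - Fix the module import paths in comparison.py
   - Update the usage examples in the command-line help text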
2025-10-23 17:55:36 +08:00
parent 9f0a0fbcdc
commit bbf1746046
7 changed files with 358 additions and 0 deletions

@@ -0,0 +1,70 @@
#!/usr/bin/env python3
"""Merge split CSVs into a single file with split source labels."""
import argparse
from pathlib import Path

import pandas as pd

# Mapping of split CSV filenames to numeric labels for the source column
SPLIT_LABELS = {
    "split_test.csv": 1,
    "split_train.csv": 2,
    "split_val.csv": 3,
}
DEFAULT_COLUMN_NAME = "split_source"


def parse_args() -> argparse.Namespace:
    repo_root = Path(__file__).resolve().parent.parent
    parser = argparse.ArgumentParser(
        description=
        "Combine split_*.csv files from splits_v2 and label their origin with integers."
    )
    parser.add_argument(
        "--input-dir",
        type=Path,
        default=repo_root / "splits_v2",
        help="Directory containing split_*.csv files (default: %(default)s)",
    )
    parser.add_argument(
        "--output",
        type=Path,
        default=repo_root / "data" / "merged_splits.csv",
        help="Destination CSV path (default: %(default)s)",
    )
    parser.add_argument(
        "--column-name",
        default=DEFAULT_COLUMN_NAME,
        help="Name for the source column (default: %(default)s)",
    )
    return parser.parse_args()


def main() -> None:
    args = parse_args()
    if not args.input_dir.is_dir():
        raise SystemExit(f"Input directory not found: {args.input_dir}")

    frames = []
    for filename, label in SPLIT_LABELS.items():
        csv_path = args.input_dir / filename
        if not csv_path.is_file():
            raise SystemExit(f"Missing expected split file: {csv_path}")
        df = pd.read_csv(csv_path)
        df[args.column_name] = label
        frames.append(df)

    if not frames:
        raise SystemExit("No split CSV files were loaded.")

    merged = pd.concat(frames, ignore_index=True)
    args.output.parent.mkdir(parents=True, exist_ok=True)
    merged.to_csv(args.output, index=False)
    print(f"Merged {len(frames)} files with {len(merged)} rows into {args.output}")


if __name__ == "__main__":
    main()
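
The labeling and concatenation step in main() can be tried without any files on disk. The sketch below is not part of this commit. It reuses SPLIT_LABELS and the default split_source column name from the script, while the toy frames, their id column, and the derived split_name column are hypothetical and exist only for illustration.

```python
from pathlib import Path

import pandas as pd

# Same filename-to-label mapping as in the script.
SPLIT_LABELS = {
    "split_test.csv": 1,
    "split_train.csv": 2,
    "split_val.csv": 3,
}

# Hypothetical stand-ins for the contents of the three split files.
toy_frames = {
    "split_test.csv": pd.DataFrame({"id": [10, 11]}),
    "split_train.csv": pd.DataFrame({"id": [20, 21, 22]}),
    "split_val.csv": pd.DataFrame({"id": [30]}),
}

# Label each frame with its integer source, then stack them with a fresh index.
frames = []
for filename, label in SPLIT_LABELS.items():
    df = toy_frames[filename].copy()
    df["split_source"] = label
    frames.append(df)
merged = pd.concat(frames, ignore_index=True)

# Invert the mapping to turn the integer label back into a readable split name
# (illustrative only; the script itself writes just the integer column).
label_to_name = {
    label: Path(name).stem.replace("split_", "", 1)
    for name, label in SPLIT_LABELS.items()
}
merged["split_name"] = merged["split_source"].map(label_to_name)
```

Because the script iterates over SPLIT_LABELS in insertion order, the merged rows come out as test, then train, then val. A consumer that needs a particular split can filter on the integer column instead of relying on row order.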