diff --git a/1d-qsar/cuda/README.md b/1d-qsar/cuda/README.md
new file mode 100644
index 0000000..7e986e1
--- /dev/null
+++ b/1d-qsar/cuda/README.md
@@ -0,0 +1,22 @@
+## Optimizing the GPU version
+
+[Installation guide](https://docs.rapids.ai/install/#selector)
+
+```shell
+micromamba create -n rapids-25.02 -c rapidsai -c conda-forge -c nvidia \
+    rapids=25.02 python=3.12 'cuda-version>=12.0,<=12.8' \
+    'pytorch=*=*cuda*' jupyterlab
+micromamba activate rapids-25.02
+```
+
+### Approach 1: Replace scikit-learn with RAPIDS cuML
+
+RAPIDS cuML provides a scikit-learn-compatible GPU implementation of RandomForestRegressor, which can speed up training substantially.
+
+### Approach 2: Use XGBoost for feature selection
+
+XGBoost supports GPU training, so XGBRegressor can replace RandomForestRegressor during feature selection.
+
+### Approach 3: Optimize feature selection with Optuna
+
+Optuna is an automatic hyperparameter optimization library that can search for the best feature subset on the GPU.
\ No newline at end of file
diff --git a/1d-qsar/cuda/RFE_cuml_permutation.log b/1d-qsar/cuda/RFE_cuml_permutation.log
new file mode 100644
index 0000000..c3a08ed
--- /dev/null
+++ b/1d-qsar/cuda/RFE_cuml_permutation.log
@@ -0,0 +1,399 @@
+(rapids-25.02) root@DESK4090:~/project/qsar/1d-qsar/cuda# python RFE_cuml_permutation.py
+训练样本数: 81, 特征数量: 156
+/root/micromamba/envs/rapids-25.02/lib/python3.12/site-packages/cuml/internals/api_decorators.py:363: UserWarning: For reproducible results in Random Forest Classifier or for almost reproducible results in Random Forest Regressor, n_streams=1 is recommended. If n_streams is > 1, results may vary due to stream/thread timing differences, even when random_state is set
+ return init_func(self, *args, **kwargs)
+/root/micromamba/envs/rapids-25.02/lib/python3.12/site-packages/cuml/internals/api_decorators.py:188: UserWarning: The number of bins, `n_bins` is greater than the number of samples used for training.
Changing `n_bins` to number of training samples. + ret = func(*args, **kwargs) +/root/micromamba/envs/rapids-25.02/lib/python3.12/site-packages/cuml/internals/api_decorators.py:188: UserWarning: To use pickling first train using float32 data to fit the estimator + ret = func(*args, **kwargs) +n_estimators: 50, CV MSE: 5.2602 +n_estimators: 100, CV MSE: 5.0974 +n_estimators: 150, CV MSE: 5.1422 +n_estimators: 200, CV MSE: 5.0988 +n_estimators: 250, CV MSE: 4.9330 +n_estimators: 300, CV MSE: 4.9259 +n_estimators: 350, CV MSE: 4.9253 +/root/project/qsar/1d-qsar/cuda/RFE_cuml_permutation.py:143: UserWarning: Glyph 19981 (\N{CJK UNIFIED IDEOGRAPH-4E0D}) missing from font(s) DejaVu Sans. + plt.savefig("tree_vs_cv_mse.png", dpi=300) +/root/project/qsar/1d-qsar/cuda/RFE_cuml_permutation.py:143: UserWarning: Glyph 21516 (\N{CJK UNIFIED IDEOGRAPH-540C}) missing from font(s) DejaVu Sans. + plt.savefig("tree_vs_cv_mse.png", dpi=300) +/root/project/qsar/1d-qsar/cuda/RFE_cuml_permutation.py:143: UserWarning: Glyph 26641 (\N{CJK UNIFIED IDEOGRAPH-6811}) missing from font(s) DejaVu Sans. + plt.savefig("tree_vs_cv_mse.png", dpi=300) +/root/project/qsar/1d-qsar/cuda/RFE_cuml_permutation.py:143: UserWarning: Glyph 25968 (\N{CJK UNIFIED IDEOGRAPH-6570}) missing from font(s) DejaVu Sans. + plt.savefig("tree_vs_cv_mse.png", dpi=300) +/root/project/qsar/1d-qsar/cuda/RFE_cuml_permutation.py:143: UserWarning: Glyph 37327 (\N{CJK UNIFIED IDEOGRAPH-91CF}) missing from font(s) DejaVu Sans. + plt.savefig("tree_vs_cv_mse.png", dpi=300) +/root/project/qsar/1d-qsar/cuda/RFE_cuml_permutation.py:143: UserWarning: Glyph 19979 (\N{CJK UNIFIED IDEOGRAPH-4E0B}) missing from font(s) DejaVu Sans. + plt.savefig("tree_vs_cv_mse.png", dpi=300) +/root/project/qsar/1d-qsar/cuda/RFE_cuml_permutation.py:143: UserWarning: Glyph 30340 (\N{CJK UNIFIED IDEOGRAPH-7684}) missing from font(s) DejaVu Sans. 
+ plt.savefig("tree_vs_cv_mse.png", dpi=300) +/root/project/qsar/1d-qsar/cuda/RFE_cuml_permutation.py:143: UserWarning: Glyph 20132 (\N{CJK UNIFIED IDEOGRAPH-4EA4}) missing from font(s) DejaVu Sans. + plt.savefig("tree_vs_cv_mse.png", dpi=300) +/root/project/qsar/1d-qsar/cuda/RFE_cuml_permutation.py:143: UserWarning: Glyph 21449 (\N{CJK UNIFIED IDEOGRAPH-53C9}) missing from font(s) DejaVu Sans. + plt.savefig("tree_vs_cv_mse.png", dpi=300) +/root/project/qsar/1d-qsar/cuda/RFE_cuml_permutation.py:143: UserWarning: Glyph 39564 (\N{CJK UNIFIED IDEOGRAPH-9A8C}) missing from font(s) DejaVu Sans. + plt.savefig("tree_vs_cv_mse.png", dpi=300) +/root/project/qsar/1d-qsar/cuda/RFE_cuml_permutation.py:143: UserWarning: Glyph 35777 (\N{CJK UNIFIED IDEOGRAPH-8BC1}) missing from font(s) DejaVu Sans. + plt.savefig("tree_vs_cv_mse.png", dpi=300) +最佳随机森林树数量确定为: 350 +当前特征数: 156, CV MSE: 4.9253 +剔除特征索引: 133,置换重要性: -0.0004 +当前特征数: 155, CV MSE: 4.9970 +剔除特征索引: 124,置换重要性: -0.0009 +当前特征数: 154, CV MSE: 4.7611 +剔除特征索引: 152,置换重要性: -0.0006 +当前特征数: 153, CV MSE: 4.7928 +剔除特征索引: 47,置换重要性: -0.0001 +当前特征数: 152, CV MSE: 4.7032 +剔除特征索引: 107,置换重要性: -0.0010 +当前特征数: 151, CV MSE: 4.7571 +剔除特征索引: 155,置换重要性: -0.0000 +当前特征数: 150, CV MSE: 4.7501 +剔除特征索引: 11,置换重要性: -0.0004 +当前特征数: 149, CV MSE: 4.8415 +剔除特征索引: 52,置换重要性: -0.0001 +当前特征数: 148, CV MSE: 4.7807 +剔除特征索引: 143,置换重要性: -0.0001 +当前特征数: 147, CV MSE: 4.7937 +剔除特征索引: 12,置换重要性: -0.0008 +当前特征数: 146, CV MSE: 4.7589 +剔除特征索引: 2,置换重要性: -0.0007 +当前特征数: 145, CV MSE: 4.8444 +剔除特征索引: 125,置换重要性: -0.0000 +当前特征数: 144, CV MSE: 4.8518 +剔除特征索引: 119,置换重要性: -0.0019 +当前特征数: 143, CV MSE: 4.6953 +剔除特征索引: 67,置换重要性: -0.0000 +当前特征数: 142, CV MSE: 4.7689 +剔除特征索引: 77,置换重要性: -0.0001 +当前特征数: 141, CV MSE: 4.7446 +剔除特征索引: 55,置换重要性: -0.0001 +当前特征数: 140, CV MSE: 4.8110 +剔除特征索引: 66,置换重要性: -0.0011 +当前特征数: 139, CV MSE: 4.7815 +剔除特征索引: 0,置换重要性: 0.0000 +当前特征数: 138, CV MSE: 4.8477 +剔除特征索引: 148,置换重要性: -0.0003 +当前特征数: 137, CV MSE: 4.7395 +剔除特征索引: 48,置换重要性: -0.0000 +当前特征数: 136, CV MSE: 4.6840 
+剔除特征索引: 150,置换重要性: -0.0000 +当前特征数: 135, CV MSE: 4.8117 +剔除特征索引: 60,置换重要性: -0.0001 +当前特征数: 134, CV MSE: 4.7988 +剔除特征索引: 149,置换重要性: -0.0000 +当前特征数: 133, CV MSE: 4.7597 +剔除特征索引: 76,置换重要性: -0.0000 +当前特征数: 132, CV MSE: 4.7059 +剔除特征索引: 69,置换重要性: -0.0000 +当前特征数: 131, CV MSE: 4.7498 +剔除特征索引: 51,置换重要性: -0.0000 +当前特征数: 130, CV MSE: 4.7434 +剔除特征索引: 142,置换重要性: -0.0000 +当前特征数: 129, CV MSE: 4.7211 +剔除特征索引: 105,置换重要性: -0.0001 +当前特征数: 128, CV MSE: 4.8107 +剔除特征索引: 53,置换重要性: -0.0000 +当前特征数: 127, CV MSE: 4.7079 +剔除特征索引: 86,置换重要性: -0.0009 +当前特征数: 126, CV MSE: 4.7144 +剔除特征索引: 136,置换重要性: -0.0000 +当前特征数: 125, CV MSE: 4.7590 +剔除特征索引: 113,置换重要性: -0.0013 +当前特征数: 124, CV MSE: 4.8185 +剔除特征索引: 146,置换重要性: -0.0000 +当前特征数: 123, CV MSE: 4.7051 +剔除特征索引: 153,置换重要性: -0.0000 +当前特征数: 122, CV MSE: 4.6048 +剔除特征索引: 138,置换重要性: -0.0006 +当前特征数: 121, CV MSE: 4.5805 +剔除特征索引: 147,置换重要性: -0.0000 +当前特征数: 120, CV MSE: 4.8172 +剔除特征索引: 62,置换重要性: -0.0001 +当前特征数: 119, CV MSE: 4.8393 +剔除特征索引: 127,置换重要性: -0.0002 +当前特征数: 118, CV MSE: 4.7780 +剔除特征索引: 1,置换重要性: 0.0000 +当前特征数: 117, CV MSE: 4.7227 +剔除特征索引: 3,置换重要性: 0.0000 +当前特征数: 116, CV MSE: 4.7864 +剔除特征索引: 4,置换重要性: 0.0000 +当前特征数: 115, CV MSE: 4.6388 +剔除特征索引: 5,置换重要性: 0.0000 +当前特征数: 114, CV MSE: 4.5150 +剔除特征索引: 140,置换重要性: -0.0001 +当前特征数: 113, CV MSE: 4.6324 +剔除特征索引: 6,置换重要性: 0.0000 +当前特征数: 112, CV MSE: 4.7494 +剔除特征索引: 24,置换重要性: -0.0009 +当前特征数: 111, CV MSE: 4.6774 +剔除特征索引: 80,置换重要性: -0.0001 +当前特征数: 110, CV MSE: 4.7463 +剔除特征索引: 70,置换重要性: -0.0000 +当前特征数: 109, CV MSE: 4.7211 +剔除特征索引: 54,置换重要性: -0.0000 +当前特征数: 108, CV MSE: 4.6060 +剔除特征索引: 7,置换重要性: 0.0000 +当前特征数: 107, CV MSE: 4.5989 +剔除特征索引: 88,置换重要性: -0.0001 +当前特征数: 106, CV MSE: 4.5530 +剔除特征索引: 106,置换重要性: -0.0008 +当前特征数: 105, CV MSE: 4.6703 +剔除特征索引: 8,置换重要性: 0.0000 +当前特征数: 104, CV MSE: 4.4274 +剔除特征索引: 145,置换重要性: -0.0000 +当前特征数: 103, CV MSE: 4.7372 +剔除特征索引: 9,置换重要性: 0.0000 +当前特征数: 102, CV MSE: 4.6272 +剔除特征索引: 129,置换重要性: -0.0004 +当前特征数: 101, CV MSE: 4.6820 +剔除特征索引: 10,置换重要性: 0.0000 +当前特征数: 100, CV MSE: 4.5780 +剔除特征索引: 13,置换重要性: 
-0.0000 +当前特征数: 99, CV MSE: 4.6734 +剔除特征索引: 139,置换重要性: -0.0002 +当前特征数: 98, CV MSE: 4.6922 +剔除特征索引: 50,置换重要性: -0.0000 +当前特征数: 97, CV MSE: 4.6520 +剔除特征索引: 134,置换重要性: -0.0002 +当前特征数: 96, CV MSE: 4.6574 +剔除特征索引: 14,置换重要性: 0.0000 +当前特征数: 95, CV MSE: 4.7120 +剔除特征索引: 15,置换重要性: 0.0000 +当前特征数: 94, CV MSE: 4.6882 +剔除特征索引: 16,置换重要性: 0.0000 +当前特征数: 93, CV MSE: 4.5839 +剔除特征索引: 17,置换重要性: 0.0000 +当前特征数: 92, CV MSE: 4.6231 +剔除特征索引: 18,置换重要性: 0.0000 +当前特征数: 91, CV MSE: 4.6144 +剔除特征索引: 154,置换重要性: 0.0000 +当前特征数: 90, CV MSE: 4.4886 +剔除特征索引: 19,置换重要性: 0.0000 +当前特征数: 89, CV MSE: 4.5828 +剔除特征索引: 20,置换重要性: 0.0000 +当前特征数: 88, CV MSE: 4.4967 +剔除特征索引: 144,置换重要性: -0.0003 +当前特征数: 87, CV MSE: 4.5881 +剔除特征索引: 21,置换重要性: 0.0000 +当前特征数: 86, CV MSE: 4.4972 +剔除特征索引: 22,置换重要性: 0.0000 +当前特征数: 85, CV MSE: 4.5007 +剔除特征索引: 137,置换重要性: -0.0010 +当前特征数: 84, CV MSE: 4.5097 +剔除特征索引: 23,置换重要性: 0.0000 +当前特征数: 83, CV MSE: 4.4847 +剔除特征索引: 25,置换重要性: 0.0000 +当前特征数: 82, CV MSE: 4.6722 +剔除特征索引: 26,置换重要性: 0.0000 +当前特征数: 81, CV MSE: 4.5776 +剔除特征索引: 27,置换重要性: 0.0000 +当前特征数: 80, CV MSE: 4.5790 +剔除特征索引: 28,置换重要性: 0.0000 +当前特征数: 79, CV MSE: 4.5349 +剔除特征索引: 29,置换重要性: 0.0000 +当前特征数: 78, CV MSE: 4.5542 +剔除特征索引: 30,置换重要性: 0.0000 +当前特征数: 77, CV MSE: 4.5299 +剔除特征索引: 31,置换重要性: 0.0000 +当前特征数: 76, CV MSE: 4.6087 +剔除特征索引: 32,置换重要性: 0.0000 +当前特征数: 75, CV MSE: 4.6009 +剔除特征索引: 33,置换重要性: 0.0000 +当前特征数: 74, CV MSE: 4.4684 +剔除特征索引: 34,置换重要性: 0.0000 +当前特征数: 73, CV MSE: 4.5094 +剔除特征索引: 35,置换重要性: 0.0000 +当前特征数: 72, CV MSE: 4.4700 +剔除特征索引: 49,置换重要性: -0.0001 +当前特征数: 71, CV MSE: 4.5176 +剔除特征索引: 36,置换重要性: 0.0000 +当前特征数: 70, CV MSE: 4.4335 +剔除特征索引: 37,置换重要性: -0.0000 +当前特征数: 69, CV MSE: 4.4646 +剔除特征索引: 38,置换重要性: 0.0000 +当前特征数: 68, CV MSE: 4.4792 +剔除特征索引: 39,置换重要性: -0.0000 +当前特征数: 67, CV MSE: 4.4398 +剔除特征索引: 40,置换重要性: 0.0000 +当前特征数: 66, CV MSE: 4.3681 +剔除特征索引: 41,置换重要性: 0.0001 +当前特征数: 65, CV MSE: 4.3967 +剔除特征索引: 42,置换重要性: 0.0000 +当前特征数: 64, CV MSE: 4.4340 +剔除特征索引: 43,置换重要性: 0.0000 +当前特征数: 63, CV MSE: 4.4324 +剔除特征索引: 44,置换重要性: 0.0000 +当前特征数: 62, CV 
MSE: 4.5365 +剔除特征索引: 45,置换重要性: 0.0000 +当前特征数: 61, CV MSE: 4.5531 +剔除特征索引: 46,置换重要性: 0.0000 +当前特征数: 60, CV MSE: 4.4241 +剔除特征索引: 56,置换重要性: 0.0000 +当前特征数: 59, CV MSE: 4.4180 +剔除特征索引: 57,置换重要性: -0.0000 +当前特征数: 58, CV MSE: 4.3972 +剔除特征索引: 58,置换重要性: 0.0000 +当前特征数: 57, CV MSE: 4.2306 +剔除特征索引: 59,置换重要性: 0.0000 +当前特征数: 56, CV MSE: 4.2263 +剔除特征索引: 61,置换重要性: 0.0000 +当前特征数: 55, CV MSE: 4.4408 +剔除特征索引: 63,置换重要性: 0.0002 +当前特征数: 54, CV MSE: 4.4805 +剔除特征索引: 64,置换重要性: 0.0003 +当前特征数: 53, CV MSE: 4.4570 +剔除特征索引: 65,置换重要性: 0.0000 +当前特征数: 52, CV MSE: 4.4523 +剔除特征索引: 68,置换重要性: -0.0000 +当前特征数: 51, CV MSE: 4.4382 +剔除特征索引: 71,置换重要性: 0.0000 +当前特征数: 50, CV MSE: 4.4065 +剔除特征索引: 72,置换重要性: 0.0000 +当前特征数: 49, CV MSE: 4.3169 +剔除特征索引: 73,置换重要性: 0.0000 +当前特征数: 48, CV MSE: 4.2661 +剔除特征索引: 74,置换重要性: 0.0000 +当前特征数: 47, CV MSE: 4.3875 +剔除特征索引: 75,置换重要性: 0.0004 +当前特征数: 46, CV MSE: 4.3418 +剔除特征索引: 78,置换重要性: -0.0000 +当前特征数: 45, CV MSE: 4.4599 +剔除特征索引: 79,置换重要性: 0.0000 +当前特征数: 44, CV MSE: 4.5502 +剔除特征索引: 81,置换重要性: 0.0000 +当前特征数: 43, CV MSE: 4.4299 +剔除特征索引: 135,置换重要性: 0.0025 +当前特征数: 42, CV MSE: 4.4397 +剔除特征索引: 82,置换重要性: 0.0000 +当前特征数: 41, CV MSE: 4.4255 +剔除特征索引: 83,置换重要性: 0.0003 +当前特征数: 40, CV MSE: 4.4042 +剔除特征索引: 84,置换重要性: 0.0000 +当前特征数: 39, CV MSE: 4.4017 +剔除特征索引: 85,置换重要性: 0.0000 +当前特征数: 38, CV MSE: 4.3463 +剔除特征索引: 87,置换重要性: 0.0003 +当前特征数: 37, CV MSE: 4.5079 +剔除特征索引: 89,置换重要性: -0.0001 +当前特征数: 36, CV MSE: 4.4404 +剔除特征索引: 90,置换重要性: 0.0002 +当前特征数: 35, CV MSE: 4.4802 +剔除特征索引: 91,置换重要性: -0.0000 +当前特征数: 34, CV MSE: 4.5477 +剔除特征索引: 92,置换重要性: 0.0000 +当前特征数: 33, CV MSE: 4.5129 +剔除特征索引: 93,置换重要性: 0.0000 +当前特征数: 32, CV MSE: 4.4643 +剔除特征索引: 94,置换重要性: 0.0004 +当前特征数: 31, CV MSE: 4.6173 +剔除特征索引: 95,置换重要性: 0.0000 +当前特征数: 30, CV MSE: 4.5979 +剔除特征索引: 96,置换重要性: 0.0002 +当前特征数: 29, CV MSE: 4.4120 +剔除特征索引: 97,置换重要性: 0.0000 +当前特征数: 28, CV MSE: 4.4958 +剔除特征索引: 98,置换重要性: 0.0004 +当前特征数: 27, CV MSE: 4.5448 +剔除特征索引: 99,置换重要性: 0.0002 +当前特征数: 26, CV MSE: 4.3483 +剔除特征索引: 100,置换重要性: 0.0014 +当前特征数: 25, CV MSE: 4.3617 +剔除特征索引: 101,置换重要性: 
0.0001 +当前特征数: 24, CV MSE: 4.4947 +剔除特征索引: 102,置换重要性: 0.0017 +当前特征数: 23, CV MSE: 4.2328 +剔除特征索引: 103,置换重要性: 0.0022 +当前特征数: 22, CV MSE: 4.3158 +剔除特征索引: 104,置换重要性: 0.0000 +当前特征数: 21, CV MSE: 4.3848 +剔除特征索引: 108,置换重要性: 0.0000 +当前特征数: 20, CV MSE: 4.3861 +剔除特征索引: 109,置换重要性: 0.0009 +当前特征数: 19, CV MSE: 4.1140 +剔除特征索引: 110,置换重要性: 0.0002 +当前特征数: 18, CV MSE: 4.0750 +剔除特征索引: 111,置换重要性: 0.0000 +当前特征数: 17, CV MSE: 3.9148 +剔除特征索引: 112,置换重要性: 0.0004 +当前特征数: 16, CV MSE: 4.0441 +剔除特征索引: 114,置换重要性: 0.0016 +当前特征数: 15, CV MSE: 3.9886 +剔除特征索引: 115,置换重要性: 0.0000 +当前特征数: 14, CV MSE: 4.1169 +剔除特征索引: 116,置换重要性: 0.0000 +当前特征数: 13, CV MSE: 4.1846 +剔除特征索引: 117,置换重要性: 0.0000 +当前特征数: 12, CV MSE: 4.2129 +剔除特征索引: 118,置换重要性: 0.0020 +当前特征数: 11, CV MSE: 4.2006 +剔除特征索引: 120,置换重要性: 0.0024 +当前特征数: 10, CV MSE: 3.9221 +剔除特征索引: 121,置换重要性: 0.0032 +当前特征数: 9, CV MSE: 3.9000 +剔除特征索引: 122,置换重要性: 0.0015 +当前特征数: 8, CV MSE: 4.1202 +剔除特征索引: 123,置换重要性: 0.0020 +当前特征数: 7, CV MSE: 4.0491 +剔除特征索引: 126,置换重要性: 0.0002 +当前特征数: 6, CV MSE: 3.8452 +剔除特征索引: 128,置换重要性: 0.0003 +当前特征数: 5, CV MSE: 3.8135 +剔除特征索引: 131,置换重要性: 0.0013 +当前特征数: 4, CV MSE: 3.8181 +剔除特征索引: 130,置换重要性: 0.0106 +当前特征数: 3, CV MSE: 3.9118 +剔除特征索引: 132,置换重要性: -0.0000 +当前特征数: 2, CV MSE: 3.9524 +剔除特征索引: 141,置换重要性: 0.0000 +当前特征数: 1, CV MSE: 3.9524 + +手动RFECV选择了 5 个特征,CV MSE: 3.8135 +选定特征名称: ['fr_Ndealkylation2', 'fr_aldehyde', 'fr_alkyl_halide', 'fr_halogen', 'fr_piperdine'] +/root/project/qsar/1d-qsar/cuda/RFE_cuml_permutation.py:208: UserWarning: Glyph 34987 (\N{CJK UNIFIED IDEOGRAPH-88AB}) missing from font(s) DejaVu Sans. + plt.savefig("rfecv_perm_importance.png", dpi=300) +/root/project/qsar/1d-qsar/cuda/RFE_cuml_permutation.py:208: UserWarning: Glyph 21076 (\N{CJK UNIFIED IDEOGRAPH-5254}) missing from font(s) DejaVu Sans. + plt.savefig("rfecv_perm_importance.png", dpi=300) +/root/project/qsar/1d-qsar/cuda/RFE_cuml_permutation.py:208: UserWarning: Glyph 38500 (\N{CJK UNIFIED IDEOGRAPH-9664}) missing from font(s) DejaVu Sans. 
+ plt.savefig("rfecv_perm_importance.png", dpi=300) +/root/project/qsar/1d-qsar/cuda/RFE_cuml_permutation.py:208: UserWarning: Glyph 29305 (\N{CJK UNIFIED IDEOGRAPH-7279}) missing from font(s) DejaVu Sans. + plt.savefig("rfecv_perm_importance.png", dpi=300) +/root/project/qsar/1d-qsar/cuda/RFE_cuml_permutation.py:208: UserWarning: Glyph 24449 (\N{CJK UNIFIED IDEOGRAPH-5F81}) missing from font(s) DejaVu Sans. + plt.savefig("rfecv_perm_importance.png", dpi=300) +/root/project/qsar/1d-qsar/cuda/RFE_cuml_permutation.py:208: UserWarning: Glyph 30340 (\N{CJK UNIFIED IDEOGRAPH-7684}) missing from font(s) DejaVu Sans. + plt.savefig("rfecv_perm_importance.png", dpi=300) +/root/project/qsar/1d-qsar/cuda/RFE_cuml_permutation.py:208: UserWarning: Glyph 32622 (\N{CJK UNIFIED IDEOGRAPH-7F6E}) missing from font(s) DejaVu Sans. + plt.savefig("rfecv_perm_importance.png", dpi=300) +/root/project/qsar/1d-qsar/cuda/RFE_cuml_permutation.py:208: UserWarning: Glyph 25442 (\N{CJK UNIFIED IDEOGRAPH-6362}) missing from font(s) DejaVu Sans. + plt.savefig("rfecv_perm_importance.png", dpi=300) +/root/project/qsar/1d-qsar/cuda/RFE_cuml_permutation.py:208: UserWarning: Glyph 37325 (\N{CJK UNIFIED IDEOGRAPH-91CD}) missing from font(s) DejaVu Sans. + plt.savefig("rfecv_perm_importance.png", dpi=300) +/root/project/qsar/1d-qsar/cuda/RFE_cuml_permutation.py:208: UserWarning: Glyph 35201 (\N{CJK UNIFIED IDEOGRAPH-8981}) missing from font(s) DejaVu Sans. + plt.savefig("rfecv_perm_importance.png", dpi=300) +/root/project/qsar/1d-qsar/cuda/RFE_cuml_permutation.py:208: UserWarning: Glyph 24615 (\N{CJK UNIFIED IDEOGRAPH-6027}) missing from font(s) DejaVu Sans. + plt.savefig("rfecv_perm_importance.png", dpi=300) +/root/project/qsar/1d-qsar/cuda/RFE_cuml_permutation.py:208: UserWarning: Glyph 36807 (\N{CJK UNIFIED IDEOGRAPH-8FC7}) missing from font(s) DejaVu Sans. 
+ plt.savefig("rfecv_perm_importance.png", dpi=300) +/root/project/qsar/1d-qsar/cuda/RFE_cuml_permutation.py:208: UserWarning: Glyph 31243 (\N{CJK UNIFIED IDEOGRAPH-7A0B}) missing from font(s) DejaVu Sans. + plt.savefig("rfecv_perm_importance.png", dpi=300) +/root/project/qsar/1d-qsar/cuda/RFE_cuml_permutation.py:208: UserWarning: Glyph 20013 (\N{CJK UNIFIED IDEOGRAPH-4E2D}) missing from font(s) DejaVu Sans. + plt.savefig("rfecv_perm_importance.png", dpi=300) +/root/project/qsar/1d-qsar/cuda/RFE_cuml_permutation.py:208: UserWarning: Glyph 21464 (\N{CJK UNIFIED IDEOGRAPH-53D8}) missing from font(s) DejaVu Sans. + plt.savefig("rfecv_perm_importance.png", dpi=300) +/root/project/qsar/1d-qsar/cuda/RFE_cuml_permutation.py:208: UserWarning: Glyph 21270 (\N{CJK UNIFIED IDEOGRAPH-5316}) missing from font(s) DejaVu Sans. + plt.savefig("rfecv_perm_importance.png", dpi=300) +/root/project/qsar/1d-qsar/cuda/RFE_cuml_permutation.py:208: UserWarning: Glyph 36845 (\N{CJK UNIFIED IDEOGRAPH-8FED}) missing from font(s) DejaVu Sans. + plt.savefig("rfecv_perm_importance.png", dpi=300) +/root/project/qsar/1d-qsar/cuda/RFE_cuml_permutation.py:208: UserWarning: Glyph 20195 (\N{CJK UNIFIED IDEOGRAPH-4EE3}) missing from font(s) DejaVu Sans. + plt.savefig("rfecv_perm_importance.png", dpi=300) +/root/project/qsar/1d-qsar/cuda/RFE_cuml_permutation.py:208: UserWarning: Glyph 27425 (\N{CJK UNIFIED IDEOGRAPH-6B21}) missing from font(s) DejaVu Sans. + plt.savefig("rfecv_perm_importance.png", dpi=300) +/root/project/qsar/1d-qsar/cuda/RFE_cuml_permutation.py:208: UserWarning: Glyph 25968 (\N{CJK UNIFIED IDEOGRAPH-6570}) missing from font(s) DejaVu Sans. 
+ plt.savefig("rfecv_perm_importance.png", dpi=300)
+
+最终模型(最佳特征、n_estimators=350)CV MSE: 3.8135
+
+
+During RFECV, the procedure does not simply keep the features with the highest permutation importance; it iteratively removes the features that contribute least to the model (i.e., those whose removal changes the CV MSE the least). The 5 features left at the end are exactly the combination whose retention gives the lowest cross-validated mean squared error.
+
+In other words, these 5 features are not necessarily the ones with the largest individual permutation importance; rather, once the other features are gone, the model's predictive performance drops noticeably without them. Removing any one of them raises the CV MSE, so keeping them achieves the best performance. The final 5 features survive because, as a combination, they matter most for the prediction task.
\ No newline at end of file
diff --git a/1d-qsar/cuda/RFE_cuml_permutation.py b/1d-qsar/cuda/RFE_cuml_permutation.py
new file mode 100644
index 0000000..3b09858
--- /dev/null
+++ b/1d-qsar/cuda/RFE_cuml_permutation.py
@@ -0,0 +1,217 @@
+#!/usr/bin/env python
+# -*- encoding: utf-8 -*-
+'''
+@file: RFE_cuml_permutation.py
+@Description: GPU (cuML) feature selection: a manual RFECV that removes the least important
+              feature at each step based on permutation importance, determines a reasonable
+              number of random-forest trees via cross-validation (growing the candidate count
+              until the CV MSE converges), saves the plots, and trains with the best tree count.
+@Date: 2025/03/01
+@Author: lyzeng (modified)
+'''
+
+import numpy as np
+import pandas as pd
+import matplotlib.pyplot as plt
+
+# ---------------------------
+# Data loading and feature construction
+# ---------------------------
+data = pd.read_csv("../data_smi.csv", index_col="Entry ID")
+target_data = data[["SMILES", "S.aureus ATCC25923"]]
+target_data.columns = ["SMILES", "TARGET"]
+
+from rdkit import Chem
+from rdkit.Chem import Descriptors
+
+def getMolDescriptors(smiles: str, missingVal=None):
+    mol = Chem.MolFromSmiles(smiles)
+    res = {}
+    for nm, fn in Descriptors._descList:
+        try:
+            val = fn(mol)
+        except Exception:
+            val = missingVal
+        res[nm] = val
+    return res
+
+# Build the feature matrix and the target variable
+X_df = pd.DataFrame([getMolDescriptors(smi) for smi in target_data['SMILES']])
+y = target_data['TARGET'].values
+
+# Drop invalid features (columns with a single unique value)
+invalid_features = [col for col in X_df.columns if X_df[col].nunique() <= 1]
+X_df.drop(columns=invalid_features, inplace=True)
+X = X_df.values
+
+print(f"训练样本数: {len(y)}, 特征数量: {X.shape[1]}")
+
+# ---------------------------
+# Convert to GPU data format (cupy arrays)
+# ---------------------------
+import cupy as cp
+X_gpu = cp.asarray(X)
+y_gpu = cp.asarray(y)
+
+# ---------------------------
+# Helper: compute a model's MSE
+# ---------------------------
+def model_mse(model, X_data, y_data):
+    y_pred = model.predict(X_data)
+    mse = cp.mean((y_data - y_pred) ** 2)
+    return mse.get()
+
+# ---------------------------
+# Permutation importance (on GPU data)
+# ---------------------------
+def permutation_importance_gpu(model, X_data, y_data, n_repeats=3, random_state=0):
+    """
+    For a given model, compute each feature's permutation importance:
+        importance = (mean MSE after shuffling the feature) - (baseline MSE)
+    Each feature's values are shuffled at random, and the resulting drop in
+    model performance measures that feature's importance.
+    """
+    X_cpu = cp.asnumpy(X_data)  # copy to CPU for shuffling
+    baseline = model_mse(model, X_data, y_data)
+    importances = np.zeros(X_cpu.shape[1])
+    rng = np.random.RandomState(random_state)
+    for j in range(X_cpu.shape[1]):
+        scores = []
+        for _ in range(n_repeats):
+            X_permuted = X_cpu.copy()
+            rng.shuffle(X_permuted[:, j])
+            X_permuted_gpu = cp.asarray(X_permuted)
+            score = model_mse(model, X_permuted_gpu, y_data)
+            scores.append(score)
+        importances[j] = np.mean(scores) - baseline
+    return importances
+
+# ---------------------------
+# Cross-validation function that accepts the number of trees
+# ---------------------------
+from cuml.ensemble import RandomForestRegressor as cuRF
+from sklearn.model_selection import KFold
+
+def cross_val_score_gpu(X_data, y_data, cv=5, n_estimators=100):
+    """
+    Split the data with KFold and evaluate the CV mean squared error of the
+    current feature subset with a cuML random forest; the number of trees,
+    n_estimators, is configurable.
+    """
+    kf = KFold(n_splits=cv, shuffle=True, random_state=0)
+    mse_scores = []
+    X_np = cp.asnumpy(X_data)  # split using CPU indices
+    for train_index, test_index in kf.split(X_np):
+        X_train = X_data[train_index, :]
+        X_test = X_data[test_index, :]
+        y_train = y_data[train_index]
+        y_test = y_data[test_index]
+        model = cuRF(n_estimators=n_estimators, random_state=0)
+        model.fit(X_train, y_train)
+        mse = model_mse(model, X_test, y_test)
+        mse_scores.append(mse)
+    return np.mean(mse_scores)
+
+# ---------------------------
+# Dynamically evaluate the number of trees (grow the candidate count until CV MSE converges)
+# ---------------------------
+def evaluate_n_estimators_dynamic(X_data, y_data, cv=5, start=50, step=50, threshold=1e-3, max_estimators=1000):
+    """
+    Starting from `start`, increase by `step` until two consecutive CV MSE values
+    differ by less than `threshold`, or until `max_estimators` is reached.
+    Returns a dict {n_estimators: CV MSE}.
+    """
+    results = {}
+    candidate = start
+    prev_mse = None
+    while candidate <= max_estimators:
+        mse = cross_val_score_gpu(X_data, y_data, cv=cv, n_estimators=candidate)
+        results[candidate] = mse
+        print(f"n_estimators: {candidate}, CV MSE: {mse:.4f}")
+        if prev_mse is not None and abs(prev_mse - mse) < threshold:
+            break
+        prev_mse = mse
+        candidate += step
+    return results
+
+# Run the dynamic evaluation
+tree_results = evaluate_n_estimators_dynamic(X_gpu, y_gpu, cv=5, start=50, step=50, threshold=1e-3, max_estimators=1000)
+
+# Plot n_estimators vs. CV MSE and save the figure
+plt.figure(figsize=(8, 5))
+plt.plot(list(tree_results.keys()), list(tree_results.values()), marker='o')
+plt.xlabel('Random Forest 树数量 (n_estimators)')
+plt.ylabel('CV MSE')
+plt.title('不同树数量下的交叉验证MSE')
+plt.grid(True)
+plt.savefig("tree_vs_cv_mse.png", dpi=300)
+plt.close()
+
+# Determine the best number of trees (the one with the lowest CV MSE)
+best_n_estimators = min(tree_results, key=tree_results.get)
+print(f"最佳随机森林树数量确定为: {best_n_estimators}")
+
+# ---------------------------
+# Manual RFECV, recording how permutation importance evolves;
+# the best tree count best_n_estimators is used for model training
+# ---------------------------
+def manual_rfecv(X_data, y_data, cv=5, n_estimators=100):
+    """
+    Manual RFECV:
+      - compute the CV MSE on the current feature subset
+      - evaluate each feature's permutation importance
+      - remove the feature with the lowest permutation importance and record it
+      - keep track of the feature subset with the lowest CV MSE
+    Returns the best feature subset, the best CV MSE, the score history,
+    and the permutation-importance history.
+    """
+    n_features = X_data.shape[1]
+    features = list(range(n_features))
+    best_score = float('inf')
+    best_features = features.copy()
+    scores_history = []
+    perm_imp_history = []  # permutation importance of each removed feature
+
+    while len(features) > 0:
+        current_score = cross_val_score_gpu(X_data[:, features], y_data, cv=cv, n_estimators=n_estimators)
+        scores_history.append((features.copy(), current_score))
+        print(f"当前特征数: {len(features)}, CV MSE: {current_score:.4f}")
+        if current_score < best_score:
+            best_score = current_score
+            best_features = features.copy()
+        if len(features) == 1:
+            break
+        # Train a model and compute permutation importance
+        model = cuRF(n_estimators=n_estimators, random_state=0)
+        model.fit(X_data[:, features], y_data)
+        imp = permutation_importance_gpu(model, X_data[:, features], y_data, n_repeats=3, random_state=0)
+        idx_to_remove = int(np.argmin(imp))
+        removed_feature = features[idx_to_remove]
+        removed_imp = imp[idx_to_remove]
+        perm_imp_history.append(removed_imp)
+        print(f"剔除特征索引: {removed_feature},置换重要性: {removed_imp:.4f}")
+        del features[idx_to_remove]
+
+    return best_features, best_score, scores_history, perm_imp_history
+
+# Run the manual RFECV with 5-fold cross-validation and the best tree count
+cv_folds = 5
+best_features_rfecv, best_mse_rfecv, history, perm_imp_history = manual_rfecv(X_gpu, y_gpu, cv=cv_folds, n_estimators=best_n_estimators)
+
+print(f"\n手动RFECV选择了 {len(best_features_rfecv)} 个特征,CV MSE: {best_mse_rfecv:.4f}")
+selected_feature_names = [X_df.columns[i] for i in best_features_rfecv]
+print("选定特征名称:", selected_feature_names)
+
+# Plot the permutation importance of the removed features and save the figure
+plt.figure(figsize=(8, 5))
+iterations = list(range(1, len(perm_imp_history) + 1))
+plt.plot(iterations, perm_imp_history, marker='o')
+plt.xlabel('RFECV 迭代次数')
+plt.ylabel('被剔除特征的置换重要性')
+plt.title('RFECV过程中置换重要性变化')
+plt.grid(True)
+plt.savefig("rfecv_perm_importance.png", dpi=300)
+plt.close()
+
+# ---------------------------
+# Final model: train on the best features with the best number of trees
+# ---------------------------
+final_model = cuRF(n_estimators=best_n_estimators, random_state=0)
+final_model.fit(X_gpu[:, best_features_rfecv], y_gpu)
+final_cv_mse = cross_val_score_gpu(X_gpu[:, best_features_rfecv], y_gpu, cv=cv_folds, n_estimators=best_n_estimators)
+print(f"\n最终模型(最佳特征、n_estimators={best_n_estimators})CV MSE: {final_cv_mse:.4f}")
diff --git a/1d-qsar/cuda/RFE_cuml_permutation_update.log b/1d-qsar/cuda/RFE_cuml_permutation_update.log
new file mode 100644
index 0000000..0636052
--- /dev/null
+++ b/1d-qsar/cuda/RFE_cuml_permutation_update.log
@@ -0,0 +1,38 @@
+nohup: ignoring input
+Number of training samples: 81, Number of features: 156
+/root/micromamba/envs/rapids-25.02/lib/python3.12/site-packages/cuml/internals/api_decorators.py:363: UserWarning: For reproducible results in Random Forest Classifier or for almost reproducible results in Random Forest Regressor, n_streams=1 is recommended. If n_streams is > 1, results may vary due to stream/thread timing differences, even when random_state is set + return init_func(self, *args, **kwargs) +/root/micromamba/envs/rapids-25.02/lib/python3.12/site-packages/cuml/internals/api_decorators.py:188: UserWarning: The number of bins, `n_bins` is greater than the number of samples used for training. Changing `n_bins` to number of training samples. + ret = func(*args, **kwargs) +/root/micromamba/envs/rapids-25.02/lib/python3.12/site-packages/cuml/internals/api_decorators.py:188: UserWarning: To use pickling first train using float32 data to fit the estimator + ret = func(*args, **kwargs) +n_estimators: 50, CV MSE: 5.2602 +n_estimators: 100, CV MSE: 5.0974 +n_estimators: 150, CV MSE: 5.1422 +n_estimators: 200, CV MSE: 5.0988 +n_estimators: 250, CV MSE: 4.9330 +n_estimators: 300, CV MSE: 4.9259 +n_estimators: 350, CV MSE: 4.9253 +Optimal number of trees determined: 350 +Current number of features: 156, CV MSE: 4.9253 +Removed feature index: 133, feature name: fr_allylic_oxid, permutation importance: -0.0004 +Current number of features: 155, CV MSE: 4.9970 +Removed feature index: 124, feature name: fr_Ar_N, permutation importance: -0.0009 +Current number of features: 154, CV MSE: 4.7611 +Removed feature index: 152, feature name: fr_pyridine, permutation importance: -0.0006 +Current number of features: 153, CV MSE: 4.7928 +Removed feature index: 47, feature name: PEOE_VSA10, permutation importance: -0.0001 +Current number of features: 152, CV MSE: 4.7032 +Removed feature index: 107, feature name: NumAromaticHeterocycles, permutation importance: -0.0010 +Current number of features: 151, CV MSE: 4.7571 +Removed feature index: 155, feature name: fr_unbrch_alkane, 
permutation importance: -0.0000
+Current number of features: 150, CV MSE: 4.7501
+Removed feature index: 11, feature name: MinPartialCharge, permutation importance: -0.0004
+Current number of features: 149, CV MSE: 4.8415
+Removed feature index: 52, feature name: PEOE_VSA2, permutation importance: -0.0001
+Current number of features: 148, CV MSE: 4.7807
+Removed feature index: 143, feature name: fr_ketone, permutation importance: -0.0001
+Current number of features: 147, CV MSE: 4.7937
+Removed feature index: 12, feature name: MaxAbsPartialCharge, permutation importance: -0.0008
+Current number of features: 146, CV MSE: 4.7589
+Removed feature index: 2, feature name: MinAbsEStateIndex, permutation importance: -0.0007
diff --git a/1d-qsar/cuda/RFE_cuml_permutation_update.py b/1d-qsar/cuda/RFE_cuml_permutation_update.py
new file mode 100644
index 0000000..29c6f0b
--- /dev/null
+++ b/1d-qsar/cuda/RFE_cuml_permutation_update.py
@@ -0,0 +1,234 @@
+#!/usr/bin/env python
+# -*- encoding: utf-8 -*-
+'''
+@file: RFE_cuml_permutation_update.py
+@Description: Use GPU cuML to perform feature selection via a manual RFECV procedure,
+              removing the least important features based on permutation importance,
+              dynamically determining the optimal number of trees (until CV MSE converges),
+              saving the plots, and finally training the final model using the best n_estimators.
+@Date: 2025/03/01
+@Author: lyzeng (modified)
+nohup python3 -u RFE_cuml_permutation_update.py > RFE_cuml_permutation_update.log 2>&1 &
+'''
+
+import numpy as np
+import pandas as pd
+import matplotlib.pyplot as plt
+
+# ---------------------------
+# Data loading and feature construction
+# ---------------------------
+data = pd.read_csv("../data_smi.csv", index_col="Entry ID")
+target_data = data[["SMILES", "S.aureus ATCC25923"]]
+target_data.columns = ["SMILES", "TARGET"]
+
+from rdkit import Chem
+from rdkit.Chem import Descriptors
+
+def getMolDescriptors(smiles: str, missingVal=None):
+    mol = Chem.MolFromSmiles(smiles)
+    res = {}
+    for nm, fn in Descriptors._descList:
+        try:
+            val = fn(mol)
+        except Exception:
+            val = missingVal
+        res[nm] = val
+    return res
+
+# Construct feature matrix and target variable
+X_df = pd.DataFrame([getMolDescriptors(smi) for smi in target_data['SMILES']])
+y = target_data['TARGET'].values
+
+# Remove invalid features (columns with only one unique value)
+invalid_features = [col for col in X_df.columns if X_df[col].nunique() <= 1]
+X_df.drop(columns=invalid_features, inplace=True)
+X = X_df.values
+
+print(f"Number of training samples: {len(y)}, Number of features: {X.shape[1]}")
+
+# ---------------------------
+# Convert data to GPU arrays (cupy arrays)
+# ---------------------------
+import cupy as cp
+X_gpu = cp.asarray(X)
+y_gpu = cp.asarray(y)
+
+# ---------------------------
+# Helper function: Calculate model MSE
+# ---------------------------
+def model_mse(model, X_data, y_data):
+    y_pred = model.predict(X_data)
+    mse = cp.mean((y_data - y_pred) ** 2)
+    return mse.get()
+
+# ---------------------------
+# Permutation importance function (GPU version)
+# ---------------------------
+def permutation_importance_gpu(model, X_data, y_data, n_repeats=3, random_state=0):
+    """
+    For the given model, compute permutation importance for each feature:
+    Importance = (Mean MSE after permuting the feature) - (Original MSE)
+    The function randomly shuffles each feature and observes the change in model performance.
+    """
+    X_cpu = cp.asnumpy(X_data)  # Copy data to CPU for shuffling
+    baseline = model_mse(model, X_data, y_data)
+    importances = np.zeros(X_cpu.shape[1])
+    rng = np.random.RandomState(random_state)
+    for j in range(X_cpu.shape[1]):
+        scores = []
+        for _ in range(n_repeats):
+            X_permuted = X_cpu.copy()
+            rng.shuffle(X_permuted[:, j])
+            X_permuted_gpu = cp.asarray(X_permuted)
+            score = model_mse(model, X_permuted_gpu, y_data)
+            scores.append(score)
+        importances[j] = np.mean(scores) - baseline
+    return importances
+
+# ---------------------------
+# Modified cross-validation function: allow setting number of trees
+# ---------------------------
+from cuml.ensemble import RandomForestRegressor as cuRF
+from sklearn.model_selection import KFold
+
+def cross_val_score_gpu(X_data, y_data, cv=5, n_estimators=100):
+    """
+    Use KFold to split the data and evaluate the CV MSE of the current feature subset
+    using cuML RandomForestRegressor with a specified number of trees.
+    """
+    kf = KFold(n_splits=cv, shuffle=True, random_state=0)
+    mse_scores = []
+    X_np = cp.asnumpy(X_data)  # Use CPU indices for splitting
+    for train_index, test_index in kf.split(X_np):
+        X_train = X_data[train_index, :]
+        X_test = X_data[test_index, :]
+        y_train = y_data[train_index]
+        y_test = y_data[test_index]
+        model = cuRF(n_estimators=n_estimators, random_state=0)
+        model.fit(X_train, y_train)
+        mse = model_mse(model, X_test, y_test)
+        mse_scores.append(mse)
+    return np.mean(mse_scores)
+
+# ---------------------------
+# Dynamically evaluate the number of trees (expand candidate numbers until CV MSE converges)
+# ---------------------------
+def evaluate_n_estimators_dynamic(X_data, y_data, cv=5, start=50, step=50, threshold=1e-3, max_estimators=1000):
+    """
+    Starting from 'start', increase by 'step' until the difference in consecutive CV MSE is below 'threshold'
+    or until max_estimators is reached.
+    Returns a dict {n_estimators: CV MSE}.
+    """
+    results = {}
+    candidate = start
+    prev_mse = None
+    while candidate <= max_estimators:
+        mse = cross_val_score_gpu(X_data, y_data, cv=cv, n_estimators=candidate)
+        results[candidate] = mse
+        print(f"n_estimators: {candidate}, CV MSE: {mse:.4f}")
+        if prev_mse is not None and abs(prev_mse - mse) < threshold:
+            break
+        prev_mse = mse
+        candidate += step
+    return results
+
+# Evaluate the optimal number of trees
+tree_results = evaluate_n_estimators_dynamic(X_gpu, y_gpu, cv=5, start=50, step=50, threshold=1e-3, max_estimators=1000)
+
+# Plot CV MSE vs Number of Trees and save the figure
+plt.figure(figsize=(8, 5))
+plt.plot(list(tree_results.keys()), list(tree_results.values()), marker='o')
+plt.xlabel('Random Forest n_estimators')
+plt.ylabel('CV MSE')
+plt.title('CV MSE vs Number of Trees')
+plt.grid(True)
+plt.savefig("tree_vs_cv_mse.png", dpi=300)
+plt.close()
+
+# Determine the optimal number of trees (the one with the lowest CV MSE)
+best_n_estimators = min(tree_results, key=tree_results.get)
+print(f"Optimal number of trees determined: {best_n_estimators}")
+
+# ---------------------------
+# Manual RFECV implementation with permutation importance
+# Now includes printing the feature name (from rdkit Descriptors) along with the index.
+# Also records removed feature names for plotting annotation.
+# ---------------------------
+def manual_rfecv(X_data, y_data, feature_names, cv=5, n_estimators=100):
+    """
+    Manual RFECV:
+    - Compute CV MSE on the current feature subset.
+    - Compute permutation importance for each feature.
+    - Remove the feature with the lowest permutation importance and record it.
+    - Record the feature subset that yields the lowest CV MSE.
+    Returns the best feature subset, best CV MSE, history, permutation importance history,
+    and the list of removed feature names (in the order of removal).
+    """
+    n_features = X_data.shape[1]
+    features = list(range(n_features))
+    best_score = float('inf')
+    best_features = features.copy()
+    scores_history = []
+    perm_imp_history = []  # Record the permutation importance of each removed feature
+    removed_feature_names = []  # Record the corresponding feature names
+
+    while len(features) > 0:
+        current_score = cross_val_score_gpu(X_data[:, features], y_data, cv=cv, n_estimators=n_estimators)
+        scores_history.append((features.copy(), current_score))
+        print(f"Current number of features: {len(features)}, CV MSE: {current_score:.4f}")
+        if current_score < best_score:
+            best_score = current_score
+            best_features = features.copy()
+        if len(features) == 1:
+            break
+        # Train the model and compute permutation importance
+        model = cuRF(n_estimators=n_estimators, random_state=0)
+        model.fit(X_data[:, features], y_data)
+        imp = permutation_importance_gpu(model, X_data[:, features], y_data, n_repeats=3, random_state=0)
+        idx_to_remove = int(np.argmin(imp))
+        removed_feature = features[idx_to_remove]
+        removed_imp = imp[idx_to_remove]
+        # Get the feature name using the original feature_names list
+        removed_feature_name = feature_names[removed_feature]
+        perm_imp_history.append(removed_imp)
+        removed_feature_names.append(removed_feature_name)
+        print(f"Removed feature index: {removed_feature}, feature name: {removed_feature_name}, permutation importance: {removed_imp:.4f}")
+        del features[idx_to_remove]
+
+    return best_features, best_score, scores_history, perm_imp_history, removed_feature_names
+
+# Execute manual RFECV using 5-fold CV and the optimal number of trees, also pass the feature names
+cv_folds = 5
+best_features_rfecv, best_mse_rfecv, history, perm_imp_history, removed_feature_names = manual_rfecv(
+    X_gpu, y_gpu, feature_names=list(X_df.columns), cv=cv_folds, n_estimators=best_n_estimators
+)
+
+print(f"\nManual RFECV selected {len(best_features_rfecv)} features, CV MSE: {best_mse_rfecv:.4f}")
+selected_feature_names = [X_df.columns[i] for i in best_features_rfecv]
+print("Selected feature names:", selected_feature_names)
+
+# Plot permutation importance change during RFECV with English labels
+plt.figure(figsize=(8, 5))
+iterations = list(range(1, len(perm_imp_history) + 1))
+plt.plot(iterations, perm_imp_history, marker='o')
+plt.xlabel('RFECV Iteration')
+plt.ylabel('Permutation Importance of Removed Feature')
+plt.title('Permutation Importance during RFECV')
+plt.grid(True)
+
+# Annotation threshold: annotate points where absolute permutation importance > 0.002
+annotation_threshold = 0.002
+for i, imp_val in enumerate(perm_imp_history):
+    if abs(imp_val) > annotation_threshold:
+        plt.annotate(removed_feature_names[i], (iterations[i], imp_val), textcoords="offset points", xytext=(0,5), ha='center')
+
+plt.savefig("rfecv_perm_importance.png", dpi=300)
+plt.close()
+
+# ---------------------------
+# Final model training: train final model using best feature subset and optimal number of trees
+# ---------------------------
+final_model = cuRF(n_estimators=best_n_estimators, random_state=0)
+final_model.fit(X_gpu[:, best_features_rfecv], y_gpu)
+final_cv_mse = cross_val_score_gpu(X_gpu[:, best_features_rfecv], y_gpu, cv=cv_folds, n_estimators=best_n_estimators)
+print(f"\nFinal model (selected features, n_estimators={best_n_estimators}) CV MSE: {final_cv_mse:.4f}")
diff --git a/1d-qsar/cuda/feature.log b/1d-qsar/cuda/feature.log
new file mode 100644
index 0000000..4d06789
--- /dev/null
+++ b/1d-qsar/cuda/feature.log
@@ -0,0 +1,322 @@
+(rapids-25.02) root@DESK4090:~/project/qsar/1d-qsar/cuda# python qssar1d.py
+训练样本数: 81, 特征数量: 156
+/root/micromamba/envs/rapids-25.02/lib/python3.12/site-packages/cuml/internals/api_decorators.py:363: UserWarning: For reproducible results in Random Forest Classifier or for almost reproducible results in Random Forest Regressor, n_streams=1 is recommended.
If n_streams is > 1, results may vary due to stream/thread timing differences, even when random_state is set + return init_func(self, *args, **kwargs) +/root/micromamba/envs/rapids-25.02/lib/python3.12/site-packages/cuml/internals/api_decorators.py:188: UserWarning: The number of bins, `n_bins` is greater than the number of samples used for training. Changing `n_bins` to number of training samples. + ret = func(*args, **kwargs) +/root/micromamba/envs/rapids-25.02/lib/python3.12/site-packages/cuml/internals/api_decorators.py:188: UserWarning: To use pickling first train using float32 data to fit the estimator + ret = func(*args, **kwargs) +当前特征数: 156, CV MSE: 5.0974 +剔除特征索引: 13,置换重要性: -0.0022 +当前特征数: 155, CV MSE: 5.2003 +剔除特征索引: 133,置换重要性: -0.0016 +当前特征数: 154, CV MSE: 5.1318 +剔除特征索引: 107,置换重要性: -0.0188 +当前特征数: 153, CV MSE: 5.1432 +剔除特征索引: 77,置换重要性: -0.0082 +当前特征数: 152, CV MSE: 5.1745 +剔除特征索引: 126,置换重要性: -0.0005 +当前特征数: 151, CV MSE: 4.9942 +剔除特征索引: 106,置换重要性: -0.0072 +当前特征数: 150, CV MSE: 5.0801 +剔除特征索引: 108,置换重要性: -0.0077 +当前特征数: 149, CV MSE: 4.9464 +剔除特征索引: 101,置换重要性: -0.0008 +当前特征数: 148, CV MSE: 4.7651 +剔除特征索引: 67,置换重要性: -0.0028 +当前特征数: 147, CV MSE: 4.9519 +剔除特征索引: 62,置换重要性: -0.0005 +当前特征数: 146, CV MSE: 4.8742 +剔除特征索引: 86,置换重要性: -0.0032 +当前特征数: 145, CV MSE: 4.7400 +剔除特征索引: 119,置换重要性: -0.0016 +当前特征数: 144, CV MSE: 4.7471 +剔除特征索引: 148,置换重要性: -0.0018 +当前特征数: 143, CV MSE: 4.8387 +剔除特征索引: 138,置换重要性: -0.0006 +当前特征数: 142, CV MSE: 5.0176 +剔除特征索引: 143,置换重要性: -0.0008 +当前特征数: 141, CV MSE: 4.8092 +剔除特征索引: 124,置换重要性: -0.0070 +当前特征数: 140, CV MSE: 4.8574 +剔除特征索引: 111,置换重要性: -0.0046 +当前特征数: 139, CV MSE: 5.0330 +剔除特征索引: 128,置换重要性: -0.0012 +当前特征数: 138, CV MSE: 5.0798 +剔除特征索引: 139,置换重要性: -0.0009 +当前特征数: 137, CV MSE: 4.9377 +剔除特征索引: 52,置换重要性: -0.0004 +当前特征数: 136, CV MSE: 5.1192 +剔除特征索引: 155,置换重要性: -0.0000 +当前特征数: 135, CV MSE: 5.2678 +剔除特征索引: 152,置换重要性: -0.0040 +当前特征数: 134, CV MSE: 4.8579 +剔除特征索引: 93,置换重要性: -0.0031 +当前特征数: 133, CV MSE: 4.8737 +剔除特征索引: 125,置换重要性: -0.0000 +当前特征数: 132, CV 
MSE: 5.1150 +剔除特征索引: 84,置换重要性: -0.0009 +当前特征数: 131, CV MSE: 5.0172 +剔除特征索引: 10,置换重要性: -0.0006 +当前特征数: 130, CV MSE: 5.0235 +剔除特征索引: 21,置换重要性: -0.0018 +当前特征数: 129, CV MSE: 4.7640 +剔除特征索引: 76,置换重要性: -0.0007 +当前特征数: 128, CV MSE: 4.8049 +剔除特征索引: 74,置换重要性: -0.0037 +当前特征数: 127, CV MSE: 4.7546 +剔除特征索引: 55,置换重要性: -0.0003 +当前特征数: 126, CV MSE: 4.7790 +剔除特征索引: 66,置换重要性: -0.0018 +当前特征数: 125, CV MSE: 4.9159 +剔除特征索引: 80,置换重要性: -0.0003 +当前特征数: 124, CV MSE: 4.8962 +剔除特征索引: 115,置换重要性: -0.0012 +当前特征数: 123, CV MSE: 4.7899 +剔除特征索引: 53,置换重要性: -0.0000 +当前特征数: 122, CV MSE: 4.9718 +剔除特征索引: 27,置换重要性: -0.0007 +当前特征数: 121, CV MSE: 4.7772 +剔除特征索引: 12,置换重要性: -0.0009 +当前特征数: 120, CV MSE: 4.9935 +剔除特征索引: 49,置换重要性: -0.0002 +当前特征数: 119, CV MSE: 4.7858 +剔除特征索引: 47,置换重要性: -0.0002 +当前特征数: 118, CV MSE: 4.8866 +剔除特征索引: 113,置换重要性: -0.0019 +当前特征数: 117, CV MSE: 4.6379 +剔除特征索引: 63,置换重要性: -0.0002 +当前特征数: 116, CV MSE: 4.7249 +剔除特征索引: 65,置换重要性: -0.0002 +当前特征数: 115, CV MSE: 4.9689 +剔除特征索引: 88,置换重要性: -0.0005 +当前特征数: 114, CV MSE: 4.8946 +剔除特征索引: 48,置换重要性: -0.0004 +当前特征数: 113, CV MSE: 4.7893 +剔除特征索引: 11,置换重要性: -0.0007 +当前特征数: 112, CV MSE: 4.7882 +剔除特征索引: 15,置换重要性: -0.0035 +当前特征数: 111, CV MSE: 4.6776 +剔除特征索引: 24,置换重要性: -0.0027 +当前特征数: 110, CV MSE: 4.8426 +剔除特征索引: 51,置换重要性: -0.0000 +当前特征数: 109, CV MSE: 4.9341 +剔除特征索引: 127,置换重要性: -0.0014 +当前特征数: 108, CV MSE: 4.7765 +剔除特征索引: 31,置换重要性: -0.0006 +当前特征数: 107, CV MSE: 4.9951 +剔除特征索引: 36,置换重要性: -0.0010 +当前特征数: 106, CV MSE: 5.2174 +剔除特征索引: 32,置换重要性: -0.0009 +当前特征数: 105, CV MSE: 4.7904 +剔除特征索引: 140,置换重要性: -0.0012 +当前特征数: 104, CV MSE: 5.0252 +剔除特征索引: 9,置换重要性: -0.0004 +当前特征数: 103, CV MSE: 4.9320 +剔除特征索引: 70,置换重要性: -0.0001 +当前特征数: 102, CV MSE: 5.2598 +剔除特征索引: 25,置换重要性: -0.0008 +当前特征数: 101, CV MSE: 4.9394 +剔除特征索引: 54,置换重要性: -0.0001 +当前特征数: 100, CV MSE: 4.9273 +剔除特征索引: 16,置换重要性: -0.0007 +当前特征数: 99, CV MSE: 4.8924 +剔除特征索引: 42,置换重要性: -0.0020 +当前特征数: 98, CV MSE: 5.0755 +剔除特征索引: 150,置换重要性: -0.0000 +当前特征数: 97, CV MSE: 4.8551 +剔除特征索引: 116,置换重要性: -0.0013 +当前特征数: 96, CV MSE: 4.9251 
+剔除特征索引: 153,置换重要性: -0.0000 +当前特征数: 95, CV MSE: 4.9058 +剔除特征索引: 0,置换重要性: -0.0001 +当前特征数: 94, CV MSE: 4.8297 +剔除特征索引: 144,置换重要性: -0.0016 +当前特征数: 93, CV MSE: 4.7873 +剔除特征索引: 19,置换重要性: -0.0041 +当前特征数: 92, CV MSE: 4.9690 +剔除特征索引: 105,置换重要性: -0.0003 +当前特征数: 91, CV MSE: 4.8730 +剔除特征索引: 136,置换重要性: -0.0001 +当前特征数: 90, CV MSE: 4.7316 +剔除特征索引: 154,置换重要性: -0.0000 +当前特征数: 89, CV MSE: 4.6072 +剔除特征索引: 145,置换重要性: -0.0000 +当前特征数: 88, CV MSE: 4.4552 +剔除特征索引: 29,置换重要性: -0.0018 +当前特征数: 87, CV MSE: 4.7021 +剔除特征索引: 149,置换重要性: -0.0000 +当前特征数: 86, CV MSE: 4.4475 +剔除特征索引: 129,置换重要性: -0.0000 +当前特征数: 85, CV MSE: 4.6533 +剔除特征索引: 1,置换重要性: 0.0000 +当前特征数: 84, CV MSE: 4.3199 +剔除特征索引: 135,置换重要性: -0.0003 +当前特征数: 83, CV MSE: 4.2181 +剔除特征索引: 142,置换重要性: -0.0000 +当前特征数: 82, CV MSE: 4.4828 +剔除特征索引: 69,置换重要性: -0.0001 +当前特征数: 81, CV MSE: 4.4320 +剔除特征索引: 137,置换重要性: -0.0019 +当前特征数: 80, CV MSE: 4.5423 +剔除特征索引: 2,置换重要性: 0.0000 +当前特征数: 79, CV MSE: 4.5011 +剔除特征索引: 3,置换重要性: 0.0000 +当前特征数: 78, CV MSE: 4.4884 +剔除特征索引: 64,置换重要性: -0.0018 +当前特征数: 77, CV MSE: 4.6200 +剔除特征索引: 4,置换重要性: 0.0000 +当前特征数: 76, CV MSE: 4.5752 +剔除特征索引: 5,置换重要性: 0.0000 +当前特征数: 75, CV MSE: 4.4515 +剔除特征索引: 41,置换重要性: -0.0010 +当前特征数: 74, CV MSE: 4.6762 +剔除特征索引: 6,置换重要性: 0.0000 +当前特征数: 73, CV MSE: 4.3106 +剔除特征索引: 7,置换重要性: 0.0000 +当前特征数: 72, CV MSE: 4.1087 +剔除特征索引: 147,置换重要性: -0.0000 +当前特征数: 71, CV MSE: 4.4017 +剔除特征索引: 91,置换重要性: -0.0030 +当前特征数: 70, CV MSE: 4.3070 +剔除特征索引: 146,置换重要性: -0.0000 +当前特征数: 69, CV MSE: 4.5710 +剔除特征索引: 8,置换重要性: 0.0000 +当前特征数: 68, CV MSE: 4.3590 +剔除特征索引: 122,置换重要性: -0.0013 +当前特征数: 67, CV MSE: 4.3564 +剔除特征索引: 14,置换重要性: 0.0000 +当前特征数: 66, CV MSE: 4.5206 +剔除特征索引: 134,置换重要性: -0.0004 +当前特征数: 65, CV MSE: 4.6562 +剔除特征索引: 60,置换重要性: -0.0038 +当前特征数: 64, CV MSE: 4.4382 +剔除特征索引: 17,置换重要性: 0.0000 +当前特征数: 63, CV MSE: 4.5168 +剔除特征索引: 18,置换重要性: -0.0001 +当前特征数: 62, CV MSE: 4.6560 +剔除特征索引: 20,置换重要性: 0.0000 +当前特征数: 61, CV MSE: 4.6232 +剔除特征索引: 22,置换重要性: 0.0000 +当前特征数: 60, CV MSE: 4.5546 +剔除特征索引: 23,置换重要性: 0.0000 +当前特征数: 59, CV MSE: 4.6368 +剔除特征索引: 
26,置换重要性: -0.0000 +当前特征数: 58, CV MSE: 4.7763 +剔除特征索引: 28,置换重要性: 0.0000 +当前特征数: 57, CV MSE: 4.5347 +剔除特征索引: 30,置换重要性: -0.0001 +当前特征数: 56, CV MSE: 4.5139 +剔除特征索引: 33,置换重要性: 0.0000 +当前特征数: 55, CV MSE: 4.4100 +剔除特征索引: 34,置换重要性: 0.0000 +当前特征数: 54, CV MSE: 4.3154 +剔除特征索引: 35,置换重要性: 0.0000 +当前特征数: 53, CV MSE: 4.4305 +剔除特征索引: 37,置换重要性: 0.0000 +当前特征数: 52, CV MSE: 4.5358 +剔除特征索引: 50,置换重要性: 0.0000 +当前特征数: 51, CV MSE: 4.5094 +剔除特征索引: 38,置换重要性: 0.0022 +当前特征数: 50, CV MSE: 4.5153 +剔除特征索引: 39,置换重要性: -0.0004 +当前特征数: 49, CV MSE: 4.3441 +剔除特征索引: 40,置换重要性: -0.0001 +当前特征数: 48, CV MSE: 4.4674 +剔除特征索引: 43,置换重要性: 0.0000 +当前特征数: 47, CV MSE: 4.3391 +剔除特征索引: 131,置换重要性: -0.0003 +当前特征数: 46, CV MSE: 4.6788 +剔除特征索引: 44,置换重要性: 0.0000 +当前特征数: 45, CV MSE: 4.2720 +剔除特征索引: 45,置换重要性: 0.0000 +当前特征数: 44, CV MSE: 4.3232 +剔除特征索引: 46,置换重要性: 0.0000 +当前特征数: 43, CV MSE: 4.1793 +剔除特征索引: 56,置换重要性: 0.0000 +当前特征数: 42, CV MSE: 4.4771 +剔除特征索引: 57,置换重要性: 0.0000 +当前特征数: 41, CV MSE: 4.1601 +剔除特征索引: 58,置换重要性: 0.0000 +当前特征数: 40, CV MSE: 4.3411 +剔除特征索引: 59,置换重要性: 0.0000 +当前特征数: 39, CV MSE: 4.3668 +剔除特征索引: 61,置换重要性: -0.0000 +当前特征数: 38, CV MSE: 4.3295 +剔除特征索引: 68,置换重要性: 0.0000 +当前特征数: 37, CV MSE: 4.4400 +剔除特征索引: 71,置换重要性: -0.0000 +当前特征数: 36, CV MSE: 4.1841 +剔除特征索引: 72,置换重要性: 0.0000 +当前特征数: 35, CV MSE: 4.2940 +剔除特征索引: 73,置换重要性: 0.0000 +当前特征数: 34, CV MSE: 4.1931 +剔除特征索引: 75,置换重要性: 0.0000 +当前特征数: 33, CV MSE: 4.0917 +剔除特征索引: 78,置换重要性: 0.0000 +当前特征数: 32, CV MSE: 4.5247 +剔除特征索引: 79,置换重要性: -0.0000 +当前特征数: 31, CV MSE: 4.5843 +剔除特征索引: 81,置换重要性: 0.0000 +当前特征数: 30, CV MSE: 4.4505 +剔除特征索引: 82,置换重要性: 0.0000 +当前特征数: 29, CV MSE: 4.3227 +剔除特征索引: 83,置换重要性: 0.0000 +当前特征数: 28, CV MSE: 4.1382 +剔除特征索引: 85,置换重要性: 0.0000 +当前特征数: 27, CV MSE: 4.4850 +剔除特征索引: 87,置换重要性: 0.0000 +当前特征数: 26, CV MSE: 4.3147 +剔除特征索引: 89,置换重要性: 0.0001 +当前特征数: 25, CV MSE: 4.3994 +剔除特征索引: 90,置换重要性: 0.0000 +当前特征数: 24, CV MSE: 4.4345 +剔除特征索引: 92,置换重要性: 0.0000 +当前特征数: 23, CV MSE: 4.4428 +剔除特征索引: 94,置换重要性: 0.0000 +当前特征数: 22, CV MSE: 4.8050 +剔除特征索引: 95,置换重要性: 0.0000 +当前特征数: 21, 
CV MSE: 4.5067 +剔除特征索引: 96,置换重要性: 0.0004 +当前特征数: 20, CV MSE: 4.5489 +剔除特征索引: 97,置换重要性: 0.0001 +当前特征数: 19, CV MSE: 4.6281 +剔除特征索引: 98,置换重要性: 0.0009 +当前特征数: 18, CV MSE: 4.4093 +剔除特征索引: 99,置换重要性: 0.0039 +当前特征数: 17, CV MSE: 4.3291 +剔除特征索引: 100,置换重要性: 0.0007 +当前特征数: 16, CV MSE: 4.3191 +剔除特征索引: 102,置换重要性: 0.0002 +当前特征数: 15, CV MSE: 4.6593 +剔除特征索引: 103,置换重要性: -0.0002 +当前特征数: 14, CV MSE: 4.6185 +剔除特征索引: 104,置换重要性: 0.0000 +当前特征数: 13, CV MSE: 4.4200 +剔除特征索引: 109,置换重要性: 0.0000 +当前特征数: 12, CV MSE: 4.3259 +剔除特征索引: 110,置换重要性: -0.0000 +当前特征数: 11, CV MSE: 4.0543 +剔除特征索引: 112,置换重要性: 0.0002 +当前特征数: 10, CV MSE: 3.9899 +剔除特征索引: 114,置换重要性: 0.0000 +当前特征数: 9, CV MSE: 4.0719 +剔除特征索引: 117,置换重要性: 0.0062 +当前特征数: 8, CV MSE: 4.3854 +剔除特征索引: 118,置换重要性: 0.0000 +当前特征数: 7, CV MSE: 4.2360 +剔除特征索引: 120,置换重要性: 0.0022 +当前特征数: 6, CV MSE: 3.8872 +剔除特征索引: 121,置换重要性: 0.0034 +当前特征数: 5, CV MSE: 3.8296 +剔除特征索引: 123,置换重要性: 0.0022 +当前特征数: 4, CV MSE: 3.8317 +剔除特征索引: 130,置换重要性: 0.0007 +当前特征数: 3, CV MSE: 3.9287 +剔除特征索引: 132,置换重要性: 0.0000 +当前特征数: 2, CV MSE: 3.9652 +剔除特征索引: 141,置换重要性: 0.0000 +当前特征数: 1, CV MSE: 3.9652 + +手动RFECV选择了 5 个特征,CV MSE: 3.8296 +选定特征名称: ['fr_Al_OH_noTert', 'fr_Ndealkylation2', 'fr_alkyl_halide', 'fr_halogen', 'fr_piperdine'] \ No newline at end of file diff --git a/1d-qsar/cuda/qssar1d.py b/1d-qsar/cuda/qssar1d.py new file mode 100644 index 0000000..dc93cfa --- /dev/null +++ b/1d-qsar/cuda/qssar1d.py @@ -0,0 +1,152 @@ +#!/usr/bin/env python +# -*- encoding: utf-8 -*- +''' +@file: RFE_cuml_permutation.py +@Description: 使用GPU cuML实现特征选择,手动实现RFECV,并通过置换重要性计算来剔除最不重要的特征 +@Date: 2025/03/01 +@Author: lyzeng +''' + +import numpy as np +import pandas as pd + +# --------------------------- +# 数据加载与特征构造 +# --------------------------- +data = pd.read_csv("../data_smi.csv", index_col="Entry ID") +target_data = data[["SMILES", "S.aureus ATCC25923"]] +target_data.columns = ["SMILES", "TARGET"] + +from rdkit import Chem +from rdkit.Chem import Descriptors + +def getMolDescriptors(smiles: str, 
missingVal=None):
+    mol = Chem.MolFromSmiles(smiles)
+    res = {}
+    for nm, fn in Descriptors._descList:
+        try:
+            val = fn(mol)
+        except Exception:
+            val = missingVal
+        res[nm] = val
+    return res
+
+# Build the feature matrix and target variable
+X_df = pd.DataFrame([getMolDescriptors(smi) for smi in target_data['SMILES']])
+y = target_data['TARGET'].values
+
+# Remove invalid features (columns with a single unique value)
+invalid_features = [col for col in X_df.columns if X_df[col].nunique() <= 1]
+X_df.drop(columns=invalid_features, inplace=True)
+X = X_df.values
+
+print(f"训练样本数: {len(y)}, 特征数量: {X.shape[1]}")
+
+# ---------------------------
+# Convert to GPU data format (cupy arrays)
+# ---------------------------
+import cupy as cp
+X_gpu = cp.asarray(X)
+y_gpu = cp.asarray(y)
+
+# ---------------------------
+# Helper function: compute the model's MSE
+# ---------------------------
+def model_mse(model, X_data, y_data):
+    y_pred = model.predict(X_data)
+    mse = cp.mean((y_data - y_pred) ** 2)
+    return mse.get()
+
+# ---------------------------
+# Permutation importance function (on the GPU)
+# ---------------------------
+def permutation_importance_gpu(model, X_data, y_data, n_repeats=3, random_state=0):
+    """
+    For a given model, compute the permutation importance of each feature:
+    importance = (mean MSE after shuffling the feature) - (original MSE)
+    Each feature is shuffled at random and the resulting drop in model performance
+    measures its importance: if shuffling a feature clearly degrades performance,
+    that feature matters to the model; if shuffling has no visible effect, it does not.
+    """
+    # Copy the data to the CPU for shuffling first (cheap for small datasets)
+    X_cpu = cp.asnumpy(X_data)
+    baseline = model_mse(model, X_data, y_data)
+    importances = np.zeros(X_cpu.shape[1])
+    rng = np.random.RandomState(random_state)
+    for j in range(X_cpu.shape[1]):
+        scores = []
+        for _ in range(n_repeats):
+            X_permuted = X_cpu.copy()
+            rng.shuffle(X_permuted[:, j])
+            X_permuted_gpu = cp.asarray(X_permuted)
+            score = model_mse(model, X_permuted_gpu, y_data)
+            scores.append(score)
+        importances[j] = np.mean(scores) - baseline
+    return importances
+
+# ---------------------------
+# Manual RFECV with cuML random forests and permutation importance
+# ---------------------------
+from cuml.ensemble import RandomForestRegressor as cuRF
+from sklearn.model_selection import KFold
+
+def cross_val_score_gpu(X_data, y_data, cv=5):
+    """
+    Split the data with KFold and use a cuML random forest to evaluate the
+    CV mean squared error of the current feature subset.
+    """
+    kf = KFold(n_splits=cv, shuffle=True, random_state=0)
+    mse_scores = []
+    # Use CPU indices for splitting
+    X_np = cp.asnumpy(X_data)
+    for train_index, test_index in kf.split(X_np):
+        X_train = X_data[train_index, :]
+        X_test = X_data[test_index, :]
+        y_train = y_data[train_index]
+        y_test = y_data[test_index]
+        model = cuRF(n_estimators=100, random_state=0)
+        model.fit(X_train, y_train)
+        mse = model_mse(model, X_test, y_test)
+        mse_scores.append(mse)
+    return np.mean(mse_scores)
+
+def manual_rfecv(X_data, y_data, cv=5):
+    """
+    Manual RFECV:
+    - compute the CV MSE on the current feature subset
+    - rank the features with permutation importance
+    - remove the feature with the lowest permutation importance
+    - keep track of the feature subset with the smallest CV MSE
+    The subset shrinks by one feature per iteration; the goal is the subset that
+    minimizes the cross-validation MSE, which is then used to train the final model.
+    Note: set max_features explicitly when comparing with scikit-learn; the cuML
+    default does not necessarily match scikit-learn's RandomForestRegressor, which
+    by default considers all features at each split.
+    """
+    n_features = X_data.shape[1]
+    features = list(range(n_features))
+    best_score = float('inf')
+    best_features = features.copy()
+    scores_history = []
+
+    while len(features) > 0:
+        current_score = cross_val_score_gpu(X_data[:, features], y_data, cv=cv)
+        scores_history.append((features.copy(), current_score))
+        print(f"当前特征数: {len(features)}, CV MSE: {current_score:.4f}")
+        if current_score < best_score:
+            best_score = current_score
+            best_features = features.copy()
+        if len(features) == 1:
+            break
+        # Train the model and compute permutation importances
+        model = cuRF(n_estimators=100, random_state=0)
+        model.fit(X_data[:, features], y_data)
+        imp = permutation_importance_gpu(model, X_data[:, features], y_data, n_repeats=3, random_state=0)
+        idx_to_remove = int(np.argmin(imp))
+        removed_feature = features[idx_to_remove]
+        print(f"剔除特征索引: {removed_feature},置换重要性: {imp[idx_to_remove]:.4f}")
+        del features[idx_to_remove]
+
+    return best_features, best_score, scores_history
+
+# Run the manual RFECV with 5-fold cross-validation
+cv_folds = 5
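The manual permutation-importance loop above can be sanity-checked on the CPU: scikit-learn ships the same procedure as `sklearn.inspection.permutation_importance` (permuted score minus baseline score, averaged over `n_repeats`). A minimal sketch on synthetic data, with a plain scikit-learn `RandomForestRegressor` standing in for the cuML model:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

# Synthetic regression data: 5 informative features among 10
X, y = make_regression(n_samples=200, n_features=10, n_informative=5, random_state=0)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# With scoring="neg_mean_squared_error", importances_mean[j] follows the same sign
# convention as the manual GPU version: permuted MSE minus baseline MSE
result = permutation_importance(
    model, X, y, n_repeats=3, random_state=0, scoring="neg_mean_squared_error"
)

ranking = np.argsort(result.importances_mean)  # least important feature first
print("Least important feature index:", int(ranking[0]))
```

As in the script, the model is evaluated on its own training data here, so the absolute importances are optimistic; it is the ranking that drives the feature elimination.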
+best_features_rfecv, best_mse_rfecv, history = manual_rfecv(X_gpu, y_gpu, cv=cv_folds) + +print(f"\n手动RFECV选择了 {len(best_features_rfecv)} 个特征,CV MSE: {best_mse_rfecv:.4f}") +selected_feature_names = [X_df.columns[i] for i in best_features_rfecv] +print("选定特征名称:", selected_feature_names) diff --git a/1d-qsar/cuda/random_forest_demo.ipynb b/1d-qsar/cuda/random_forest_demo.ipynb new file mode 100644 index 0000000..eb4e6e7 --- /dev/null +++ b/1d-qsar/cuda/random_forest_demo.ipynb @@ -0,0 +1,294 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Random Forest and Pickling\n", + "The Random Forest algorithm is a classification method which builds several decision trees, and aggregates each of their outputs to make a prediction.\n", + "\n", + "In this notebook we will train a scikit-learn and a cuML Random Forest Classification model. Then we save the cuML model for future use with Python's `pickling` mechanism and demonstrate how to re-load it for prediction. 
We also compare the results of the scikit-learn, non-pickled and pickled cuML models.\n", + "\n", + "Note that the underlying algorithm in cuML for tree node splits differs from that used in scikit-learn.\n", + "\n", + "For information on converting your dataset to cuDF format, refer to the [cuDF documentation](https://docs.rapids.ai/api/cudf/stable)\n", + "\n", + "For additional information on cuML's random forest model: https://docs.rapids.ai/api/cuml/stable/api.html#random-forest" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import cudf\n", + "import numpy as np\n", + "import pandas as pd\n", + "import pickle\n", + "\n", + "from cuml.ensemble import RandomForestClassifier as curfc\n", + "from cuml.metrics import accuracy_score\n", + "\n", + "from sklearn.ensemble import RandomForestClassifier as skrfc\n", + "from sklearn.datasets import make_classification\n", + "from sklearn.model_selection import train_test_split" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Define Parameters" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# The speedup obtained by using cuML's Random Forest implementation\n", + "# becomes much higher when using larger datasets. 
Uncomment and use the n_samples\n", + "# value provided below to see the difference in the time required to run\n", + "# Scikit-learn's vs cuML's implementation with a large dataset.\n", + "\n", + "# n_samples = 2**17\n", + "n_samples = 2**12\n", + "n_features = 399\n", + "n_info = 300\n", + "data_type = np.float32" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Generate Data\n", + "\n", + "### Host" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%time\n", + "X,y = make_classification(n_samples=n_samples,\n", + " n_features=n_features,\n", + " n_informative=n_info,\n", + " random_state=123, n_classes=2)\n", + "\n", + "X = pd.DataFrame(X.astype(data_type))\n", + "# cuML Random Forest Classifier requires the labels to be integers\n", + "y = pd.Series(y.astype(np.int32))\n", + "\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y,\n", + " test_size = 0.2,\n", + " random_state=0)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### GPU" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%time\n", + "X_cudf_train = cudf.DataFrame.from_pandas(X_train)\n", + "X_cudf_test = cudf.DataFrame.from_pandas(X_test)\n", + "\n", + "y_cudf_train = cudf.Series(y_train.values)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Scikit-learn Model\n", + "\n", + "### Fit" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%time\n", + "sk_model = skrfc(n_estimators=40,\n", + " max_depth=16,\n", + " max_features=1.0,\n", + " random_state=10)\n", + "\n", + "sk_model.fit(X_train, y_train)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Evaluate" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%time\n",
+ "sk_predict = sk_model.predict(X_test)\n", + "sk_acc = accuracy_score(y_test, sk_predict)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## cuML Model" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Fit" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%time\n", + "cuml_model = curfc(n_estimators=40,\n", + " max_depth=16,\n", + " max_features=1.0,\n", + " random_state=10)\n", + "\n", + "cuml_model.fit(X_cudf_train, y_cudf_train)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Evaluate" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%time\n", + "fil_preds_orig = cuml_model.predict(X_cudf_test)\n", + "\n", + "fil_acc_orig = accuracy_score(y_test.to_numpy(), fil_preds_orig)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Pickle the cuML random forest classification model" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "filename = 'cuml_random_forest_model.sav'\n", + "# save the trained cuml model into a file\n", + "pickle.dump(cuml_model, open(filename, 'wb'))\n", + "# delete the previous model to ensure that there is no leakage of pointers.\n", + "# this is not strictly necessary but just included here for demo purposes.\n", + "del cuml_model\n", + "# load the previously saved cuml model from a file\n", + "pickled_cuml_model = pickle.load(open(filename, 'rb'))\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Predict using the pickled model" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%%time\n", + "pred_after_pickling = pickled_cuml_model.predict(X_cudf_test)\n", + "\n", + "fil_acc_after_pickling = accuracy_score(y_test.to_numpy(), 
pred_after_pickling)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Compare Results" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(\"CUML accuracy of the RF model before pickling: %s\" % fil_acc_orig)\n", + "print(\"CUML accuracy of the RF model after pickling: %s\" % fil_acc_after_pickling)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(\"SKL accuracy: %s\" % sk_acc)\n", + "print(\"CUML accuracy before pickling: %s\" % fil_acc_orig)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.6" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +}
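
The round-trip the notebook demonstrates with cuML — dump a fitted forest, delete it, reload it, and get identical predictions — can be exercised without a GPU. A minimal CPU-only sketch using scikit-learn's `RandomForestClassifier` as a stand-in for `curfc`:

```python
import pickle

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=123)

model = RandomForestClassifier(n_estimators=40, max_depth=16, random_state=10).fit(X, y)
preds_before = model.predict(X)

# Round-trip through pickle bytes (equivalent to dump/load via a file)
restored = pickle.loads(pickle.dumps(model))
preds_after = restored.predict(X)

# A fitted forest is fully serialized, so predictions must match exactly
print("identical:", np.array_equal(preds_before, preds_after))
```

The same check is what the notebook's final cells perform by comparing `fil_acc_orig` with `fil_acc_after_pickling`.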