How do I ignore certain input lines in C++?

Asked by: 小点点

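Some background first: this code is supposed to read a file containing DNA, count the nucleotides A, C, T and G, print the counts, and do a few other minor calculations. It works fine for most files, except those that contain lines starting with @ or +. I need to skip those lines to get accurate numbers. So my question is: how do I skip or ignore these lines in my calculation? My code is: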

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <map>

int main(int argc, char** argv) {
// Ignore how the above argc and argv are used here
auto arguments = std::vector<std::string>(argv, argv + argc);
// "arguments" box has what you wrote on the right side after &&

if (arguments.size() != 2) {
    // ensure you wrote a file name after "./a.out"
    std::cout << "Please give a file name as argument\n";
    return 1;
}

auto file = std::fstream(arguments[1]);
if (!file) {
    // ensure the file name you gave is from the available files
    std::cout << "Cannot open " << arguments[1] << "\n";
    return 1;
}
auto counts = std::map<char,int>({{'G',0},{'A',0},{'C',0},{'T',0}});


// Just a test loop to print all lines from the file
for (auto dna = std::string(); std::getline(file, dna); ) {
    //std::cout << dna << "\n";
    for (auto nucleotide:dna) {
      counts[nucleotide]=counts[nucleotide] + 1;
    }
}

double total = counts['A'] + counts['T'] + counts['G'] + counts['C'];
double GC = (counts['G'] + counts['C'])*100/total;
double AT = (counts['A'] + counts['T'])*100/total;
double ratio = AT/GC;
auto classification = "";

if (40.0 < GC && GC < 60.0) {
   classification = "moderate GC content";
}
if (60 <= GC) {
   classification = "high GC content";
}
if (GC <= 40.0) {
   classification = "low GC content";
}


std::cout << "GC-content: " << GC << "\n";
std::cout << "AT-content: " << AT << "\n";
std::cout << "G count: " << counts['G'] << "\n";
std::cout << "C count: " << counts['C'] << "\n";
std::cout << "A count: " << counts['A'] << "\n";
std::cout << "T count: " << counts['T'] << "\n";
std::cout << "Total count: " << total << "\n";
std::cout << "AT/GC Ratio: " << ratio << "\n";

std::cout << "GC Classification: " << classification << "\n";
}

The files that give me trouble look like this:

@ERR034677.1 HWI-EAS349_0046:7:1:2144:972#0 length=76
NGATGATAAACAAGAGGGTAAAAAGAAAAAAGCTACAGACATTTCTGCTAATCTATTATTTTGTTCCTTTTTTTTT
+ERR034677.1 HWI-EAS349_0046:7:1:2144:972#0 length=76
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

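I would really appreciate it if someone could help me with this. I only need a hint or the concept I am missing, so that I can make my code work with all files. Thanks in advance.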

1 answer

Anonymous user

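Your actual problem appears to be the standard case of "the input does not always have clean syntax."
The solution is always "do not expect clean syntax."
First, read the whole line into a buffer.
Then check its syntax.
Skip lines whose syntax is broken.
Scan lines with clean syntax from the buffer.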
