Asked by: 小点点

How do I ignore certain input lines in C++?


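OK, a bit of background: this code is supposed to read a file containing DNA, count the nucleotides A, C, T, and G, print the counts, and do a few other minor calculations. My code works fine for most files, except for files that contain lines starting with @ and +. I need to skip those lines to get accurate counts. So my question is: how do I skip or ignore those lines in my calculation? My code is: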

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <map>

int main(int argc, char** argv) {
// Ignore how the above argc and argv are used here
auto arguments = std::vector<std::string>(argv, argv + argc);
// "arguments" box has what you wrote on the right side after &&

if (arguments.size() != 2) {
    // ensure you wrote a file name after "./a.out"
    std::cout << "Please give a file name as argument\n";
    return 1;
}

auto file = std::fstream(arguments[1]);
if (!file) {
    // ensure the file name you gave is from the available files
    std::cout << "Cannot open " << arguments[1] << "\n";
    return 1;
}
auto counts = std::map<char,int>({{'G',0},{'A',0},{'C',0},{'T',0}});


// Just a test loop to print all lines from the file
for (auto dna = std::string(); std::getline(file, dna); ) {
    //std::cout << dna << "\n";
    for (auto nucleotide:dna) {
      counts[nucleotide]=counts[nucleotide] + 1;
    }
}

double total = counts['A'] + counts['T'] + counts['G'] + counts['C'];
double GC = (counts['G'] + counts['C'])*100/total;
double AT = (counts['A'] + counts['T'])*100/total;
double ratio = AT/GC;
auto classification = "";

// Note: "40.0 < GC < 60.0" would compare a bool against 60.0 and always be true
if (40.0 < GC && GC < 60.0) {
   classification = "moderate GC content";
}
if (60 <= GC) {
   classification = "high GC content";
}
if (GC <= 40.0) {
   classification = "low GC content";
}


std::cout << "GC-content: " << GC << "\n";
std::cout << "AT-content: " << AT << "\n";
std::cout << "G count: " << counts['G'] << "\n";
std::cout << "C count: " << counts['C'] << "\n";
std::cout << "A count: " << counts['A'] << "\n";
std::cout << "T count: " << counts['T'] << "\n";
std::cout << "Total count: " << total << "\n";
std::cout << "AT/GC Ratio: " << ratio << "\n";

std::cout << "GC Classification: " << classification << "\n";
}

The file that is giving me trouble looks like this:

@ERR034677.1 HWI-EAS349_0046:7:1:2144:972#0 length=76
NGATGATAAACAAGAGGGTAAAAAGAAAAAAGCTACAGACATTTCTGCTAATCTATTATTTTGTTCCTTTTTTTTT
+ERR034677.1 HWI-EAS349_0046:7:1:2144:972#0 length=76
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB

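I would really appreciate it if someone could help me with this. I only need a hint, or the concept I am missing, so that I can make my code work with all files. Thanks in advance.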


1 answer

Anonymous user

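Your actual problem appears to be the standard case of "the input is not always clean syntax."
The solution is always "do not expect clean syntax."
First, read the whole line into a buffer.
Then check its syntax.
Skip lines with broken syntax.
Scan the lines with clean syntax from the buffer.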
