Cassandra updates not working consistently


Problem description

I run the following code on my local (Mac) machine and on a remote Unix server:

public void deleteValue(final String id, final String value) {
    log.info("Removing value " + value);
    final Collection<String> valuesBeforeRemoval = getValues(id);
    final MutationBatch m = keyspace.prepareMutationBatch();
    m.withRow(VALUES_CF, id).deleteColumn(value);
    try {
      m.execute();
    } catch (final ConnectionException e) {
      log.error("Unable to delete location " + value, e);
    }
    final Collection<String> valuesAfterRemoval = getValues(id);
    if (valuesAfterRemoval.size()!=(valuesBeforeRemoval.size()-1)) {
      log.error("value " + value + " was supposed to be removed from list "  + valuesBeforeRemoval + " but it wasn't: " + valuesAfterRemoval);
    }
...
  }

protected Collection<String> getValues(final String id) {
  try {
    final OperationResult<ColumnList<String>> operationResult = keyspace
            .prepareQuery(VALUES_CF).getKey(id).execute();
    final ColumnList<String> result = operationResult.getResult();
    if (result.isEmpty()) {
      log.info("No  value found for id: " + id);
      return new ArrayList<String>();
    }
    return result.getColumnNames();
  } catch (final ConnectionException e) {
    log.error("Unable to retrieve session " + id, e);
  }
  return new ArrayList<String>();
}

Locally, this line is never executed, which makes sense:

log.error("value " + value + " was supposed to be removed from list "  + valuesBeforeRemoval + " but it wasn't: " + valuesAfterRemoval);

But that line is executed on my dev server:

[ERROR] [main] [n.o.w.s.d.SessionDaoCassandraImpl] [2013-03-08 13:12:24,801] [] - value 3 was supposed to be removed from list [3, 2, 1, 0, 7, 6, 5, 4, 9, 8] but it wasn't: [3, 2, 1, 0, 7, 6, 5, 4, 9, 8]


  • I am using com.netflix.astyanax.
  • Both my local machine and the remote dev server connect to the very same Cassandra instance.
  • Both my local machine and the remote dev server run the very same test, which creates a new row family and adds 10 records before one is deleted.
  • When the error occurs on dev, log.error("Unable to delete location " + value, e); is not executed (i.e. running the deletion command doesn't produce any exception).
  • I am 100% positive that no other code is affecting the content of the database while I am running the test on dev, so this isn't some strange concurrency issue.

What could possibly explain that the deleteColumn(value) request runs without producing any error but still does not remove the column from the database?

ADDITIONAL INFO


Here is how I created the keyspace:

create keyspace sessiondata
    with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy'
    and strategy_options = {replication_factor:1};

Here is how I created the column family values, referenced as VALUES_CF in the code above:

create column family values
    with comparator = UTF8Type
;

Here is how the keyspace referenced in the Java code above is defined:

final AstyanaxContext.Builder contextBuilder = getBuilder();
final AstyanaxContext<Keyspace> keyspaceContext = contextBuilder
        .forKeyspace(keyspaceName).buildKeyspace(
                ThriftFamilyFactory.getInstance());
keyspaceContext.start();
keyspace = keyspaceContext.getEntity();

where getBuilder is:

  private Builder getBuilder() {
    final AstyanaxConfigurationImpl conf = new AstyanaxConfigurationImpl()
    .setDiscoveryType(NodeDiscoveryType.NONE)
    .setRetryPolicy(new RunOnce());

    final ConnectionPoolConfigurationImpl poolConf = new ConnectionPoolConfigurationImpl("MyPool")
    .setPort(port)
    .setMaxConnsPerHost(1)
    .setSeeds(value);

    return new AstyanaxContext.Builder()
    .forCluster(cluster)
    .withAstyanaxConfiguration(conf)
    .withConnectionPoolConfiguration(poolConf)
    .withConnectionPoolMonitor(new CountingConnectionPoolMonitor());
  }

SECOND UPDATE


  • First, the issues are not solely related to deletes. I observe similar problems when updating records in the database and reading them back: I cannot read the updates I just wrote.

  • Second, I created a test that performs the following operations 100 times:


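  • Write a row to Cassandra.

  • Update that row in Cassandra.

  • Read the row back from Cassandra and check whether it was indeed updated; if not, check again periodically after a delay.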

What I observe from this test:


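  • When I run that code locally, all 100 iterations pass immediately (no retries needed).

  • When I run that code on the remote server, some iterations pass and some fail. When an iteration fails, it keeps failing no matter how long the delay is (I wait up to 10 seconds).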

At this point, I am really not sure how any Cassandra setup could explain this behavior, since I connect to the very same server for my tests and since the delays I insert are much larger than any additional latency I may incur when running the test from my local machine.

The only relevant difference seems to be which machine the code is running on.

THIRD UPDATE

If, in the test mentioned in the previous update, I insert a delay between the two writes, the code starts passing once the delay is >= 1,000 ms. A delay of, say, 100 ms doesn't help. I also modified the builder to set the default read and write consistency levels to the most demanding one, ALL, and that had no impact on the test results (it still fails about half of the time unless the delay between writes is > 1 s):

final AstyanaxConfigurationImpl conf = new AstyanaxConfigurationImpl()
    .setDiscoveryType(NodeDiscoveryType.NONE)
    .setRetryPolicy(new RunOnce())
    .setDefaultReadConsistencyLevel(ConsistencyLevel.CL_ALL)
    .setDefaultWriteConsistencyLevel(ConsistencyLevel.CL_ALL);


Recommended answer

To debug, try printing the full row instead of just the column names. By the full row I mean the column name, column value, and timestamp. A long shot is that the clock is wrong on one of your test machines and this is throwing off your tests on the other.
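
As a rough illustration of what to look for once the timestamps are printed, here is a small self-contained sketch. It is not Astyanax code: the class and method names are made up for this example, and it assumes the column timestamps are microseconds since the epoch, which is worth verifying for your client configuration. In Astyanax the three inputs would presumably come from each Column in the ColumnList (check the getters for name, value, and timestamp against the version you use).

```java
import java.util.Locale;

// Hypothetical helper, not part of Astyanax or Cassandra.
final class ColumnTimestampReport {

    /** Formats one column with its write timestamp and its offset from a reference clock. */
    static String describe(String name, String value, long timestampMicros, long referenceMicros) {
        long offsetMillis = (timestampMicros - referenceMicros) / 1000L;
        // Brackets make stray whitespace in names and values visible.
        return String.format(Locale.ROOT,
                "name=[%s] value=[%s] timestamp=%d us (%+d ms vs. reference clock)",
                name, value, timestampMicros, offsetMillis);
    }

    /** True when the column was stamped later than the reference clock, beyond the tolerance. */
    static boolean isAheadOfClock(long timestampMicros, long referenceMicros, long toleranceMicros) {
        return timestampMicros - referenceMicros > toleranceMicros;
    }

    public static void main(String[] args) {
        // Assumption: timestamps are microseconds since the epoch.
        long nowMicros = System.currentTimeMillis() * 1000L;
        // Hypothetical column written by a client whose clock runs 5 s ahead.
        long suspectTimestamp = nowMicros + 5_000_000L;
        System.out.println(describe("3", "", suspectTimestamp, nowMicros));
        System.out.println("ahead of local clock: "
                + isAheadOfClock(suspectTimestamp, nowMicros, 1_000_000L));
    }
}
```

If the columns written from one machine consistently show timestamps ahead of the other machine's clock, that would point toward the clock explanation.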

Another thing to double-check is that ip is indeed what you think it is, in both your application and Cassandra. When you retrieve it, print it between delimiters, like println("-" + ip + "-"). Before and after your try block for the execute in deleteSecureLocation, do a get for only that column, not the entire row. I'm not too sure how to do that in Astyanax; on the CLI it would be get[id][ip].

Something to keep in mind is that a delete won't fail even if there's nothing to delete. To Cassandra it's a write; the only thing that makes it take effect as a delete is that, on read, it is the latest timestamped entry for that row/column name.
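
The point about timestamps can be sketched with a toy in-memory model. This is only an illustration of last-write-wins reconciliation, not Cassandra's implementation: the class name is invented, timestamps are plain longs, and ties simply keep the existing cell, which is a simplification.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model of one row; not Cassandra's actual storage code.
final class LastWriteWinsSketch {

    private static final class Cell {
        final String value;   // null marks a tombstone (a recorded delete)
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    private final Map<String, Cell> row = new HashMap<>();

    void insert(String column, String value, long timestamp) {
        apply(column, new Cell(value, timestamp));
    }

    /** A delete is just another write; it never fails, even if the column does not exist. */
    void delete(String column, long timestamp) {
        apply(column, new Cell(null, timestamp));
    }

    private void apply(String column, Cell incoming) {
        Cell current = row.get(column);
        // Simplification: on equal timestamps the existing cell is kept.
        if (current == null || incoming.timestamp > current.timestamp) {
            row.put(column, incoming);
        }
    }

    /** Names of columns whose newest entry is a value rather than a tombstone. */
    Set<String> liveColumns() {
        Set<String> names = new TreeSet<>();
        for (Map.Entry<String, Cell> entry : row.entrySet()) {
            if (entry.getValue().value != null) {
                names.add(entry.getKey());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        LastWriteWinsSketch row = new LastWriteWinsSketch();
        row.insert("3", "", 2_000L); // stamped by a client whose clock runs ahead
        row.delete("3", 1_000L);     // delete stamped with an earlier time: accepted, but loses
        System.out.println(row.liveColumns());
        row.delete("3", 3_000L);     // delete stamped later than the insert: wins
        System.out.println(row.liveColumns());
    }
}
```

In this model the first delete returns normally yet the column stays visible, which matches the symptom in the question if the delete carries an older timestamp than the insert it is meant to remove.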
